Abstract
Tests for trend are important in a number of scientific fields when trends associated with binary variables are of interest. Implementing the standard Cochran–Armitage trend test requires an arbitrary choice of scores assigned to represent the grouping variable. Bartholomew proposed a test for qualitatively ordered samples using asymptotic critical values, but type I error control can be problematic in finite samples. To our knowledge, use of the exact probability distribution has not been explored, and we study its use in the present paper. Specifically, we consider an approach based on conditioning on both sets of marginal totals and three unconditional approaches in which only the marginal totals corresponding to the group sample sizes are treated as fixed. While slightly conservative, all four tests are guaranteed to have actual type I error rates below the nominal level. The unconditional tests are found to exhibit far less conservatism than the conditional test and thereby gain a power advantage.
1 Introduction
Suppose realizations of a binary variable are recorded in a k-group setting where the groups are defined by ordered categories. The data associated with such experiments can be organized as a 2 × k contingency table (Table 1).
Table 1: Data structure
| Response | Dose 1 | Dose 2 | Dose k | |
| 1 | ||||
| 0 | ||||
Consider an experiment with four dose groups, each fixed by the design to have 10 subjects. There is expected to be an increasing probability of response with increasing dose level, and doses administered are 0, 2.5, 25, and 250 units. In such scenarios, use of tests of the null in favor of the general alternative may result in an inefficient procedure in that they ignore the ordinal information which defines the groups. Rather, a test for the ordered alternative (a.k.a. “simple order hypothesis” [1]) is appropriate, and the alternative may be written as
Although the standard test due to Cochran and Armitage [2, 3] may be applied, its inherent weakness is its dependence on a choice of scores. That choice can be somewhat arbitrary, and it presents the researcher with a dilemma because the operating characteristics of the test depend on it. Given that the procedure is based upon an assumed linear relationship between the chosen score and the probability of response, an incorrect choice of scores can result in losses in efficiency and vastly different conclusions for a given data set. In the aforementioned example, say responders totaled 0, 1, 4, and 3 in the four groups receiving the lowest to highest doses, respectively; see Table 2. Using dosage amount as the score, the Cochran–Armitage procedure does not detect any trend in response probabilities, with p-value 0.2615. However, if the transformation
Table 2: Hypothetical data example
| Response | Dose amount | | | |
| | 0 | 2.5 | 25 | 250 |
| 1 | 0 | 1 | 4 | 3 |
| 0 | 10 | 9 | 6 | 7 |
| Total | 10 | 10 | 10 | 10 |
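The sensitivity to scores can be checked numerically. The following is a minimal sketch that assumes the two-sided normal approximation to the Cochran–Armitage statistic; under that assumption the dose-amount scores give a p-value near the 0.2615 quoted above. The function name `cochran_armitage` and the equally spaced scores 1 to 4 are illustrative choices only and are not taken from the text.

```python
from math import erfc, sqrt

def cochran_armitage(x, n, scores):
    """Two-sided asymptotic Cochran-Armitage trend test; returns (z, p)."""
    N, s = sum(n), sum(x)
    pbar = s / N  # pooled response proportion
    # Score-weighted deviation of observed from expected responders.
    t = sum(d * (xi - ni * pbar) for d, xi, ni in zip(scores, x, n))
    # Null variance of t with the group sizes treated as fixed.
    var = pbar * (1 - pbar) * (
        sum(ni * d * d for d, ni in zip(scores, n))
        - sum(ni * d for d, ni in zip(scores, n)) ** 2 / N
    )
    z = t / sqrt(var)
    return z, erfc(abs(z) / sqrt(2))  # two-sided normal tail area

x = [0, 1, 4, 3]        # responders in Table 2
n = [10, 10, 10, 10]    # group sizes
z_dose, p_dose = cochran_armitage(x, n, [0, 2.5, 25, 250])  # dose amounts as scores
z_rank, p_rank = cochran_armitage(x, n, [1, 2, 3, 4])       # illustrative equally spaced scores
print(f"dose scores: z = {z_dose:.3f}, p = {p_dose:.4f}")
print(f"rank scores: z = {z_rank:.3f}, p = {p_rank:.4f}")
```

With the same counts, the two score choices fall on opposite sides of the 0.05 level, which is the dilemma described in the preceding paragraph.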
Bartholomew [4] proposed a test utilizing isotonic regression that addressed trend alternatives via a likelihood-ratio approach, the test statistic being a variant of the standard Pearson chi-square statistic. Because the null distribution depends on an unknown nuisance parameter in finite samples, Bartholomew published tables of asymptotic critical values for his statistic in the balanced sample size case [5] for use in practice. Poon [6] conducted a Monte Carlo study indicating that the trend test based on these critical values sometimes violated the nominal type I error rate; Collings et al. [7] corroborated these simulation results. Leuraud and Benichou [8, 9] also used simulation to assess the statistical properties of Bartholomew’s test, finding the procedure to have good power properties under a number of alternatives.
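To make the isotonic-regression idea concrete, here is a minimal sketch. It assumes the commonly cited form of the statistic, $\sum_i n_i(\hat p_i^{*} - \bar p)^2 / \{\bar p(1-\bar p)\}$, where $\hat p_i^{*}$ are the order-restricted estimates from the pool-adjacent-violators algorithm and $\bar p$ is the pooled proportion. The function names `pava` and `bartholomew`, and the convention of returning 0 when there are no responders or no non-responders, are assumptions made for illustration.

```python
def pava(x, n):
    """Non-decreasing (isotonic) estimates of the group response proportions."""
    blocks = []  # each block: [responders, subjects, number of groups pooled]
    for xi, ni in zip(x, n):
        blocks.append([xi, ni, 1])
        # Pool adjacent blocks while the earlier proportion exceeds the later one.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            xb, nb, cb = blocks.pop()
            blocks[-1][0] += xb
            blocks[-1][1] += nb
            blocks[-1][2] += cb
    return [b[0] / b[1] for b in blocks for _ in range(b[2])]

def bartholomew(x, n):
    """Assumed form: sum n_i (p*_i - pbar)^2 / (pbar (1 - pbar))."""
    s, N = sum(x), sum(n)
    if s in (0, N):
        return 0.0  # assumption: degenerate tables carry no evidence of trend
    pbar = s / N
    fit = pava(x, n)
    return sum(ni * (pi - pbar) ** 2 for ni, pi in zip(n, fit)) / (pbar * (1 - pbar))

x = [0, 1, 4, 3]        # responders in the hypothetical four-dose example
n = [10, 10, 10, 10]
fit = pava(x, n)
stat = bartholomew(x, n)
print("isotonic estimates:", fit)
print("statistic:", stat)
```

For the example counts, the last two groups violate the assumed ordering (4/10 followed by 3/10) and are pooled to 7/20, so no scores need to be chosen.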
Given the aforementioned type I error control problems, it is natural to alternatively consider use of the exact probability distribution. Complications arise in that, conditional on
For the
An approximate unconditional approach to eliminating the nuisance parameter is simply to replace it with an estimate, for instance one obtained via maximum likelihood [14]. This procedure provides an estimated (E) p-value. Lloyd [15] proposed using this E p-value as an ordering criterion and then maximizing the null likelihood, resulting in the E + M (Estimation + Maximization) p-value. Shan et al. [16] investigated this approach in the context of detecting monotone trend using the Cochran–Armitage test statistic and subsequently using the Baumgartner-Weiss-Schindler statistic [17].
Due to the conservative nature of conditional tests, and the violations of type I error that accompany tests based on asymptotic critical values as observed by Poon [6], it is desirable to develop exact unconditional tests for trend for use with qualitatively ordered samples using the Bartholomew statistic.
Section 2 gives background information on Bartholomew’s test for trend. Section 3 details the proposed exact testing procedures. Section 4 reports type I error and power properties of the examined procedures. Section 5 presents an example of the tests’ application, and Section 6 contains concluding remarks.
2 Background
Bartholomew’s [4] test of equality against the simple order alternative as applied to binomial counts is based on the statistic
where
If the usual MLEs
Although the statistic
even for moderate k this may require several numerical integrations.
[21]
Determining values for
Suppose the number of observed responders in the k groups of a particular data set is
where
Poon [6] generated tables of binomial counts from the conditional distribution and approximated the type I error and power of the exact test using the Bartholomew statistic under a variety of parameter settings, each with 500 simulations. Conditioning on the number of responses provides a p-value expression that is free of the nuisance parameter
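The conditional calculation can be sketched by full enumeration rather than simulation. With both margins fixed, the table probabilities under the null are multivariate hypergeometric and do not involve the nuisance parameter. The sketch below assumes the chi-square-type form of the statistic used for illustration earlier; the names `bartholomew` and `conditional_pvalue` are hypothetical.

```python
from itertools import product
from math import comb

def bartholomew(x, n):
    """Assumed form: sum n_i (p*_i - pbar)^2 / (pbar (1 - pbar)), p* from PAVA."""
    s, N = sum(x), sum(n)
    if s in (0, N):
        return 0.0  # assumption: degenerate tables carry no evidence of trend
    blocks = []  # pooled [responders, subjects] blocks
    for xi, ni in zip(x, n):
        blocks.append([xi, ni])
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            xb, nb = blocks.pop()
            blocks[-1][0] += xb
            blocks[-1][1] += nb
    pbar = s / N
    return sum(nb * (xb / nb - pbar) ** 2 for xb, nb in blocks) / (pbar * (1 - pbar))

def conditional_pvalue(x, n):
    """Exact p-value given the total number of responders s = sum(x)."""
    s, N = sum(x), sum(n)
    t_obs = bartholomew(x, n)
    denom = comb(N, s)
    p = 0.0
    for y in product(*(range(ni + 1) for ni in n)):
        if sum(y) != s:
            continue  # keep only tables sharing the observed margins
        if bartholomew(y, n) >= t_obs - 1e-9:  # small tolerance for ties
            weight = 1
            for yi, ni in zip(y, n):
                weight *= comb(ni, yi)
            p += weight / denom  # multivariate hypergeometric probability
    return p

p_cond = conditional_pvalue([0, 1, 4, 3], [10, 10, 10, 10])
print("conditional p-value:", p_cond)
```

Because only tables with the observed number of responders enter the sum, the reference distribution can be coarse, which is one way to see where the conservatism of the conditional test comes from.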
3 Methods of unconditional exact testing
Three unconditional exact testing procedures are now described in detail and distinguished from the conditional version.
3.1 Maximization test
In the unconditional setting, the number of responses is not held fixed, and individual table probabilities depend on the nuisance parameter
This approach has been developed and applied in Suissa and Schuster [14].
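A minimal sketch of the maximization idea follows. It assumes the chi-square-type form of the statistic used for illustration earlier and searches a grid of nuisance parameter values in steps of 0.01; the names `bartholomew`, `tail_prob` and `m_pvalue` are hypothetical.

```python
from itertools import product
from math import comb

def bartholomew(x, n):
    """Assumed form: sum n_i (p*_i - pbar)^2 / (pbar (1 - pbar)), p* from PAVA."""
    s, N = sum(x), sum(n)
    if s in (0, N):
        return 0.0  # assumption: degenerate tables carry no evidence of trend
    blocks = []
    for xi, ni in zip(x, n):
        blocks.append([xi, ni])
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            xb, nb = blocks.pop()
            blocks[-1][0] += xb
            blocks[-1][1] += nb
    pbar = s / N
    return sum(nb * (xb / nb - pbar) ** 2 for xb, nb in blocks) / (pbar * (1 - pbar))

def tail_prob(tables, n, pi):
    """Null probability of a set of tables under a common response probability pi."""
    pmf = [[comb(ni, j) * pi**j * (1 - pi) ** (ni - j) for j in range(ni + 1)] for ni in n]
    total = 0.0
    for y in tables:
        w = 1.0
        for i, yi in enumerate(y):
            w *= pmf[i][yi]  # product of independent binomial probabilities
        total += w
    return total

def m_pvalue(x, n, step=0.01):
    """Largest tail probability over a grid of nuisance parameter values."""
    t_obs = bartholomew(x, n)
    tables = [y for y in product(*(range(ni + 1) for ni in n))
              if bartholomew(y, n) >= t_obs - 1e-9]  # tables at least as extreme
    grid = [i * step for i in range(1, round(1 / step))]
    return max(tail_prob(tables, n, pi) for pi in grid)

p_m = m_pvalue([0, 1, 4, 3], [10, 10, 10, 10])
print("M p-value:", p_m)
```

A fixed grid can miss a narrow peak of the tail probability, so a finer grid or a refined search around the grid maximum may be preferable in practice.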
3.2 Confidence interval test
An adjustment to the maximization approach was proposed by Berger and Boos [13] for the general testing problem involving a nuisance parameter and is referred to as the CI test. The approach is known elsewhere as partial maximization; see, e.g., Gunther et al. [25]. A confidence interval for the nuisance parameter is computed, and maximization is constrained to nuisance parameter values within this interval rather than the entire parameter space (in this case
where
In the case of the general alternative, software applying this approach to
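The restricted maximization can be sketched as follows. Several choices here are assumptions made for illustration: a Clopper–Pearson interval based on the pooled number of responders, an interval error level of 0.001, an evenly spaced search inside the interval, and the chi-square-type form of the statistic. The names `clopper_pearson` and `ci_pvalue` are hypothetical.

```python
from itertools import product
from math import comb

def bartholomew(x, n):
    """Assumed form: sum n_i (p*_i - pbar)^2 / (pbar (1 - pbar)), p* from PAVA."""
    s, N = sum(x), sum(n)
    if s in (0, N):
        return 0.0  # assumption: degenerate tables carry no evidence of trend
    blocks = []
    for xi, ni in zip(x, n):
        blocks.append([xi, ni])
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            xb, nb = blocks.pop()
            blocks[-1][0] += xb
            blocks[-1][1] += nb
    pbar = s / N
    return sum(nb * (xb / nb - pbar) ** 2 for xb, nb in blocks) / (pbar * (1 - pbar))

def tail_prob(tables, n, pi):
    """Null probability of a set of tables under a common response probability pi."""
    pmf = [[comb(ni, j) * pi**j * (1 - pi) ** (ni - j) for j in range(ni + 1)] for ni in n]
    total = 0.0
    for y in tables:
        w = 1.0
        for i, yi in enumerate(y):
            w *= pmf[i][yi]
        total += w
    return total

def clopper_pearson(s, N, beta):
    """Exact 100(1 - beta)% interval for pi from s responders in N, by bisection."""
    def cdf(k, p):  # P(X <= k) for X ~ Binomial(N, p)
        return sum(comb(N, j) * p**j * (1 - p) ** (N - j) for j in range(k + 1))
    def root(f, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if (f(mid) < beta / 2) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if s == 0 else root(lambda p: 1 - cdf(s - 1, p), True)
    upper = 1.0 if s == N else root(lambda p: cdf(s, p), False)
    return lower, upper

def ci_pvalue(x, n, beta=0.001, points=101):
    """Maximize the tail probability over the interval only, then add beta."""
    t_obs = bartholomew(x, n)
    tables = [y for y in product(*(range(ni + 1) for ni in n))
              if bartholomew(y, n) >= t_obs - 1e-9]
    lo, hi = clopper_pearson(sum(x), sum(n), beta)
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    return min(1.0, max(tail_prob(tables, n, pi) for pi in grid) + beta)

ci_lo, ci_hi = clopper_pearson(8, 40, 0.001)
p_ci = ci_pvalue([0, 1, 4, 3], [10, 10, 10, 10])
print("interval:", (ci_lo, ci_hi), "CI p-value:", p_ci)
```

The added constant means the resulting p-value can never fall below the interval error level, so that level bounds how small a reported p-value can be.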
3.3 Estimation and maximization test
Lloyd’s E + M procedure [15, 28] for a general testing problem involving a nuisance parameter begins by computing an estimated (E) p-value. In this case, the nuisance parameter
where once again
Applications of the E + M approach are found in Lloyd [29, 30] and Shan et al. [16, 17].
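The two steps can be sketched by brute force for a small design. The sketch assumes the pooled proportion as the nuisance parameter estimate, the chi-square-type form of the statistic used for illustration earlier, and a grid of step 0.01 for the maximization; the names `table_probs` and `em_pvalue` are hypothetical. The quadratic cost in the number of tables is why the illustration uses three groups of five subjects.

```python
from itertools import product
from math import comb, prod

def bartholomew(x, n):
    """Assumed form: sum n_i (p*_i - pbar)^2 / (pbar (1 - pbar)), p* from PAVA."""
    s, N = sum(x), sum(n)
    if s in (0, N):
        return 0.0  # assumption: degenerate tables carry no evidence of trend
    blocks = []
    for xi, ni in zip(x, n):
        blocks.append([xi, ni])
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            xb, nb = blocks.pop()
            blocks[-1][0] += xb
            blocks[-1][1] += nb
    pbar = s / N
    return sum(nb * (xb / nb - pbar) ** 2 for xb, nb in blocks) / (pbar * (1 - pbar))

def table_probs(tables, n, pi):
    """Null probability of each table under a common response probability pi."""
    pmf = [[comb(ni, j) * pi**j * (1 - pi) ** (ni - j) for j in range(ni + 1)] for ni in n]
    return {y: prod(pmf[i][yi] for i, yi in enumerate(y)) for y in tables}

def em_pvalue(x, n, step=0.01):
    x, N = tuple(x), sum(n)
    tables = list(product(*(range(ni + 1) for ni in n)))
    stat = {y: bartholomew(y, n) for y in tables}
    # E step: evaluate each table's tail probability at its own pooled estimate s/N.
    e_val = {}
    for s in range(N + 1):
        probs = table_probs(tables, n, s / N)
        for y in tables:
            if sum(y) == s:
                e_val[y] = sum(p for z, p in probs.items() if stat[z] >= stat[y] - 1e-9)
    # M step: order tables by E p-value (small means extreme), then maximize over pi.
    tail = [y for y in tables if e_val[y] <= e_val[x] + 1e-12]
    grid = [i * step for i in range(1, round(1 / step))]
    return max(sum(table_probs(tail, n, pi).values()) for pi in grid)

p_em = em_pvalue([0, 2, 4], [5, 5, 5])
print("E + M p-value:", p_em)
```

Every table in the sample space needs its own E p-value before the maximization can start, which is the source of the extra computational effort relative to the M and CI approaches.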
4 Test properties
The exact methods detailed in Section 3 and the exact conditional approach from Section 2 are compared with the test using the asymptotic critical values presented in Bartholomew [5]. For additional comparison, Cochran–Armitage tests with equally spaced scores, based on the asymptotic and exact conditional null distributions, are included. Type I and type II error rates were calculated through complete enumeration; that is, the presented rates are exact and were not obtained through Monte Carlo simulation. For the conditional procedure, unconditional error rates, i.e. the rates taken across the possible values of s, are presented for comparability. Complications arise in computing
Figure 1 displays exact type I error plotted over the range of the nuisance parameter for the four exact tests and the test based on asymptotic critical values. For scenarios defined by the number of groups (3 or 4) and the sample size within each group (10 or 20), nuisance parameter values between 0 and 1 by an increment of 0.01 were used in implementation of the exact tests. An alternate approach is a two-stage procedure involving an initial pass to identify possible regions for the global maximum, followed by a more refined search of the identified regions. The critical values of
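Exact error rates of this kind can be obtained by summing null probabilities over the rejection region. The sketch below does this for the conditional test in a deliberately small design (three groups of five subjects, chosen only to keep the enumeration fast), again assuming the chi-square-type form of the statistic; the names `rejection_region` and `size` are hypothetical.

```python
from itertools import product
from math import comb, prod

def bartholomew(x, n):
    """Assumed form: sum n_i (p*_i - pbar)^2 / (pbar (1 - pbar)), p* from PAVA."""
    s, N = sum(x), sum(n)
    if s in (0, N):
        return 0.0  # assumption: degenerate tables carry no evidence of trend
    blocks = []
    for xi, ni in zip(x, n):
        blocks.append([xi, ni])
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            xb, nb = blocks.pop()
            blocks[-1][0] += xb
            blocks[-1][1] += nb
    pbar = s / N
    return sum(nb * (xb / nb - pbar) ** 2 for xb, nb in blocks) / (pbar * (1 - pbar))

def conditional_pvalue(x, n):
    """Exact p-value given the total number of responders."""
    s, N = sum(x), sum(n)
    t_obs = bartholomew(x, n)
    p = 0.0
    for y in product(*(range(ni + 1) for ni in n)):
        if sum(y) == s and bartholomew(y, n) >= t_obs - 1e-9:
            p += prod(comb(ni, yi) for yi, ni in zip(y, n)) / comb(N, s)
    return p

def rejection_region(n, alpha=0.05):
    """All tables whose conditional p-value is at most alpha."""
    return [y for y in product(*(range(ni + 1) for ni in n))
            if conditional_pvalue(y, n) <= alpha]

def size(region, n, pi):
    """Exact unconditional rejection probability when every group has probability pi."""
    return sum(
        prod(comb(ni, yi) * pi**yi * (1 - pi) ** (ni - yi) for yi, ni in zip(y, n))
        for y in region
    )

n = [5, 5, 5]
region = rejection_region(n)
sizes = [size(region, n, i / 100) for i in range(1, 100)]  # pi = 0.01, ..., 0.99
print("largest type I error on the grid:", max(sizes))
```

Replacing the common probability in `size` with group-specific probabilities from an ordered alternative would give exact power from the same enumeration.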

Exact type I error rates of the Bartholomew test based on the unconditional maximization (M), confidence interval (CI), expectation and maximization (E + M), conditional (
As can be seen in Figure 1, the unconditional tests preserve the nominal
Power is first summarized in Figure 2 by choosing a parameter setting similar to those listed in Neuhauser [31], of the form
To investigate power further, in Figure 3 we consider a parameter setting closer to the edge of the alternative space, choosing

Exact power for center of alternative space of the Bartholomew test based on the unconditional maximization (M), confidence interval (CI), expectation and maximization (E + M), conditional (

Exact power for edge of alternative space of the Bartholomew test based on the unconditional maximization (M), confidence interval (CI), expectation and maximization (E + M), conditional (
5 Example
We consider the hypothetical example given in Fleiss et al. [32], tabulating one-month release rates as a function of initial severity measured on an ordinal scale. The observed counts appear in Table 3, and testing results are found in Table 4. For the unconditional exact approaches, all
Table 3: Example data from Fleiss et al.
| Released | Initial severity | | | |
| | Mild | Moderate | Serious | Severe |
| Yes | 25 | 22 | 12 | 6 |
| No | 5 | 3 | 8 | 19 |
| Total | 30 | 25 | 20 | 25 |
Table 4: Hypothesis test results for analysis of data in Fleiss et al.
| Test | p-Value |
| M | 0.00000149 |
| CI | 0.00100149 |
| E + M | 0.00000046 |
| | 0.00000020 |
| | 0.00000072 |
| | 0.00000101 |
| | <0.005 |
Figure 4 depicts the p-value profiles of the M and E + M tests. As the height of these curves depends on

p-Value profiles for example in Fleiss et al.
6 Conclusion and recommendation
The test using the asymptotic critical values of Bartholomew does not have acceptable type I error control. The exact conditional test is guaranteed to control type I error, but is conservative in all instances, sometimes severely so. The maximization, confidence interval, and Estimation + Maximization tests all maintain type I error control and show markedly less conservatism than the conditional test. Furthermore, each unconditional test exhibits a modest gain in power over conditional tests.
Based on the type I error and power properties, no single unconditional test is uniformly preferable to the other two. The E + M test requires far more computational effort than the M and CI tests. The CI test involves a penalty term that can affect the p-value by orders of magnitude, and it also requires the minor additional task of calculating a confidence interval for the nuisance parameter. For its simplicity and computational convenience, we recommend Maximization as the unconditional method of choice.
References
1. Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD. Statistical inference under order restrictions. New York: Wiley, 1972.
2. Armitage P. Tests for linear trends in proportions and frequencies. Biometrics 1955;11:375–86. doi:10.2307/3001775
3. Cochran WG. Some methods for strengthening the common χ² tests. Biometrics 1954;10:417–51. doi:10.2307/3001616
4. Bartholomew DJ. A test of homogeneity for ordered alternatives. Biometrika 1959;46:36–48. doi:10.1093/biomet/46.1-2.36
5. Bartholomew DJ. A test of homogeneity for ordered alternatives II. Biometrika 1959;46:328–35. doi:10.1093/biomet/46.3-4.328
6. Poon A. A Monte Carlo study of the power of some k-sample tests for ordered binomial alternatives. J Stat Comput Simulation 1980;11:251–9. doi:10.1080/00949658008810412
7. Collings BJ, Margolin BH, Oehlert GW. Analyses for binomial data, with application to the fluctuation tests for mutagenicity. Biometrics 1981;37:775–94. doi:10.2307/2530159
8. Leuraud K, Benichou J. A comparison of several methods to test for the existence of a monotonic dose-response relationship in clinical and epidemiological studies. Stat Med 2001;20:3335–51. doi:10.1002/sim.959
9. Leuraud K, Benichou J. Tests for monotonic trend from case-control data: Cochran–Armitage–Mantel trend test, isotonic regression and single and multiple contrast tests. Biom J 2004;46:731–49. doi:10.1002/bimj.200210078
10. Fisher RA. Statistical methods for research workers. Edinburgh: Oliver and Boyd, 1970.
11. Agresti A, Coull BA. The analysis of contingency tables under inequality constraints. J Stat Plan Inference 2002;107:43–73. doi:10.1016/S0378-3758(02)00243-4
12. Barnard GA. A new test for 2×2 tables. Nature 1945;156:177. doi:10.1038/156177a0
13. Berger RL, Boos DD. P values maximized over a confidence set for the nuisance parameter. J Am Stat Assoc 1994;89:1012–16. doi:10.1080/01621459.1994.10476836
14. Suissa S, Schuster JJ. Exact unconditional sample sizes for the 2×2 binomial trial. J R Stat Soc Ser A Stat Soc 1985;148:317–27. doi:10.2307/2981892
15. Lloyd CJ. Exact p-values for discrete models obtained by estimation and maximization. Austr N Z J Stat 2008;50:329–45. doi:10.1111/j.1467-842X.2008.00520.x
16. Shan G, Ma C, Hutson AD, Wilding GE. An efficient and exact approach for detecting trends with binary endpoints. Stat Med 2012;31:155–64. doi:10.1002/sim.4411
17. Shan G, Ma C, Hutson AD, Wilding GE. Some tests for detecting trends based on the modified Baumgartner-Weiss-Schindler statistics. Comput Stat Data Anal 2013;57:246–61. doi:10.1016/j.csda.2012.04.021
18. Robertson T, Wright F, Dykstra RL. Order restricted statistical inference. New York: Wiley, 1988.
19. Ayer M, Brunk HD, Ewing GM, Reid WT, Silverman E. An empirical distribution function for sampling with incomplete information. Ann Math Stat 1955;26:641–7. doi:10.1214/aoms/1177728423
20. Silvapulle MJ, Sen PK. Constrained statistical inference: inequality, order, and shape restrictions. Hoboken, NJ: Wiley, 2005.
21. Robertson T, Wright FT. One-sided comparisons for treatments with a control. Can J Stat 1985;13:109–22. doi:10.2307/3314873
22. Agresti A, Coull BA. Order-restricted tests for stratified comparisons of binomial proportions. Biometrics 1996;52:1103–11. doi:10.2307/2533072
23. Mehta CR, Patel NR, Senchaudhuri P. Exact power and sample-size computations for the Cochran–Armitage trend test. Biometrics 1998;54:1615–21. doi:10.2307/2533685
24. Basu D. On the elimination of nuisance parameters. J Am Stat Assoc 1977;72:355–66. doi:10.1007/978-1-4419-5825-9_26
25. Gunther CC, Bakke O, Havard R, Langaas M. Statistical hypothesis testing for categorical data using enumeration in the presence of nuisance parameters. Trondheim (Norway): Norwegian University of Science and Technology; 2009:4.
26. StatXact. Cytel Software Inc., 2007.
27. Berger RL. Exact unconditional homogeneity/independence tests for 2 × 2 tables. 1996 [cited 2013 March 19]. Available at: http://www.stat.ncsu.edu/exact/.
28. Lloyd CJ. Efficient and exact tests of the risk ratio in a correlated 2 × 2 table with structural zero. Comput Stat Data Anal 2007;51:3765–75. doi:10.1016/j.csda.2006.12.035
29. Lloyd CJ. A new exact and more powerful unconditional test of no treatment effect from binary matched pairs. Biometrics 2008;64:716–23. doi:10.1111/j.1541-0420.2007.00936.x
30. Lloyd CJ. Exact tests based on pre-estimation and second order pivotals: non-inferiority trials. J Stat Comput Simul 2008;80:841–51. doi:10.1080/00949650902806476
31. Neuhauser M. An exact test for trend among binomial proportions based on a modified Baumgartner-Weiss-Schindler statistic. J Appl Stat 2006;33:79–88. doi:10.1080/02664760500389756
32. Fleiss JL, Levin B, Paik MC. Statistical methods for rates and proportions. Hoboken, NJ: Wiley, 2003. doi:10.1002/0471445428
© 2014 by Walter de Gruyter Berlin / Boston