
On Stratified Adjusted Tests by Binomial Trials

Asanao Shimokawa and Etsuo Miyaoka
Published/Copyright: February 14, 2017

Abstract

To estimate or test the treatment effect in randomized clinical trials, it is important to adjust for covariates that are likely to affect the association between the treatment group (treatment or control) and the response. If these covariates are known at the start of the trial, the treatment can be randomly assigned within each stratum. If they are not known at the start of the trial, or if allocating the treatment within each stratum is difficult, completely randomized assignment of the treatment is used instead. Under both sampling structures, a stratified adjusted test is a useful way to evaluate the significance of the overall treatment effect while reducing the variance and/or bias of the result. For a binary endpoint, the Cochran and Mantel-Haenszel tests are generally used. These tests assume that the number of patients within each stratum is fixed. In practice, however, stratum sizes are often not fixed at the start of the trial and are instead allowed to vary. Using these tests in such situations therefore risks misestimating the variance of the test statistics. To handle this problem, we propose new test statistics for both sampling structures based on multinomial distributions of the stratum sizes. Our approach is based on the Cochran test, and the two tests tend to yield similar values when the number of patients is large. When the total number of patients is small, our approach yields a more conservative result. Through simulation studies, we show that the new approach controls the type I error more accurately than the traditional approach.

1 Introduction

In many medical studies that test a treatment effect, such as that of a new drug or surgical procedure, the response is a binary endpoint such as "success" vs. "failure". The two groups compared are usually referred to as the "treatment" and "control" groups, and the results of the study are summarized in a 2×2 contingency table cross-classified by treatment group and response.

In practice, there is often also a covariate that is likely to affect the association between the treatment group and the response. When this covariate is imbalanced between the two groups, evaluating the overall treatment effect while ignoring the imbalance leads to a biased result. The issue is equally relevant in a multicenter clinical trial and in a meta-analysis, because patients' background factors, the treatment environment, or medical skills may differ between centers or trials. To adjust for such imbalances, stratified analyses or analyses based on an appropriate regression model such as logistic regression are widely used [1, 2]. In this study, we focus on stratified analysis.

The most commonly used measures of a treatment effect are the relative risk, the odds ratio, and the risk difference. Their advantages and disadvantages are described in Lachin [1] and Fleiss et al. [3]. Although the following discussion may also apply to the other measures, we consider the risk difference for simplicity and because it allows a natural interpretation. If the true success probabilities of the treatment and control groups in stratum $i$ are $p_{Ti}$ and $p_{Ci}$, respectively, then the risk difference in the stratum is defined as $\delta_i=p_{Ti}-p_{Ci}$. The true overall treatment effect is then $\delta=\sum_i\pi_i\delta_i$, where $\pi_i$ is the true proportion of patients in the population who would enter stratum $i$ [4].

To examine the statistical significance of the true overall treatment effect when the strata are imbalanced, several approximate tests of the hypothesis $H_0:\delta=0$ have been proposed. A simple approach weights each estimated stratum-specific risk difference by the reciprocal of its squared standard error and divides the weighted mean by its standard error; the square of this ratio approximately follows the chi-square distribution with 1 degree of freedom under $H_0$ [3]. However, this approach is known to perform poorly in many situations. The most commonly used approaches are the Cochran test [5] and the Mantel-Haenszel test [6]. These statistics can be derived in several ways; for example, they can be viewed as summary statistics based on risk differences weighted by the harmonic mean of the numbers of patients in each stratum [7]. From another perspective, the Cochran test can be viewed as a test based on two independent binomial groups in each stratum [1], whereas the Mantel-Haenszel test is based on the hypergeometric distribution [1]. The results of these tests are often very similar and quickly converge to the same value. Yusuf et al. [8] proposed a Mantel-Haenszel test without the continuity correction term. A simulation-based comparison of these methods is given by Sánchez-Meca and Marín-Martínez [7]. Mehrotra and Railkar [4] proposed another test that minimizes the squared error between the true and the estimated overall treatment effects. A detailed discussion of this approach in the presence of a treatment-by-stratum interaction is provided by Mehrotra [9].

In a more general framework, many estimators of the overall treatment effect that adjust for covariates in randomized trials have been proposed. A comparative study of such estimators for binary outcomes was published by Colantuoni and Rosenblum [10], who examined the properties and simulated performance of seven estimators. Although these estimators can be applied to stratified randomization, as discussed in their paper, we consider simpler cases in this study. That is, we consider the case in which the outcome is independent of the covariates that are not used for stratification.

There are two types of sampling structures for a stratified analysis in a randomized clinical trial: random assignment of a treatment within each stratum, and completely randomized assignment of the treatment. If the covariates that are likely to affect the association between the treatment or control group and the response are known at the start of the trial, random assignment of the treatment within each stratum would be considered. On the other hand, if these covariates are not clear at the start of the trial, or if it is difficult to allocate the treatment within each stratum, completely randomized assignment of the treatment would be performed. Although the debate as to which structure is preferable continues, in both sampling structures, the stratified adjusted tests described above are generally used to assess the significance of the overall treatment effect. Following the terminology employed by Ganju and Zhou [11], we refer to the random assignment of a treatment within each stratum as pre-stratification, and the completely randomized assignment is referred to as post-stratification.

Although stratified adjusted tests such as the Cochran test perform well for assessing the significance of an overall treatment effect in a stratified analysis, they are constructed under the assumption that the number of patients within each stratum is fixed. In practice, however, stratum sizes are often not fixed at the start of the trial and are instead allowed to vary. For example, in a multicenter trial with a pre-stratification design, the total sample size may be fixed, but it is often difficult to control the sample size in each stratum [12]. In a randomized clinical trial with a post-stratification design, it is impossible to fix the number of patients in each stratum (defined by, e.g., sex or age) before the start of the trial. Using such tests in these situations therefore ignores the variability in the number of patients in each stratum, and consequently the variance of the test statistic is likely to be underestimated in many situations.

To handle this problem, we propose new test statistics for the pre-stratification and post-stratification designs. In our approach, the numbers of patients in the strata are assumed to follow a multinomial distribution. This assumption is reasonable for many randomized clinical trials; in fact, Ganju and Mehrotra [12] proposed a test statistic under the same assumption for evaluating the significance of the overall treatment effect with continuous data in multicenter trials with post-stratification. Under this assumption, the number of successes in each stratum is viewed as the result of a two-step process: the numbers of patients in the strata are first drawn from the multinomial distribution, and the number of successes in each treatment group within each stratum is then drawn from a binomial distribution. From another perspective, the number of successes in each treatment group can be viewed as arising from a finite mixture model.
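The two-step process described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the helper names (`draw_multinomial`, `draw_binomial`, `two_step_sample`) are hypothetical, only the standard library is used, and the stratum sizes of the two groups are drawn independently, as in the post-stratification setting considered later.

```python
import random


def draw_multinomial(n, probs, rng):
    """Allocate n patients to strata with probabilities probs; return the stratum counts."""
    counts = [0] * len(probs)
    for k in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[k] += 1
    return counts


def draw_binomial(n, p, rng):
    """Number of successes among n independent patients, each succeeding with probability p."""
    return sum(rng.random() < p for _ in range(n))


def two_step_sample(N_T, N_C, pi, p_T, p_C, rng):
    # Step 1: stratum sizes in each group follow Multinomial(N, pi).
    n_T = draw_multinomial(N_T, pi, rng)
    n_C = draw_multinomial(N_C, pi, rng)
    # Step 2: given the stratum sizes, successes are binomial within each stratum.
    a = [draw_binomial(n, p, rng) for n, p in zip(n_T, p_T)]
    b = [draw_binomial(n, p, rng) for n, p in zip(n_C, p_C)]
    return n_T, n_C, a, b


rng = random.Random(1)
n_T, n_C, a, b = two_step_sample(50, 50, [0.3, 0.7], [0.7, 0.4], [0.7, 0.4], rng)
print(n_T, n_C, a, b)
```

Rerunning with different seeds shows the stratum sizes themselves changing from trial to trial, which is the source of variability that tests assuming fixed stratum sizes ignore.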

Since we assume that the responses in the two groups follow independent binomial distributions within each stratum, our proposed approach is based on the Cochran test. The proposed test and the ordinary Cochran test differ only in the estimator of the variance of the overall treatment effect. When the total number of patients is small, the variance estimator of our approach tends to be larger than that of the Cochran test, because the estimated treatment effects vary more across strata or the numbers of patients in the two groups tend to be imbalanced within strata. This variability or imbalance decreases as the total number of patients increases, so our approach and the Cochran test then yield similar values.

In our approach, the estimated treatment effect is expected to have a larger variance than under the traditional approach in many situations. This resembles the use of a random-effects model for combining evidence from several studies comparing two treatments in a meta-analysis [13, 14]. Although the two approaches shift the results in the same direction, their underlying ideas and assumptions are distinct. The random-effects model assumes random variation in the expected treatment effect among the strata, so the estimated variance of the statistic is larger than under a fixed-effects model. In our approach, by contrast, the expected treatment effect is fixed in each stratum, and the increase in the variance of the estimated treatment effect is due instead to the variation in the number of patients in each stratum.

The remainder of this paper is organized as follows. In Section 2, we introduce the notation and traditional stratified adjusted test used to assess the significance of an overall treatment effect. In Section 3 and Section 4, the new approaches considering the variation in the number of patients in the strata are described for the pre-stratification and post-stratification contexts. In Section 5, the results of simulation studies to assess the type I error for each approach are described. The results of applying the proposed method to data from previous randomized clinical trials are described in Section 6. Finally, concluding remarks are given in Section 7.

2 Notation and traditional approaches

2.1 Notation

Let $\pi_i$ be the true proportion of patients in the target population who would enter stratum $i$, where $i=1,2,\dots,K$ and the number of strata $K$ is fixed. If the true success probabilities of the treatment and control groups in stratum $i$ are $p_{Ti}$ and $p_{Ci}$, respectively, the risk difference in stratum $i$ is defined as $\delta_i=p_{Ti}-p_{Ci}$. As is common in medical research (e.g., Ganju and Mehrotra [12]), the true overall treatment effect is defined as

$$\delta=\sum_{i=1}^{K}\pi_i\delta_i.$$

Our goal is to assess the significance of $\delta$ by testing the null hypothesis $H_0:\delta=0$.

Let $n_{Ti}$ and $n_{Ci}$ be the sample sizes of the treatment and control groups in stratum $i$. In the traditional approaches, $n_{Ti}$ and $n_{Ci}$ are fixed; in our approach, they are treated as random variables. The total number of patients in stratum $i$ is $n_i=n_{Ti}+n_{Ci}$. The numbers of successes in the treatment and control groups of stratum $i$ are denoted by $a_i$ and $b_i$, respectively. Given $n_{Ti}$, $a_i$ follows a binomial distribution with parameters $n_{Ti}$ and $p_{Ti}$; similarly, given $n_{Ci}$, $b_i$ follows a binomial distribution with parameters $n_{Ci}$ and $p_{Ci}$. The total numbers of patients in the treatment and control groups are $N_T=\sum_i n_{Ti}$ and $N_C=\sum_i n_{Ci}$, respectively, and $N=N_T+N_C$ is the total number of patients in the trial.

Depending on the context, $\pi_i$ may be known or unknown. For example, if the target population consists of patients with a disease that has been well studied on a large scale, the known composition of such patients by, e.g., age and sex can be used for $\pi_i$. If no such information is available, $\pi_i$ must be estimated by

$$\hat\pi_i=\frac{n_{Ti}+n_{Ci}}{N}. \tag{1}$$

2.2 The test for assessing the overall treatment effect

As described in the Introduction, several asymptotic tests for assessing the overall treatment effect can be considered. The typical and simplest are the Cochran and Mantel-Haenszel tests. Because the two statistics differ only by a term of 1 in the sample sizes in their denominators [1], they take similar values. The difference between these tests can also be viewed as a difference in model assumptions: the Cochran test assumes that the responses in a stratum follow two independent binomial distributions, whereas the Mantel-Haenszel test assumes a hypergeometric distribution. Because in this study we assume two independent binomial distributions within each stratum, the following argument is based on the Cochran test. Similar arguments based on more complicated methods, e.g., those including a continuity correction or the covariate adjustments described by Colantuoni and Rosenblum [10], would be possible; these extensions are left for future work.

The Cochran test can be viewed as a summary statistic based on risk differences weighted in proportion to the harmonic mean of the numbers of patients in each stratum. The individual risk differences of the strata are estimated by $\hat\delta_i=\hat p_{Ti}-\hat p_{Ci}$, where $\hat p_{Ti}=a_i/n_{Ti}$ and $\hat p_{Ci}=b_i/n_{Ci}$ are the observed success proportions in the treatment and control groups, respectively. The test statistic is then

$$\chi^2_C=\frac{\hat\delta^2}{\widehat{\mathrm{Var}}_C(\hat\delta)}, \tag{2}$$

where $\hat\delta=\sum_i w_i\hat\delta_i$ is the estimator of the overall treatment effect and the weight $w_i$ is half the harmonic mean of the two group sizes in stratum $i$:

$$w_i=\frac{n_{Ti}n_{Ci}}{n_{Ti}+n_{Ci}}. \tag{3}$$

The variance of $\hat\delta$ is estimated as

$$\widehat{\mathrm{Var}}_C(\hat\delta)=\sum_i w_i\hat p_i(1-\hat p_i), \tag{4}$$

which follows from $w_i^2(1/n_{Ti}+1/n_{Ci})=w_i$ when both groups in stratum $i$ share a common success probability. Here $\hat p_i$ is the pooled estimate of this common success probability under the null hypothesis:

$$\hat p_i=\frac{a_i+b_i}{n_i}. \tag{5}$$

From the independence of the strata and Slutsky's theorem, $\chi^2_C$ is asymptotically distributed as chi-square with 1 degree of freedom under $H_0$.
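Eqs (2) to (5) translate directly into a short routine. The sketch below is a hedged Python illustration using only the standard library; the function name `cochran_test` is hypothetical, and it assumes that every stratum contains patients from both groups and that not all pooled proportions are 0 or 1 (otherwise eq. (4) is zero). The p-value uses the identity that a chi-square variable with 1 degree of freedom is a squared standard normal.

```python
import math


def cochran_test(a, n_T, b, n_C):
    """Cochran test of H0: delta = 0 following eqs (2)-(5).

    a[i], b[i]: numbers of successes in the treatment and control groups of stratum i.
    n_T[i], n_C[i]: group sizes in stratum i (both assumed to be positive).
    Returns the chi-square statistic and its p-value (1 degree of freedom).
    """
    delta_hat = 0.0  # sum_i w_i * delta_hat_i
    var_hat = 0.0    # eq. (4)
    for ai, nti, bi, nci in zip(a, n_T, b, n_C):
        ni = nti + nci
        w = nti * nci / ni          # eq. (3)
        d = ai / nti - bi / nci     # stratum risk difference delta_hat_i
        p = (ai + bi) / ni          # pooled success proportion, eq. (5)
        delta_hat += w * d
        var_hat += w * p * (1 - p)
    chi2 = delta_hat ** 2 / var_hat  # eq. (2)
    # For 1 degree of freedom, P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Esophagitis data of Table 5: recurrences with cisapride (a) and omeprazole (b), by grade.
chi2, p_value = cochran_test(a=[3, 4], n_T=[15, 15], b=[0, 2], n_C=[15, 15])
print(round(chi2, 4), round(p_value, 4))
```

On the esophagitis data of Table 5 this gives $\chi^2_C=10/3\approx3.333$ and a p-value of about 0.0679, matching the Cochran result reported in Section 6.1.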

3 Proposed test under pre-stratification

In this section and Section 4, we introduce new statistics for testing $H_0:\delta=0$ based on eq. (2), under the assumption that the numbers of patients in the strata follow multinomial distributions:

$$(n_{T1},n_{T2},\dots,n_{TK})\sim\mathrm{Multinomial}\{N_T,(\pi_1,\pi_2,\dots,\pi_K)\},\qquad (n_{C1},n_{C2},\dots,n_{CK})\sim\mathrm{Multinomial}\{N_C,(\pi_1,\pi_2,\dots,\pi_K)\}.$$

In a pre-stratification analysis, the patients are randomized within each stratum. We can therefore assume that the ratio of the numbers of patients in the two groups is exactly $n_{Ti}/n_{Ci}=k_T/k_C$ in every stratum $i$, where $k_T$ and $k_C$ are positive integers. That is,

$$n_{Ci}=\frac{k_C}{k_T}n_{Ti} \tag{6}$$

for $i=1,2,\dots,K$.

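The variance of the estimator $\hat\delta$ of the overall treatment effect can be decomposed as

$$\mathrm{Var}(\hat\delta)=E\left\{\mathrm{Var}\left(\sum_i w_i\hat\delta_i\,\middle|\,n_{Ti},n_{Ci}\right)\right\}+\mathrm{Var}\left\{E\left(\sum_i w_i\hat\delta_i\,\middle|\,n_{Ti},n_{Ci}\right)\right\}. \tag{7}$$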

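By a tedious calculation, the exact form of this variance is

$$\mathrm{Var}_{\mathrm{pre}}(\hat\delta)=N_T\left(\frac{k_C}{k_T+k_C}\right)^2\left[\frac{1}{k_C}\sum_i\pi_i\{k_Cp_{Ti}(1-p_{Ti})+k_Tp_{Ci}(1-p_{Ci})\}+\sum_i\pi_i(\delta_i-\delta)^2\right]. \tag{8}$$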
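Eq. (8) can be checked by simulation. The following sketch is an illustration under stated assumptions, not part of the original analysis: it draws stratum sizes from the multinomial model, sets $n_{Ci}$ by eq. (6), draws binomial successes, and compares the empirical variance of $\hat\delta=\sum_i w_i\hat\delta_i$ with eq. (8), with the between-stratum term weighted by $\pi_i$ as the multinomial covariance in the second term of eq. (7) implies. The parameter values are arbitrary and the function names are hypothetical.

```python
import random


def var_pre_exact(N_T, pi, p_T, p_C, k_T, k_C):
    """Eq. (8): exact variance of delta_hat = sum_i w_i * delta_hat_i under pre-stratification."""
    c = k_C / (k_T + k_C)
    delta = [pt - pc for pt, pc in zip(p_T, p_C)]
    overall = sum(w * d for w, d in zip(pi, delta))  # delta = sum_i pi_i * delta_i
    within = sum(w * (k_C * pt * (1 - pt) + k_T * pc * (1 - pc))
                 for w, pt, pc in zip(pi, p_T, p_C)) / k_C
    between = sum(w * (d - overall) ** 2 for w, d in zip(pi, delta))
    return N_T * c ** 2 * (within + between)


def draw_delta_hat(N_T, pi, p_T, p_C, k_T, k_C, rng):
    """One simulated delta_hat: multinomial stratum sizes, then binomial successes."""
    n_T = [0] * len(pi)
    for k in rng.choices(range(len(pi)), weights=pi, k=N_T):
        n_T[k] += 1
    total = 0.0
    for nt, pt, pc in zip(n_T, p_T, p_C):
        nc = nt * k_C // k_T  # eq. (6); assumes k_T divides k_C * n_Ti (true when k_T = 1)
        a = sum(rng.random() < pt for _ in range(nt))
        b = sum(rng.random() < pc for _ in range(nc))
        # With eqs (3) and (6), w_i * delta_hat_i = k_C / (k_T + k_C) * (a_i - (k_T / k_C) * b_i),
        # which also gives zero for an empty stratum.
        total += k_C / (k_T + k_C) * (a - k_T / k_C * b)
    return total


args = (50, [0.3, 0.7], [0.7, 0.4], [0.6, 0.4], 1, 1)  # N_T, pi, p_T, p_C, k_T, k_C
rng = random.Random(2017)
draws = [draw_delta_hat(*args, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
empirical = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)
print(f"simulated {empirical:.3f} vs eq. (8) {var_pre_exact(*args):.3f}")
```

For these values eq. (8) gives 5.91375, and the simulated variance should agree within Monte Carlo error (roughly 1% with 20,000 draws).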

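The detailed derivation of eq. (8) is given in Appendix A. Replacing the unknown parameters $p_{Ti}$ and $p_{Ci}$ by $\hat p_i$ in eq. (5) under the null hypothesis, and $\delta_i$ and $\delta$ by their estimators, we propose the new test statistic

$$\chi^2_{\mathrm{pre}}=\frac{k_C\left(\sum_i n_{Ti}\hat\delta_i\right)^2}{N_T\left\{(k_T+k_C)\sum_i\pi_i\hat p_i(1-\hat p_i)+k_C\sum_i\pi_i(\hat\delta_i-\hat\delta)^2\right\}}, \tag{9}$$

where, in the second sum of the denominator, $\hat\delta$ denotes the estimate of $\delta$ rather than the weighted sum $\sum_i w_i\hat\delta_i$. When $\pi_i$ is unknown, it is replaced by $\hat\pi_i$ in eq. (1). Like $\chi^2_C$, this statistic is compared with the chi-square distribution with 1 degree of freedom to test the significance of the overall treatment effect.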
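A minimal sketch of eq. (9) follows. It is a hedged illustration, not the authors' implementation: the function name `chi2_pre` is hypothetical, it assumes eq. (6) holds with every stratum nonempty, and it assumes that $\delta$ inside the between-stratum sum is estimated by $\sum_i\pi_i\hat\delta_i$. Under these assumptions it gives a p-value of about 0.0685 on the esophagitis data of Table 5 with $\pi$ estimated, in line with the value reported in Section 6.1.

```python
import math


def chi2_pre(a, n_T, b, n_C, k_T=1, k_C=1, pi=None):
    """Proposed pre-stratification statistic, eq. (9), with its chi-square(1) p-value.

    Assumes n_C[i] == (k_C / k_T) * n_T[i] > 0 in every stratum (eq. (6)).
    If pi is None, pi_i is estimated by eq. (1). Inside the between-stratum
    sum, delta is estimated by sum_i pi_i * delta_hat_i (an assumption).
    """
    N_T = sum(n_T)
    N = N_T + sum(n_C)
    if pi is None:
        pi = [(nt + nc) / N for nt, nc in zip(n_T, n_C)]  # eq. (1)
    d = [ai / nt - bi / nc for ai, nt, bi, nc in zip(a, n_T, b, n_C)]
    p = [(ai + bi) / (nt + nc) for ai, nt, bi, nc in zip(a, n_T, b, n_C)]  # eq. (5)
    d_bar = sum(w * di for w, di in zip(pi, d))
    within = sum(w * q * (1 - q) for w, q in zip(pi, p))
    between = sum(w * (di - d_bar) ** 2 for w, di in zip(pi, d))
    numerator = k_C * sum(nt * di for nt, di in zip(n_T, d)) ** 2
    denominator = N_T * ((k_T + k_C) * within + k_C * between)
    chi2 = numerator / denominator
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Table 5 data (pre-stratification with k_T / k_C = 1, pi unknown).
chi2, p_value = chi2_pre(a=[3, 4], n_T=[15, 15], b=[0, 2], n_C=[15, 15])
print(round(chi2, 4), round(p_value, 4))
```

The statistic, about 3.319, is slightly smaller than the Cochran value of 10/3 for the same data, because the between-stratum term enlarges the denominator.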

In the case of pre-stratification, using eq. (6), the variance estimator in eq. (4) reduces to

$$\widehat{\mathrm{Var}}_{C,\mathrm{pre}}(\hat\delta)=\frac{k_C}{k_T+k_C}\sum_i n_{Ti}\hat p_i(1-\hat p_i).$$
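The reduction rests on the Cochran weight in eq. (3) becoming proportional to the treatment-group size under eq. (6); a short derivation:

$$w_i=\frac{n_{Ti}n_{Ci}}{n_{Ti}+n_{Ci}}=\frac{n_{Ti}\cdot\frac{k_C}{k_T}n_{Ti}}{n_{Ti}\left(1+\frac{k_C}{k_T}\right)}=\frac{k_C}{k_T+k_C}\,n_{Ti},$$

so that $\hat\delta=\frac{k_C}{k_T+k_C}\sum_i n_{Ti}\hat\delta_i$, which explains the factor $\left(\sum_i n_{Ti}\hat\delta_i\right)^2$ in the numerator of eq. (9). In addition, with $\hat\pi_i=n_i/N$, eq. (6) gives $N_T\hat\pi_i=n_{Ti}$, so the first term of eq. (8) evaluated at $\hat p_i$ and $\hat\pi_i$ is

$$N_T\left(\frac{k_C}{k_T+k_C}\right)^2\frac{k_T+k_C}{k_C}\sum_i\hat\pi_i\hat p_i(1-\hat p_i)=\frac{k_C}{k_T+k_C}\sum_i n_{Ti}\hat p_i(1-\hat p_i)=\widehat{\mathrm{Var}}_{C,\mathrm{pre}}(\hat\delta),$$

which is why the Cochran variance appears as the first term of eq. (10).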

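On the other hand, if $\pi_i$ is replaced by $\hat\pi_i$, the estimator of the variance in eq. (8) can be written as

$$\widehat{\mathrm{Var}}_{\mathrm{pre}}(\hat\delta)=\widehat{\mathrm{Var}}_{C,\mathrm{pre}}(\hat\delta)+N_T\left(\frac{k_C}{k_T+k_C}\right)^2\sum_i\hat\pi_i(\hat\delta_i-\hat\delta)^2. \tag{10}$$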

Because the second term in eq. (10) is always nonnegative, if the numbers of patients in the strata are not fixed at the beginning of a trial and randomization is conducted by pre-stratification, the statistic in eq. (2) is at least as large as that in eq. (9) (with $\pi_i$ estimated by $\hat\pi_i$). As a result, using the Cochran test in such situations risks over-detecting the treatment effect. In addition, when $\delta_i=\delta$ in every stratum, this term approaches 0 as the sample size increases, because $\hat\delta_i$ and $\hat\delta$ are consistent estimators of $\delta_i$ and $\delta$, respectively; in this case, the statistic in eq. (9) approaches that in eq. (2). When the sample size is small, the differences between $\hat\delta_i$ and $\hat\delta$ vary considerably across strata, so the second term in eq. (10) tends to be large and the proposed statistic yields a more conservative result.

4 Proposed test under post-stratification

In a post-stratification analysis, randomization is conducted over all patients. Although the total numbers of patients assigned to the two groups are fixed, the ratio of the group sizes may differ from stratum to stratum. In this case, therefore, we can assume only that $N_T$ and $N_C$ are fixed. Naturally, the variation in the numbers of patients is expected to be larger than under pre-stratification, because $n_{Ti}$ and $n_{Ci}$ follow independent distributions.

The variance of $\hat\delta$ can be calculated from eq. (7) as in the pre-stratification case. Unlike the previous case, however, this requires the expected value and variance of a complicated function of $n_{T1},n_{T2},\dots,n_{TK}$ and $n_{C1},n_{C2},\dots,n_{CK}$. To address this problem, we used a multivariate first-order Taylor expansion around the expected values; the details of the calculation are given in Appendix B. The resulting approximation of the variance of $\hat\delta$ is

$$\mathrm{Var}(\hat\delta)\approx\frac{N_TN_C}{N^2}\sum_i\{N_C\pi_ip_{Ti}(1-p_{Ti})+N_T\pi_ip_{Ci}(1-p_{Ci})\}+\frac{N_TN_C}{N^4}(N_T^3+N_C^3)\sum_i\pi_i\Big(\delta_i-\sum_j\pi_j\delta_j\Big)^2. \tag{11}$$
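A sketch of where the two terms of eq. (11) come from, assuming the expansion is taken at $E(n_{Ti})=N_T\pi_i$ and $E(n_{Ci})=N_C\pi_i$ (the full argument is in Appendix B). Write $w_i=g(n_{Ti},n_{Ci})$ with $g(x,y)=xy/(x+y)$, whose partial derivatives

$$\frac{\partial g}{\partial x}=\frac{y^2}{(x+y)^2},\qquad \frac{\partial g}{\partial y}=\frac{x^2}{(x+y)^2}$$

equal $N_C^2/N^2$ and $N_T^2/N^2$ at the expansion point. For the second term of eq. (7), $\sum_i w_i\delta_i\approx\text{const}+\sum_i\left(\frac{N_C^2}{N^2}n_{Ti}+\frac{N_T^2}{N^2}n_{Ci}\right)\delta_i$, and the covariances of the two independent multinomials give

$$\mathrm{Var}\Big(\sum_i w_i\delta_i\Big)\approx\left(\frac{N_C^4}{N^4}N_T+\frac{N_T^4}{N^4}N_C\right)\sum_i\pi_i\Big(\delta_i-\sum_j\pi_j\delta_j\Big)^2=\frac{N_TN_C(N_T^3+N_C^3)}{N^4}\sum_i\pi_i\Big(\delta_i-\sum_j\pi_j\delta_j\Big)^2.$$

For the first term, the conditional variance $w_i^2\{p_{Ti}(1-p_{Ti})/n_{Ti}+p_{Ci}(1-p_{Ci})/n_{Ci}\}=n_{Ti}n_{Ci}\{n_{Ci}p_{Ti}(1-p_{Ti})+n_{Ti}p_{Ci}(1-p_{Ci})\}/n_i^2$, evaluated at the expected counts, gives $\frac{N_TN_C}{N^2}\{N_C\pi_ip_{Ti}(1-p_{Ti})+N_T\pi_ip_{Ci}(1-p_{Ci})\}$.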

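Replacing the unknown parameters $p_{Ti}$ and $p_{Ci}$ by $\hat p_i$ in eq. (5), and $\delta_i$ by its estimator, eq. (11) is estimated by

$$\widehat{\mathrm{Var}}_{\mathrm{post}}(\hat\delta)=\frac{N_TN_C}{N}\sum_i\pi_i\hat p_i(1-\hat p_i)+\frac{N_TN_C}{N^4}(N_T^3+N_C^3)\sum_i\pi_i\Big(\hat\delta_i-\sum_j\pi_j\hat\delta_j\Big)^2. \tag{12}$$

We then propose a new test statistic based on eq. (2):

$$\chi^2_{\mathrm{post}}=\frac{\hat\delta^2}{\widehat{\mathrm{Var}}_{\mathrm{post}}(\hat\delta)}. \tag{13}$$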
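Eqs (12) and (13) can likewise be sketched in a few lines. This is a hedged illustration, not the authors' code: the function name `chi2_post` is hypothetical, every stratum is assumed to contain patients from both groups, and the example counts are made up (they are not from the paper). With these counts and $N_T=N_C=50$, the statistic (about 0.236) is smaller than the Cochran statistic for the same data (about 0.247).

```python
import math


def chi2_post(a, n_T, b, n_C, pi=None):
    """Proposed post-stratification statistic, eqs (12)-(13), with its chi-square(1) p-value.

    Assumes every stratum contains patients from both groups.
    If pi is None, pi_i is estimated by eq. (1).
    """
    N_T, N_C = sum(n_T), sum(n_C)
    N = N_T + N_C
    if pi is None:
        pi = [(nt + nc) / N for nt, nc in zip(n_T, n_C)]  # eq. (1)
    d = [ai / nt - bi / nc for ai, nt, bi, nc in zip(a, n_T, b, n_C)]
    p = [(ai + bi) / (nt + nc) for ai, nt, bi, nc in zip(a, n_T, b, n_C)]  # eq. (5)
    # delta_hat = sum_i w_i * delta_hat_i with w_i from eq. (3)
    delta_hat = sum(nt * nc / (nt + nc) * di for nt, nc, di in zip(n_T, n_C, d))
    d_bar = sum(w * di for w, di in zip(pi, d))  # sum_j pi_j * delta_hat_j
    first = N_T * N_C / N * sum(w * q * (1 - q) for w, q in zip(pi, p))
    second = (N_T * N_C * (N_T ** 3 + N_C ** 3) / N ** 4
              * sum(w * (di - d_bar) ** 2 for w, di in zip(pi, d)))
    chi2 = delta_hat ** 2 / (first + second)  # eq. (13)
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical two-stratum data with N_T = N_C = 50 (not from the paper).
chi2, p_value = chi2_post(a=[18, 8], n_T=[30, 20], b=[10, 12], n_C=[20, 30])
print(round(chi2, 4), round(p_value, 4))
```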

As with $\chi^2_{\mathrm{pre}}$, $\pi_i$ is replaced by $\hat\pi_i$ in eq. (1) when $\pi_i$ is unknown. The statistic is compared with the chi-square distribution with 1 degree of freedom to test significance.

Under post-stratification, $\widehat{\mathrm{Var}}_C(\hat\delta)$ and $\widehat{\mathrm{Var}}_{\mathrm{post}}(\hat\delta)$ cannot be compared as easily as in the pre-stratification case with eq. (10). If $\pi_i$ is unknown, the difference between the first term of eq. (12) and $\widehat{\mathrm{Var}}_C(\hat\delta)$ is

$$\sum_i\left(\frac{N_TN_C}{N^2}-\frac{n_{Ti}n_{Ci}}{n_i^2}\right)n_i\hat p_i(1-\hat p_i). \tag{14}$$

The sign and size of this difference therefore depend on how the balance between the two groups computed from the totals ($N_TN_C/N^2$) compares with the balance within each stratum ($n_{Ti}n_{Ci}/n_i^2$). When the totals of the two groups are balanced ($N_T=N_C$), eq. (14) is always nonnegative: $N_TN_C/N^2=1/4$ is the maximum of $x(1-x)$ over $0\le x\le1$, so each coefficient $1/4-(n_{Ti}/n_i)(1-n_{Ti}/n_i)$ is nonnegative, and $n_i\hat p_i(1-\hat p_i)\ge0$. In this case, eq. (14) equals 0 if the numbers of patients are also balanced in every stratum ($n_{Ti}=n_{Ci}$ for all $i$). As the total number of patients grows, the numbers of patients in each stratum tend to become balanced, and the statistic in eq. (13) approaches that in eq. (2). On the other hand, when the total number of patients is small, the within-stratum imbalance under post-stratification tends to be large, and the statistic in eq. (13) becomes more conservative than that in eq. (2).
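Eq. (14) and the sign argument for balanced totals can be checked numerically. This sketch uses hypothetical helper names and randomly generated counts; it computes eq. (14) directly, confirms that it is never negative over many random stratifications with $N_T=N_C$, and evaluates it for a made-up two-stratum dataset with $N_T=N_C=50$, for which it equals 0.2432.

```python
import random


def eq14(n_T, n_C, a, b):
    """Eq. (14): first term of eq. (12) with pi_hat_i, minus Var_C of eq. (4)."""
    N_T, N_C = sum(n_T), sum(n_C)
    N = N_T + N_C
    total = 0.0
    for nt, nc, ai, bi in zip(n_T, n_C, a, b):
        ni = nt + nc
        if ni == 0:
            continue  # an empty stratum contributes nothing
        p = (ai + bi) / ni  # eq. (5)
        total += (N_T * N_C / N ** 2 - nt * nc / ni ** 2) * ni * p * (1 - p)
    return total


def random_counts(total, K, rng):
    """Stratum counts for `total` patients spread uniformly over K strata."""
    counts = [0] * K
    for k in rng.choices(range(K), k=total):
        counts[k] += 1
    return counts


# With N_T = N_C the coefficient 1/4 - x(1 - x), x = n_Ti / n_i, is never negative.
rng = random.Random(14)
smallest = float("inf")
for _ in range(1000):
    n_T, n_C = random_counts(20, 5, rng), random_counts(20, 5, rng)
    a = [rng.randint(0, m) for m in n_T]  # arbitrary success counts
    b = [rng.randint(0, m) for m in n_C]
    smallest = min(smallest, eq14(n_T, n_C, a, b))
print(smallest >= 0, eq14([30, 20], [20, 30], [18, 8], [10, 12]))
```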

Even if the totals of the two groups are unbalanced, eq. (14) is expected to be positive in many cases, because the numbers of patients within the strata will usually be unbalanced as well. As a special case, when the totals are unbalanced and the true values of $p_{Ti}$, $p_{Ci}$, and $\pi_i$ are extremely skewed, eq. (14) can be negative. In almost all cases, however, the first term of eq. (12) is greater than $\widehat{\mathrm{Var}}_C(\hat\delta)$, and the second term of eq. (12) is always nonnegative. As a result, the statistic in eq. (13) will be smaller than $\chi^2_C$ in many situations. Moreover, with unbalanced totals, the difference between eqs (2) and (13) is expected to become small as the total number of patients increases, because the term in the first pair of parentheses of eq. (14) approaches 0.

5 Simulations

We present the results of simulation studies showing the inflated type I error of the Cochran test and the well-controlled type I error of the proposed tests in stratified randomized clinical trials. As mentioned above, several tests other than the Cochran test can assess the overall treatment effect. However, because our proposed approaches are constructed from eqs (2) and (3), we compare them only with the Cochran test. To assess the type I error, we generated data from hypothetical stratified trials under several settings satisfying $H_0:\delta=0$ and compared the rejection rates of the tests at the significance level $\alpha=0.05$.

For each setting, the simulation was repeated 50,000 times. The numbers of treatment patients in the strata, $(n_{T1},n_{T2},\dots,n_{TK})$, were generated as a multinomial random vector with parameters $N_T$ and $(\pi_1,\pi_2,\dots,\pi_K)$. In the pre-stratification case, the numbers of control patients were set to $(n_{C1},n_{C2},\dots,n_{CK})=(n_{T1},n_{T2},\dots,n_{TK})\times k_C/k_T$. In the post-stratification case, $(n_{C1},n_{C2},\dots,n_{CK})$ was generated as a multinomial random vector with parameters $N_C$ and $(\pi_1,\pi_2,\dots,\pi_K)$.
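A scaled-down version of the pre-stratification simulation can be sketched as follows. This is an assumption-laden illustration, not the code behind Tables 1 to 4: it covers only $k_T=k_C=1$ with $\pi$ unknown, uses far fewer replicates than the 50,000 in the study (so its rates are noisier and need not match the tables), and, as a convention of this sketch only, skips empty strata and counts a replicate whose eq. (4) variance is zero as a non-rejection. The function names are hypothetical.

```python
import random

CRIT = 3.841458820694124  # 0.95 quantile of chi-square with 1 df (alpha = 0.05)


def statistics_pre(a, n_T, b, n_C):
    """Cochran chi2_C (eq. (2)) and proposed chi2_pre (eq. (10) form), k_T = k_C = 1, pi unknown."""
    keep = [i for i in range(len(n_T)) if n_T[i] > 0]  # empty strata carry no information
    N_T = sum(n_T)
    N = N_T + sum(n_C)
    pi = {i: (n_T[i] + n_C[i]) / N for i in keep}                # eq. (1)
    d = {i: a[i] / n_T[i] - b[i] / n_C[i] for i in keep}
    p = {i: (a[i] + b[i]) / (n_T[i] + n_C[i]) for i in keep}    # eq. (5)
    w = {i: n_T[i] * n_C[i] / (n_T[i] + n_C[i]) for i in keep}  # eq. (3)
    delta_hat = sum(w[i] * d[i] for i in keep)
    var_c = sum(w[i] * p[i] * (1 - p[i]) for i in keep)         # eq. (4)
    if var_c == 0:
        return 0.0, 0.0  # degenerate data: treated as non-rejection
    d_bar = sum(pi[i] * d[i] for i in keep)
    # eq. (10) with k_T = k_C = 1: N_T * (1/2)^2 * sum_i pi_hat_i * (delta_hat_i - d_bar)^2
    var_pre = var_c + N_T / 4 * sum(pi[i] * (d[i] - d_bar) ** 2 for i in keep)
    return delta_hat ** 2 / var_c, delta_hat ** 2 / var_pre


def type_one_error_pre(N_T, pi, p, reps, rng):
    """Rejection rates of both tests under H0 (p_T = p_C = p) with pre-stratification."""
    rejected_c = rejected_pre = 0
    for _ in range(reps):
        n_T = [0] * len(pi)
        for k in rng.choices(range(len(pi)), weights=pi, k=N_T):
            n_T[k] += 1
        n_C = list(n_T)  # eq. (6) with k_T = k_C = 1
        a = [sum(rng.random() < q for _ in range(n)) for n, q in zip(n_T, p)]
        b = [sum(rng.random() < q for _ in range(n)) for n, q in zip(n_C, p)]
        chi2_c, chi2_p = statistics_pre(a, n_T, b, n_C)
        rejected_c += chi2_c > CRIT
        rejected_pre += chi2_p > CRIT
    return rejected_c / reps, rejected_pre / reps


rng = random.Random(5)
alpha_c, alpha_pre = type_one_error_pre(10, [0.5, 0.5], [0.7, 0.4], 2000, rng)
print(f"Cochran: {alpha_c:.3f}, proposed: {alpha_pre:.3f}")
```

Because the added between-stratum term is nonnegative, every replicate rejected by the proposed test is also rejected by the Cochran test, so the proposed rejection rate can never exceed the Cochran rate in this sketch.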

The simulation settings were identical for pre-stratification and post-stratification; the only difference was whether the ratio $k_T/k_C$ of the group sizes within each stratum was fixed exactly. We set $k_T/k_C=1$ for all pre-stratification simulations and $N_T=N_C$ for all post-stratification simulations, so the totals of the two groups were equal in all simulations. The number of strata $K$ was set to 2, 5, or 10. The small number of strata represents a randomized clinical trial with a general covariate, such as sex or age category, that is expected to affect the association between the treatment group and the response. The larger numbers of strata represent a large-scale study such as a multicenter trial or a meta-analysis.

For $K=2$, the success probabilities of the two groups were set to $(p_{T1},p_{T2})=(p_{C1},p_{C2})=(0.7,0.4)$. For $K=5$ and $K=10$, they were set to $(p_{T1},\dots,p_{T5})=(p_{C1},\dots,p_{C5})=(0.7,0.6,0.5,0.4,0.3)$ and $(p_{T1},\dots,p_{T10})=(p_{C1},\dots,p_{C10})=(0.75,0.7,0.65,0.6,0.55,0.5,0.45,0.4,0.35,0.3)$, respectively. The other settings and the results are shown in Tables 1 to 4. Each table shows the percentage of rejections of the null hypothesis by the Cochran test ($\chi^2_C$) and by the proposed test ($\chi^2_{\mathrm{pre}}$ or $\chi^2_{\mathrm{post}}$) when $(\pi_1,\pi_2,\dots,\pi_K)$ is known and when it is unknown.

Table 1:

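Results of 50,000 simulated randomized trials under pre-stratification with $K=2$ strata. The type I errors of the Cochran test and of the proposed tests with $\pi$ known and unknown are shown. The significance level is $\alpha=0.05$.

| $[\pi_1,\pi_2]$ | $N_T$ | $\alpha_{\chi^2_C}$ | $\alpha_{\chi^2_{\mathrm{pre}}}$ ($\pi$ known) | $\alpha_{\chi^2_{\mathrm{pre}}}$ ($\pi$ unknown) |
|---|---|---|---|---|
| [0.5, 0.5] | 10 | 5.84 | 5.13 | 4.86 |
| | 20 | 5.46 | 4.89 | 4.86 |
| | 50 | 4.88 | 4.72 | 4.75 |
| | 100 | 4.83 | 4.73 | 4.73 |
| | 200 | 5.06 | 5.03 | 5.02 |
| [0.7, 0.3] | 10 | 5.74 | 4.67 | 4.36 |
| | 20 | 5.50 | 4.92 | 4.93 |
| | 50 | 4.97 | 4.80 | 4.81 |
| | 100 | 5.07 | 4.90 | 4.90 |
| | 200 | 5.00 | 4.96 | 4.96 |
| [0.1, 0.9] | 50 | 5.04 | 4.91 | 4.94 |
| [0.2, 0.8] | 50 | 4.97 | 4.89 | 4.90 |
| [0.3, 0.7] | 50 | 4.96 | 4.89 | 4.89 |
| [0.4, 0.6] | 50 | 4.94 | 4.82 | 4.80 |
| [0.6, 0.4] | 50 | 4.96 | 4.82 | 4.83 |
| [0.7, 0.3] | 50 | 4.97 | 4.80 | 4.81 |
| [0.8, 0.2] | 50 | 4.99 | 4.82 | 4.83 |
| [0.9, 0.1] | 50 | 5.16 | 4.93 | 4.91 |

Note: $\alpha_{\chi^2_C}$: type I error of the Cochran test ×100; $\alpha_{\chi^2_{\mathrm{pre}}}$ ($\pi$ known): type I error of the proposed test when $\pi$ is known ×100; $\alpha_{\chi^2_{\mathrm{pre}}}$ ($\pi$ unknown): type I error of the proposed test when $\pi$ is unknown ×100.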

Table 1 shows the results for pre-stratification with $K=2$. As expected, the rejection rate was lower with the proposed approach than with the Cochran test in all cases. With two strata, the Cochran test showed a nearly nominal significance level in almost all cases. When the sample size was small, however, its rejection rate was too high, regardless of the stratum proportions in the population. Its rejection rate also increased when the heterogeneity of the strata was very high. Even in such situations, the proposed test kept the type I error close to the nominal level. Comparing the proposed tests with $(\pi_1,\pi_2)$ known and unknown, the version that estimates $(\pi_1,\pi_2)$ tended to be more conservative when the sample size was small. As the sample size increased, the two versions quickly converged.

Table 2:

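Results of 50,000 simulated randomized trials under pre-stratification with $K=5$ and $K=10$ strata. The type I errors of the Cochran test and of the proposed tests with $\pi$ known and unknown are shown. The significance level is $\alpha=0.05$.

| $K$ | $[\pi_1,\dots,\pi_K]$ | $N_T$ | $\alpha_{\chi^2_C}$ | $\alpha_{\chi^2_{\mathrm{pre}}}$ ($\pi$ known) | $\alpha_{\chi^2_{\mathrm{pre}}}$ ($\pi$ unknown) |
|---|---|---|---|---|---|
| 5 | [0.2, 0.2, 0.2, 0.2, 0.2] | 25 | 6.00 | 4.34 | 4.32 |
| | | 50 | 5.31 | 4.61 | 4.64 |
| | | 125 | 5.38 | 4.94 | 4.92 |
| | | 250 | 5.08 | 4.82 | 4.82 |
| | | 500 | 5.04 | 4.93 | 4.94 |
| 5 | [0.1, 0.1, 0.2, 0.3, 0.3] | 25 | 6.05 | 3.88 | 3.95 |
| | | 50 | 5.36 | 4.57 | 4.60 |
| | | 125 | 5.35 | 4.86 | 4.87 |
| | | 250 | 5.09 | 4.85 | 4.85 |
| | | 500 | 5.13 | 5.05 | 5.05 |
| 10 | [0.1, 0.1, …, 0.1] | 50 | 6.20 | 3.96 | 4.10 |
| | | 100 | 5.52 | 4.47 | 4.49 |
| | | 250 | 5.25 | 4.76 | 4.74 |
| | | 500 | 5.03 | 4.83 | 4.84 |
| | | 1000 | 4.95 | 4.81 | 4.81 |

Note: See the note to Table 1.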

Table 2 shows the results for pre-stratification with $K=5$ and 10. With these larger numbers of strata, the Cochran test tended to over-reject $H_0$ in almost all situations, whereas the proposed test tended to be conservative, especially when the sample size was small. Both tests approached the nominal level as the sample size increased. This behavior can be explained by the second term in eq. (10): when the total number of patients is small, the estimated treatment effects vary considerably across strata, so this term tends to be large and the proposed test becomes conservative. In this simulation setting, the second term approaches 0 as the number of patients increases, and the proposed test then gives results close to those of the Cochran test. Judging from Tables 1 and 2, we recommend the proposed approach for controlling the type I error at the nominal level when a stratified randomized trial uses pre-stratification, and we strongly recommend it when fewer than 25 patients are included in a stratum.

Table 3:

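Results of 50,000 simulated randomized trials under post-stratification with $K=2$ strata. The type I errors of the Cochran test and of the proposed tests with $\pi$ known and unknown are shown. The significance level is $\alpha=0.05$.

| $[\pi_1,\pi_2]$ | $N_T$ | $\alpha_{\chi^2_C}$ | $\alpha_{\chi^2_{\mathrm{post}}}$ ($\pi$ known) | $\alpha_{\chi^2_{\mathrm{post}}}$ ($\pi$ unknown) |
|---|---|---|---|---|
| [0.5, 0.5] | 10 | 5.93 | 5.03 | 4.87 |
| | 20 | 5.40 | 4.78 | 4.76 |
| | 50 | 5.27 | 5.02 | 5.04 |
| | 100 | 5.07 | 4.95 | 4.95 |
| | 200 | 5.19 | 5.13 | 5.12 |
| [0.7, 0.3] | 10 | 5.75 | 4.70 | 4.47 |
| | 20 | 5.46 | 4.80 | 4.78 |
| | 50 | 5.27 | 5.02 | 5.04 |
| | 100 | 5.03 | 4.92 | 4.92 |
| | 200 | 5.06 | 5.00 | 5.00 |
| [0.1, 0.9] | 50 | 5.05 | 4.81 | 4.81 |
| [0.2, 0.8] | 50 | 5.19 | 4.99 | 4.99 |
| [0.3, 0.7] | 50 | 5.18 | 4.96 | 4.95 |
| [0.4, 0.6] | 50 | 5.18 | 4.93 | 4.94 |
| [0.6, 0.4] | 50 | 5.27 | 4.99 | 4.99 |
| [0.7, 0.3] | 50 | 5.27 | 5.02 | 5.04 |
| [0.8, 0.2] | 50 | 5.27 | 4.98 | 4.98 |
| [0.9, 0.1] | 50 | 5.19 | 4.91 | 4.88 |

Note: See the note to Table 1, with $\chi^2_{\mathrm{post}}$ in place of $\chi^2_{\mathrm{pre}}$.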

Table 4:

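Results of 50,000 simulated randomized trials under post-stratification with $K=5$ and $K=10$ strata. The type I errors of the Cochran test and of the proposed tests with $\pi$ known and unknown are shown. The significance level is $\alpha=0.05$.

| $K$ | $[\pi_1,\dots,\pi_K]$ | $N_T$ | $\alpha_{\chi^2_C}$ | $\alpha_{\chi^2_{\mathrm{post}}}$ ($\pi$ known) | $\alpha_{\chi^2_{\mathrm{post}}}$ ($\pi$ unknown) |
|---|---|---|---|---|---|
| 5 | [0.2, 0.2, 0.2, 0.2, 0.2] | 25 | 6.09 | 4.09 | 4.05 |
| | | 50 | 5.54 | 4.57 | 4.55 |
| | | 125 | 4.99 | 4.67 | 4.67 |
| | | 250 | 5.05 | 4.86 | 4.85 |
| | | 500 | 5.00 | 4.91 | 4.92 |
| 5 | [0.1, 0.1, 0.2, 0.3, 0.3] | 25 | 6.02 | 3.20 | 3.18 |
| | | 50 | 5.56 | 4.49 | 4.47 |
| | | 125 | 5.04 | 4.65 | 4.65 |
| | | 250 | 5.02 | 4.84 | 4.83 |
| | | 500 | 5.04 | 4.97 | 4.97 |
| 10 | [0.1, 0.1, …, 0.1] | 50 | 6.31 | 3.71 | 3.66 |
| | | 100 | 5.58 | 4.47 | 4.46 |
| | | 250 | 5.14 | 4.75 | 4.74 |
| | | 500 | 5.06 | 4.88 | 4.88 |
| | | 1000 | 5.13 | 5.02 | 5.02 |

Note: See the note to Table 1, with $\chi^2_{\mathrm{post}}$ in place of $\chi^2_{\mathrm{pre}}$.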

Tables 3 and 4 show the results for post-stratification under the same settings as Tables 1 and 2, respectively. The results were essentially the same as in the corresponding pre-stratification cases. Under post-stratification, however, the tendency of the Cochran test to over-reject the null hypothesis was stronger. The proposed approach kept the type I error close to or below the nominal level in all situations, with a particularly clear advantage over the traditional approach in the settings of Table 3. On the basis of these results, we recommend the proposed approach when the sampling structure is post-stratification; in particular, as in the pre-stratification case, we strongly recommend it when fewer than 25 patients are included in a stratum.

These simulation studies confirmed that the Cochran approach risks a type I error above the nominal level under both pre-stratification and post-stratification, and that this risk is higher when the number of patients in a post-stratification analysis is small. As expected, our proposed approach controlled the type I error well. Although it tends to be slightly conservative, it can be considered more appropriate than the Cochran test given the importance of type I error control in clinical trials. Naturally, because it yields a larger variance estimate for $\hat\delta$, our approach has lower power than the traditional approach. However, as noted above, the type I error must first be controlled at the nominal significance level in almost all situations.

6 Examples

We here present two examples of the analysis of randomized clinical trial data using the proposed methods.

6.1 Data from esophagitis patients

As the first example, we analyzed data from patients with reflux esophagitis. This study was first reported by Vigneri et al. [15] and subsequently analyzed by Berger et al. [16]. The data are shown in Table 5. The patients were first stratified by initial grade of esophagitis (Grade 1 or Grade 2) and then randomly assigned to cisapride or omeprazole in a 1:1 ratio. Therefore, this study can be viewed as a pre-stratification trial with $k_T/k_C = 1$.

Table 5: Esophagitis patient data.

| Grade | Treatment | Recurrence | No recurrence |
|---|---|---|---|
| 1 | Cisapride | 3 | 12 |
|  | Omeprazole | 0 | 15 |
| 2 | Cisapride | 4 | 11 |
|  | Omeprazole | 2 | 13 |

The p-value of the Cochran test for the overall treatment difference was about 0.0679; at a significance level of 0.05, there is therefore no reason to reject the hypothesis that the probabilities of recurrence under the two treatments are equal. For our approach, we used $\chi^2_{\mathrm{pre}}$ because $(\pi_1, \pi_2)$ is unknown, and the p-value was approximately 0.0685. Although the two tests lead to the same conclusion, the p-value of the proposed test was slightly larger than that of the Cochran test, as expected.
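The Cochran p-value above can be recomputed directly from Table 5. The sketch below (Python standard library only; `stratified_pre` is a hypothetical helper name) computes the Cochran statistic with weights $w_i = m_{Ti}m_{Ci}/(m_{Ti}+m_{Ci})$ and the pooled proportion $\bar p_i$ of each stratum. It also evaluates one plug-in version of eq. (18) from Appendix A under our own assumptions: $\pi_i$ is estimated by $m_{Ti}/N_T$, $p_{Ti} = p_{Ci} = \bar p_i$ in the first term (which then equals the Cochran null variance $\sum_i w_i\bar p_i(1-\bar p_i)$), and the observed $\hat\delta_i$ are used in the second term. It is not presented as the paper's exact $\chi^2_{\mathrm{pre}}$; in our hand check it gives about 0.0679 for the Cochran test and about 0.0685 for the plug-in.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic, 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def stratified_pre(strata, k_T=1, k_C=1):
    """Cochran p-value and a plug-in of eq. (18) for pre-stratified 2 x 2 tables.

    strata: list of (a_i, m_Ti, b_i, m_Ci), i.e. responders and patients in the
    treatment and control arms of stratum i.
    """
    N_T = sum(mT for _, mT, _, _ in strata)
    num = var_cochran = 0.0
    pis, deltas = [], []
    for a, mT, b, mC in strata:
        w = mT * mC / (mT + mC)        # weight of eq. (3)
        d = a / mT - b / mC            # stratum risk difference
        pbar = (a + b) / (mT + mC)     # pooled proportion under H0
        num += w * d
        var_cochran += w * pbar * (1.0 - pbar)
        pis.append(mT / N_T)           # assumed estimate of pi_i
        deltas.append(d)
    delta_bar = sum(p * d for p, d in zip(pis, deltas))
    # second term of eq. (18) after eq. (19), using observed stratum differences
    between = N_T * (k_C / (k_T + k_C)) ** 2 * sum(
        p * (d - delta_bar) ** 2 for p, d in zip(pis, deltas))
    return (two_sided_p(num / math.sqrt(var_cochran)),
            two_sided_p(num / math.sqrt(var_cochran + between)))


# Table 5: (recurrences, n) for cisapride (T) and omeprazole (C) in each grade
table5 = [(3, 15, 0, 15),   # grade 1
          (4, 15, 2, 15)]   # grade 2
p_cochran, p_plugin = stratified_pre(table5)
print(f"Cochran p = {p_cochran:.4f}, plug-in p = {p_plugin:.4f}")
```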

6.2 Data of patients with duodenal ulcers

As the second example, we analyzed hypothetical clinical trial data from patients with duodenal ulcers. These data are described in detail by Blum [17] and were analyzed by Lachin [1]. Although Blum [17] describes two separate hypothetical studies, we focused on only one of them, as in Lachin [1]. In this trial, 200 patients with one of three ulcer types were randomized to either a drug or a placebo group without regard to ulcer type. Therefore, this simple randomized trial can be viewed as a post-stratification trial with $N_T = N_C = 100$. The results of this trial are given in Table 6.

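Table 6: Hypothetical duodenal ulcer patient data.

| Ulcer type | Treatment | Success | Failure |
|---|---|---|---|
| Drug-dependent | Drug | 16 | 26 |
|  | Placebo | 20 | 27 |
| Acid-dependent | Drug | 9 | 3 |
|  | Placebo | 4 | 5 |
| Intermediate | Drug | 28 | 18 |
|  | Placebo | 16 | 28 |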

To estimate the overall treatment effect, we used the methods discussed in [10]. First, the unadjusted estimate of the risk difference, which is simply the difference between the success proportions in the drug and placebo groups (53/100 − 40/100), was 0.13. Second, the doubly robust weighted least squares (DR-WLS) estimate, which uses a weighted logistic regression model to estimate the conditional probability of response given treatment group and ulcer type, was 0.1225. Robins et al. [18] attribute this estimator to Marshall Joffe.

The PLEASE (“precise, locally efficient, augmented, simple estimator”) estimate, first introduced by Colantuoni and Rosenblum [10], was also 0.1225, exactly the same as the DR-WLS estimate in this setting. The reason is straightforward.

Like the DR-WLS estimate, the PLEASE uses a weighted logistic regression model to estimate the conditional probability of response. The only difference is that new variables are added to the logistic regression model used for the weight calculation. In this data setting, however, the newly added variables are linear combinations of the variables already in the model. Consequently, they contribute nothing to the fitted model (their coefficient estimates are effectively 0), and the weights used to calculate the PLEASE and the DR-WLS estimate coincide. R and SAS code implementing the estimators is given in the supplementary material of Colantuoni and Rosenblum [10].

The nonparametric bootstrap standard errors of the unadjusted and DR-WLS estimators were both 0.0700. The Wald-test p-values for the unadjusted and DR-WLS estimates were about 0.0632 and 0.0801, respectively.
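The unadjusted estimate and both Wald p-values follow from simple arithmetic on Table 6 and the reported standard error. Here is a short check using only the Python standard library; `wald_p` is a hypothetical helper name.

```python
import math


def wald_p(estimate, se):
    """Two-sided Wald p-value: 2 * (1 - Phi(|estimate / se|))."""
    return math.erfc(abs(estimate / se) / math.sqrt(2.0))


# Table 6 totals: 16 + 9 + 28 = 53 of 100 successes on drug, 20 + 4 + 16 = 40 of 100 on placebo
unadjusted = 53 / 100 - 40 / 100
se = 0.0700                      # bootstrap SE reported for both estimators
p_unadjusted = wald_p(unadjusted, se)
p_dr_wls = wald_p(0.1225, se)    # DR-WLS estimate reported in the text
print(f"unadjusted = {unadjusted:.2f}, p = {p_unadjusted:.4f}; DR-WLS p = {p_dr_wls:.4f}")
```

Because the reported standard error is rounded to 0.0700, the recomputed p-values agree with the reported ones only to about three decimal places.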

The p-value of the Cochran test was about 0.0807, so the null hypothesis was not rejected at the 0.05 significance level. The p-value of the proposed approach $\chi^2_{\mathrm{post}}$, with $(\pi_1, \pi_2, \pi_3)$ unknown, was about 0.0848. As expected, the proposed approach again gave a slightly more conservative result.
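A sketch of the corresponding computation for the post-stratified data in Table 6 follows. It uses a hypothetical helper (`stratified_post`) and our own plug-in assumptions: $\pi_i$ is estimated by $(m_{Ti}+m_{Ci})/N$, the pooled proportion $\bar p_i$ replaces $p_{Ti}$ and $p_{Ci}$ in eq. (27), and the observed $\hat\delta_i$ are used in eq. (31). This is not presented as the paper's exact $\chi^2_{\mathrm{post}}$. In our hand check, these choices give p-values of about 0.0807 (Cochran) and 0.0848 (plug-in), in line with the values reported above.

```python
import math


def two_sided_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def stratified_post(strata):
    """Cochran p-value and a plug-in of eqs (27) + (31) for post-stratified tables.

    strata: list of (a_i, m_Ti, b_i, m_Ci) = successes and patients per arm.
    """
    N_T = sum(mT for _, mT, _, _ in strata)
    N_C = sum(mC for _, _, _, mC in strata)
    N = N_T + N_C
    num = var_cochran = within = 0.0
    pis, deltas = [], []
    for a, mT, b, mC in strata:
        w = mT * mC / (mT + mC)
        d = a / mT - b / mC
        pbar = (a + b) / (mT + mC)
        pi = (mT + mC) / N                 # assumed estimate of pi_i
        num += w * d
        var_cochran += w * pbar * (1.0 - pbar)
        # eq. (27) with p_Ti = p_Ci = pbar_i substituted (null plug-in)
        within += N_T * N_C / N**2 * (N_C * pi + N_T * pi) * pbar * (1.0 - pbar)
        pis.append(pi)
        deltas.append(d)
    delta_bar = sum(p * d for p, d in zip(pis, deltas))
    between = (N_T * N_C**4 + N_T**4 * N_C) / N**4 * sum(   # eq. (31)
        p * (d - delta_bar) ** 2 for p, d in zip(pis, deltas))
    return (two_sided_p(num / math.sqrt(var_cochran)),
            two_sided_p(num / math.sqrt(within + between)))


table6 = [(16, 42, 20, 47),   # drug-dependent: drug 16/42, placebo 20/47 successes
          (9, 12, 4, 9),      # acid-dependent
          (28, 46, 16, 44)]   # intermediate
p_cochran, p_plugin = stratified_post(table6)
print(f"Cochran p = {p_cochran:.4f}, plug-in p = {p_plugin:.4f}")
```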

7 Conclusion

In this paper, we focused on the variability in the number of patients in each stratum of randomized clinical trials with a binary endpoint. Traditional approaches to testing the significance of the overall treatment effect assume that the number of patients in each stratum is fixed. However, in several kinds of comparative trials, the number of patients in each stratum is allowed to vary. Using traditional approaches in such situations therefore risks underestimating the variability of the estimated treatment effect, which in turn inflates the type I error above the nominal level. The simulation studies illustrated this problem.

To deal with this problem, we derived the variance of the estimated treatment effect, defined as the risk difference, in both pre-stratification and post-stratification settings, assuming that the numbers of patients in the strata follow a multinomial distribution. As eqs (10) and (14) and the simulation results show, the proposed approach controlled the type I error effectively in almost all situations. In the two patient-data examples, our approach gave slightly larger p-values than the Cochran test.

Although this study focused on testing the significance of the overall treatment effect, our approach can easily be extended to confidence intervals for the treatment effect. In that case, the interval constructed from our approach would be expected to cover the true overall treatment effect at the nominal level, although it would be wider than intervals from traditional approaches.

Because it increases the variance, our approach can be seen as similar to an approach that models a random treatment effect across strata. However, the assumptions of the two approaches are essentially different. In fact, the rejection probabilities were almost 0 in all situations of our simulation when we used the DerSimonian–Laird approach [13], a typical meta-analysis method for random effects with a binary endpoint.

Finally, our approach can be extended in several directions, including other measures of treatment effect such as the relative risk or odds ratio, other sampling structures such as case-control studies, and other endpoints such as continuous variables or censored observations. In some of these cases, the variances are difficult to calculate exactly, and the resulting formulas would be complex. Nevertheless, carefully accounting for this variation in each situation is worthwhile when the aim is precise control of the type I error.

References

1. Lachin JM. Biostatistical methods: the assessment of relative risks, 2nd ed. New York: John Wiley & Sons; 2011. doi:10.1002/9780470907412

2. Agresti A. Categorical data analysis, 3rd ed. Hoboken, NJ: John Wiley & Sons; 2013.

3. Fleiss JL, Levin B, Paik MC. Statistical methods for rates and proportions, 3rd ed. New York: John Wiley & Sons; 2003. doi:10.1002/0471445428

4. Mehrotra DV, Railkar R. Minimum risk weights for comparing treatments in stratified binomial trials. Stat Med. 2000;19:811–25. doi:10.1002/(SICI)1097-0258(20000330)19:6<811::AID-SIM390>3.0.CO;2-Z

5. Cochran WG. Some methods for strengthening the common χ² tests. Biometrics. 1954;10:417–51. doi:10.2307/3001616

6. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22:719–48.

7. Sánchez-Meca J, Marín-Martínez F. Testing the significance of a common risk difference in meta-analysis. Comput Stat Data Anal. 2000;33:299–313. doi:10.1016/S0167-9473(99)00055-9

8. Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog Cardiovasc Dis. 1985;27:335–71. doi:10.1016/S0033-0620(85)80003-7

9. Mehrotra DV. Stratification issues with binary endpoints. Drug Inf J. 2001;35:1343–50. doi:10.1177/009286150103500430

10. Colantuoni E, Rosenblum M. Leveraging prognostic baseline variables to gain precision in randomized trials. Stat Med. 2015;34:2602–15. doi:10.1002/sim.6507

11. Ganju J, Zhou K. The benefit of stratification in clinical trials revisited. Stat Med. 2011;30:2881–9. doi:10.1002/sim.4351

12. Ganju J, Mehrotra DV. Stratified experiments reexamined with emphasis on multicenter trials. Controlled Clin Trials. 2003;24:167–81. doi:10.1016/S0197-2456(02)00305-7

13. DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clin Trials. 1986;7:177–88. doi:10.1016/0197-2456(86)90046-2

14. Emerson JD, Hoaglin DC, Mosteller F. Simple robust procedures for combining risk differences in sets of 2 × 2 tables. Stat Med. 1996;15:1465–88. doi:10.1002/sim.4780151402

15. Vigneri S, Termini R, Leandro G, Badalamenti S, Pantalena M, Savarino V, et al. A comparison of five maintenance therapies for reflux esophagitis. N Engl J Med. 1995;333:1106–10. doi:10.1056/NEJM199510263331703

16. Berger VW, Stefanescu C, Zhou YY. The analysis of stratified 2 × 2 contingency tables. Biom J. 2006;48:992–1007. doi:10.1002/bimj.200610277

17. Blum AL. Principles for selection and exclusion. In: Tygstrup N, Lachin JM, Juhl E, editors. The randomized clinical trial and therapeutic decisions. New York: Marcel Dekker; 1982. p. 43–58.

18. Robins J, Sued M, Lei-Gomez Q, Rotnitzky A. Comment: performance of double-robust estimators when “inverse probability” weights are highly variable. Stat Sci. 2007;22:544–59. doi:10.1214/07-STS227D

19. Dieters MJ, White TL, Littell RC, Hodge GR. Application of approximate variances of variance components and their ratios in genetic tests. Theor Appl Genet. 1995;91:15–24. doi:10.1007/BF00220853

Appendix A

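This appendix gives the derivation of eq. (8). The notation and assumptions are the same as in Sections 2 and 3: $\hat p_{Ti} = a_i/m_{Ti}$ and $\hat p_{Ci} = b_i/m_{Ci}$, where $a_i$ and $b_i$ are the numbers of responders in the treatment and control groups of stratum $i$, and under pre-stratification $m_{Ci} = (k_C/k_T)\,m_{Ti}$. The variance of $\hat\delta_i$ conditional on $n_{Ti} = m_{Ti}$ is given by: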

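$$
\begin{aligned}
\operatorname{Var}(\hat\delta_i \mid n_{Ti}=m_{Ti})
&= \operatorname{Var}(\hat p_{Ti} - \hat p_{Ci} \mid n_{Ti}=m_{Ti}) \\
&= \frac{1}{m_{Ti}^2}\operatorname{Var}(a_i \mid n_{Ti}=m_{Ti}) + \frac{1}{m_{Ci}^2}\operatorname{Var}(b_i \mid n_{Ti}=m_{Ti}) \\
&= \frac{1}{k_C m_{Ti}}\left\{k_C\,p_{Ti}(1-p_{Ti}) + k_T\,p_{Ci}(1-p_{Ci})\right\}.
\end{aligned} \tag{15}
$$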

The expected value of δˆi conditional on nTi=mTi is given by:

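$$
\begin{aligned}
E(\hat\delta_i \mid n_{Ti}=m_{Ti})
&= E(\hat p_{Ti} - \hat p_{Ci} \mid n_{Ti}=m_{Ti}) \\
&= \frac{1}{m_{Ti}}E(a_i \mid n_{Ti}=m_{Ti}) - \frac{1}{m_{Ci}}E(b_i \mid n_{Ti}=m_{Ti}) = \delta_i.
\end{aligned} \tag{16}
$$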

Under pre-stratification, $n_{Ci} = (k_C/k_T)\,n_{Ti}$, so the harmonic-mean weight $w_i = n_{Ti}n_{Ci}/(n_{Ti}+n_{Ci})$ in eq. (3) reduces to

$$
w_i = \frac{k_C}{k_T+k_C}\,n_{Ti}.
$$

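Then, by the law of total variance (conditioning on the stratum sizes), the variance of $\hat\delta = \sum_i w_i\hat\delta_i$ can be expanded as follows:

$$
\begin{aligned}
\operatorname{Var}_{\mathrm{pre}}(\hat\delta)
&= \operatorname{Var}\Big(\frac{k_C}{k_T+k_C}\sum_i n_{Ti}\hat\delta_i\Big) \\
&= E\Big\{\Big(\frac{k_C}{k_T+k_C}\Big)^2\sum_i m_{Ti}^2\operatorname{Var}(\hat\delta_i \mid n_{Ti}=m_{Ti})\Big\}
 + \operatorname{Var}\Big\{\frac{k_C}{k_T+k_C}\sum_i m_{Ti}\,E(\hat\delta_i \mid n_{Ti}=m_{Ti})\Big\}.
\end{aligned} \tag{17}
$$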

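By substituting eqs (15) and (16) into eq. (17), $\operatorname{Var}_{\mathrm{pre}}(\hat\delta)$ can be expanded as follows:

$$
\begin{aligned}
\operatorname{Var}_{\mathrm{pre}}(\hat\delta)
&= \Big(\frac{k_C}{k_T+k_C}\Big)^2\frac{1}{k_C}\sum_i\left\{k_C\,p_{Ti}(1-p_{Ti}) + k_T\,p_{Ci}(1-p_{Ci})\right\}E(m_{Ti})
 + \Big(\frac{k_C}{k_T+k_C}\Big)^2\operatorname{Var}\Big(\sum_i\delta_i m_{Ti}\Big) \\
&= \Big(\frac{k_C}{k_T+k_C}\Big)^2\frac{1}{k_C}\sum_i\left\{k_C\,p_{Ti}(1-p_{Ti}) + k_T\,p_{Ci}(1-p_{Ci})\right\}E(m_{Ti}) \\
&\quad + \Big(\frac{k_C}{k_T+k_C}\Big)^2\Big\{\sum_i\delta_i^2\operatorname{Var}(m_{Ti}) + 2\sum_{i<j}\delta_i\delta_j\operatorname{Cov}(m_{Ti}, m_{Tj})\Big\}.
\end{aligned}
$$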

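From the assumption that the numbers of patients follow a multinomial distribution, $E(m_{Ti}) = N_T\pi_i$, $\operatorname{Var}(m_{Ti}) = N_T\pi_i(1-\pi_i)$, and $\operatorname{Cov}(m_{Ti}, m_{Tj}) = -N_T\pi_i\pi_j$ for $i \neq j$. Then, $\operatorname{Var}_{\mathrm{pre}}(\hat\delta)$ is given by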

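$$
\operatorname{Var}_{\mathrm{pre}}(\hat\delta) = N_T\Big(\frac{k_C}{k_T+k_C}\Big)^2\Big[\frac{1}{k_C}\sum_i\left\{k_C\,p_{Ti}(1-p_{Ti}) + k_T\,p_{Ci}(1-p_{Ci})\right\}\pi_i + \sum_i\delta_i^2\pi_i(1-\pi_i) - 2\sum_{i<j}\delta_i\delta_j\pi_i\pi_j\Big]. \tag{18}
$$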

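Since $\sum_i \pi_i = 1$, and writing $\delta = \sum_i \pi_i\delta_i$, the second term of eq. (18) can be rewritten as

$$
\sum_i\delta_i^2\pi_i(1-\pi_i) - 2\sum_{i<j}\delta_i\delta_j\pi_i\pi_j
= \sum_i\delta_i^2\pi_i - \sum_i\sum_j\delta_i\delta_j\pi_i\pi_j
= \sum_i\delta_i^2\pi_i - \Big(\sum_i\delta_i\pi_i\Big)^2
= \sum_i\pi_i(\delta_i - \delta)^2. \tag{19}
$$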
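Identity (19) is easy to spot-check numerically. The sketch below uses NumPy with arbitrary random $\pi_i$ and $\delta_i$ (not data from the paper). It compares the two sides of the identity and also the quadratic form $d^\top\{\operatorname{diag}(\pi) - \pi\pi^\top\}d$ with $d = (\delta_1, \ldots, \delta_K)$, which equals $\operatorname{Var}(\sum_i \delta_i m_{Ti})/N_T$ under the multinomial covariance used in eq. (18).

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5
pi = rng.dirichlet(np.ones(K))     # arbitrary stratum probabilities summing to 1
delta_i = rng.normal(size=K)       # arbitrary stratum effects

# left-hand side of eq. (19)
lhs = np.sum(delta_i**2 * pi * (1 - pi)) - 2 * sum(
    delta_i[i] * delta_i[j] * pi[i] * pi[j] for i in range(K) for j in range(i + 1, K))
# right-hand side, with delta = sum_i pi_i delta_i
delta = np.sum(pi * delta_i)
rhs = np.sum(pi * (delta_i - delta) ** 2)
# multinomial covariance of (m_T1, ..., m_TK) divided by N_T is diag(pi) - pi pi^T
quad = delta_i @ (np.diag(pi) - np.outer(pi, pi)) @ delta_i
print(lhs, rhs, quad)
```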

Substituting eq. (19) into eq. (18) gives the variance of $\hat\delta$ in the form of eq. (8).
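Because eq. (17) is an exact law-of-total-variance decomposition, eq. (8) can be checked by simulation. Below is a minimal NumPy sketch with hypothetical parameter values and a 1:2 allocation ($k_T = 1$, $k_C = 2$), chosen so that $m_{Ci} = (k_C/k_T)m_{Ti}$ is always an integer. The names `var_pre` and `simulate_pre` are ours, not the paper's. The simulation uses the fact that $w_i\hat\delta_i = \frac{k_C}{k_T+k_C}\big(a_i - \frac{k_T}{k_C}b_i\big)$, which remains defined when a stratum is empty.

```python
import numpy as np


def var_pre(N_T, k_T, k_C, pi, p_T, p_C):
    """Eq. (18) with eq. (19) substituted: Var of delta_hat = sum_i w_i delta_hat_i."""
    pi, p_T, p_C = (np.asarray(x, dtype=float) for x in (pi, p_T, p_C))
    c = k_C / (k_T + k_C)
    within = np.sum((k_C * p_T * (1 - p_T) + k_T * p_C * (1 - p_C)) * pi) / k_C
    delta_i = p_T - p_C
    between = np.sum(pi * (delta_i - np.sum(pi * delta_i)) ** 2)
    return N_T * c**2 * (within + between)


def simulate_pre(N_T, k_T, k_C, pi, p_T, p_C, reps=20_000, seed=2):
    """Empirical variance of delta_hat (assumes k_T divides k_C * m_Ti)."""
    rng = np.random.default_rng(seed)
    c = k_C / (k_T + k_C)
    mT = rng.multinomial(N_T, pi, size=reps)   # random treatment-arm stratum sizes
    mC = k_C * mT // k_T                       # allocation ratio k_T : k_C in each stratum
    a = rng.binomial(mT, p_T)
    b = rng.binomial(mC, p_C)
    # w_i * delta_hat_i = c * m_Ti * (a_i/m_Ti - b_i/m_Ci) = c * (a_i - (k_T/k_C) * b_i)
    est = c * (a - (k_T / k_C) * b).sum(axis=1)
    return est.var()


args = (60, 1, 2, [0.2, 0.3, 0.5], [0.3, 0.5, 0.6], [0.2, 0.5, 0.3])  # hypothetical
print(var_pre(*args), simulate_pre(*args))
```

With 20,000 replicates, the two numbers should agree to within Monte Carlo error of a few percent.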

Appendix B

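This appendix gives the derivation of eq. (12). The notation and assumptions are the same as those described in Sections 2 and 4. In the post-stratification case, the conditional variance and expected value of $\hat\delta_i = \hat p_{Ti} - \hat p_{Ci}$ given $n_{Ti} = m_{Ti}$ and $n_{Ci} = m_{Ci}$ are

$$
\operatorname{Var}(\hat\delta_i \mid n_{Ti}=m_{Ti}, n_{Ci}=m_{Ci}) = \frac{1}{m_{Ti}}\,p_{Ti}(1-p_{Ti}) + \frac{1}{m_{Ci}}\,p_{Ci}(1-p_{Ci}) \tag{20}
$$

and

$$
E(\hat\delta_i \mid n_{Ti}=m_{Ti}, n_{Ci}=m_{Ci}) = \delta_i, \tag{21}
$$

respectively.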

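By using eq. (20), the variance of $\hat\delta = \sum_i w_i\hat\delta_i$, where $w_i$ is defined by eq. (3), conditional on $n_{Ti}$ and $n_{Ci}$ is given by

$$
\operatorname{Var}(\hat\delta \mid n_{Ti}=m_{Ti}, n_{Ci}=m_{Ci}, \forall i) = \sum_i \frac{m_{Ti}m_{Ci}}{(m_{Ti}+m_{Ci})^2}\left\{m_{Ci}\,p_{Ti}(1-p_{Ti}) + m_{Ti}\,p_{Ci}(1-p_{Ci})\right\}. \tag{22}
$$

Similarly, by using eq. (21), the conditional expectation of $\hat\delta$ is given by

$$
E(\hat\delta \mid n_{Ti}=m_{Ti}, n_{Ci}=m_{Ci}, \forall i) = \sum_i \frac{m_{Ti}m_{Ci}}{m_{Ti}+m_{Ci}}\,\delta_i. \tag{23}
$$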

Since these formulas are complex functions of $m_{T1}, m_{T2}, \ldots, m_{TK}$ and $m_{C1}, m_{C2}, \ldots, m_{CK}$, exact expressions for the expectation of eq. (22) and the variance of eq. (23) are difficult to derive. Therefore, we approximate them by using the Taylor expansion; the same approximation for the variance of a multivariate function was reported by Dieters et al. [19]. For any function $f(x_1, x_2, \ldots, x_n)$, the multivariate first-order Taylor expansion about $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$ is

$$
f(X_1, X_2, \ldots, X_n) = f(\theta) + \sum_i \frac{\partial f(\theta)}{\partial x_i}(X_i - \theta_i) + O(n^{-r}),
$$

where $O(n^{-r})$ is a remainder term. By setting $\theta = (E(X_1), E(X_2), \ldots, E(X_n))$, the expectation of $f(X_1, X_2, \ldots, X_n)$ is given by

$$
E\{f(X_1, X_2, \ldots, X_n)\} = f(\theta) + E\{O(n^{-r})\}. \tag{24}
$$

If we assume that the expectations of all second-order and higher-order terms in $O(n^{-r})$ are negligible, then the expectation of $f(X_1, X_2, \ldots, X_n)$ is approximately $f(E(X_1), E(X_2), \ldots, E(X_n))$.

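Similarly, writing $X = (X_1, X_2, \ldots, X_n)$, the variance of $f(X)$ is derived as follows:

$$
\begin{aligned}
\operatorname{Var}\{f(X)\} &= E\big[f(X) - E\{f(X)\}\big]^2 \\
&= E\Big\{\sum_i \frac{\partial f(\theta)}{\partial x_i}(X_i-\theta_i)\Big\}^2
 + 2E\Big[\sum_i \frac{\partial f(\theta)}{\partial x_i}(X_i-\theta_i)\big\{O(n^{-r}) - E\{O(n^{-r})\}\big\}\Big]
 + \operatorname{Var}\{O(n^{-r})\}
\end{aligned} \tag{25}
$$

$$
= \sum_i \Big(\frac{\partial f(\theta)}{\partial x_i}\Big)^2 \operatorname{Var}(X_i)
 + 2\sum_{i<j} \frac{\partial f(\theta)}{\partial x_i}\frac{\partial f(\theta)}{\partial x_j}\operatorname{Cov}(X_i, X_j)
 + 2\sum_i \frac{\partial f(\theta)}{\partial x_i}\operatorname{Cov}\{X_i, O(n^{-r})\}
 + \operatorname{Var}\{O(n^{-r})\}. \tag{26}
$$

Therefore, if we assume that the variance of $O(n^{-r})$ and the covariances between the $X_i$'s and $O(n^{-r})$ are negligible, then the variance of $f(X)$ is expressed as a linear combination of the $\operatorname{Var}(X_i)$'s and $\operatorname{Cov}(X_i, X_j)$'s. By eq. (25), the assumption of negligible covariances between the $X_i$'s and $O(n^{-r})$ amounts to assuming that the difference between $O(n^{-r})$ and its expectation is negligible.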

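By using eq. (24) and the multinomial distribution assumption, $E(m_{Ti}) = N_T\pi_i$ and $E(m_{Ci}) = N_C\pi_i$, the approximate expectation of eq. (22) is given by

$$
E\{\operatorname{Var}(\hat\delta \mid n_{Ti}=m_{Ti}, n_{Ci}=m_{Ci}, \forall i)\} \approx \sum_i \frac{N_T N_C}{N^2}\left\{N_C\pi_i\,p_{Ti}(1-p_{Ti}) + N_T\pi_i\,p_{Ci}(1-p_{Ci})\right\}, \tag{27}
$$

where $N = N_T + N_C$.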

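By taking the first partial derivatives of eq. (23) with respect to $m_{Ti}$ and $m_{Ci}$ and evaluating them at $E(m_{Ti})$ and $E(m_{Ci})$, we obtain

$$
\left.\frac{\partial E(\hat\delta \mid n_{Ti}=m_{Ti}, n_{Ci}=m_{Ci}, \forall i)}{\partial m_{Ti}}\right|_{E(m_{Ti}),\,E(m_{Ci})} = \frac{N_C^2}{N^2}\,\delta_i \tag{28}
$$

and

$$
\left.\frac{\partial E(\hat\delta \mid n_{Ti}=m_{Ti}, n_{Ci}=m_{Ci}, \forall i)}{\partial m_{Ci}}\right|_{E(m_{Ti}),\,E(m_{Ci})} = \frac{N_T^2}{N^2}\,\delta_i. \tag{29}
$$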
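The derivatives in eqs (28) and (29) can be confirmed with central finite differences on a single stratum's term of eq. (23). The values of $N_T$, $N_C$, $\pi_i$, and $\delta_i$ below are hypothetical, and `cond_mean_term` is our helper name.

```python
def cond_mean_term(m_T, m_C, delta):
    """One stratum's term in eq. (23): m_T * m_C / (m_T + m_C) * delta."""
    return m_T * m_C / (m_T + m_C) * delta


N_T, N_C, pi_i, delta_i = 120.0, 80.0, 0.3, 0.25   # hypothetical values
N = N_T + N_C
m_T0, m_C0 = N_T * pi_i, N_C * pi_i                 # E(m_Ti) and E(m_Ci)
h = 1e-5                                            # central-difference step

d_T = (cond_mean_term(m_T0 + h, m_C0, delta_i) - cond_mean_term(m_T0 - h, m_C0, delta_i)) / (2 * h)
d_C = (cond_mean_term(m_T0, m_C0 + h, delta_i) - cond_mean_term(m_T0, m_C0 - h, delta_i)) / (2 * h)
print(d_T, N_C**2 / N**2 * delta_i)   # eq. (28)
print(d_C, N_T**2 / N**2 * delta_i)   # eq. (29)
```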

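From eqs (26), (28), and (29), the multinomial distribution assumption, $\operatorname{Var}(m_{Ti}) = N_T\pi_i(1-\pi_i)$, $\operatorname{Var}(m_{Ci}) = N_C\pi_i(1-\pi_i)$, $\operatorname{Cov}(m_{Ti}, m_{Tj}) = -N_T\pi_i\pi_j$ and $\operatorname{Cov}(m_{Ci}, m_{Cj}) = -N_C\pi_i\pi_j$ for $i \neq j$, and the independence of the two groups, $\operatorname{Cov}(m_{Ti}, m_{Cj}) = 0$, the approximate variance of eq. (23) is given by

$$
\begin{aligned}
\operatorname{Var}\{E(\hat\delta \mid n_{Ti}=m_{Ti}, n_{Ci}=m_{Ci}, \forall i)\} \approx{}& \sum_i \Big(\frac{N_C^2}{N^2}\delta_i\Big)^2 N_T\pi_i(1-\pi_i) + \sum_i \Big(\frac{N_T^2}{N^2}\delta_i\Big)^2 N_C\pi_i(1-\pi_i) \\
& - 2\sum_{i<j} \frac{N_C^2}{N^2}\delta_i\,\frac{N_C^2}{N^2}\delta_j\,N_T\pi_i\pi_j - 2\sum_{i<j} \frac{N_T^2}{N^2}\delta_i\,\frac{N_T^2}{N^2}\delta_j\,N_C\pi_i\pi_j.
\end{aligned} \tag{30}
$$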

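Moreover, by the same expansion used in eq. (19), eq. (30) can be expressed as follows:

$$
\operatorname{Var}\{E(\hat\delta \mid n_{Ti}=m_{Ti}, n_{Ci}=m_{Ci}, \forall i)\} \approx \frac{N_T N_C^4}{N^4}\sum_i \pi_i\Big(\delta_i - \sum_j \pi_j\delta_j\Big)^2 + \frac{N_T^4 N_C}{N^4}\sum_i \pi_i\Big(\delta_i - \sum_j \pi_j\delta_j\Big)^2. \tag{31}
$$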

Therefore, we obtain the approximation formula (12) for the variance of $\hat\delta$ under post-stratification by taking the sum of eqs (27) and (31).
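Unlike eq. (8), eq. (12) rests on the first-order Taylor approximations (24)–(26), so a simulation check shows how close it is for moderate sample sizes. Below is a NumPy sketch with hypothetical settings ($N_T = 150$, $N_C = 100$, three strata). The names `var_post` and `simulate_post` are ours. Both arms' stratum sizes are drawn from independent multinomials, and $w_i\hat\delta_i$ is computed as $(m_{Ci}a_i - m_{Ti}b_i)/(m_{Ti}+m_{Ci})$, which is zero for an empty stratum.

```python
import numpy as np


def var_post(N_T, N_C, pi, p_T, p_C):
    """Approximation (12) = eq. (27) + eq. (31) for Var of delta_hat = sum_i w_i delta_hat_i."""
    pi, p_T, p_C = (np.asarray(x, dtype=float) for x in (pi, p_T, p_C))
    N = N_T + N_C
    within = np.sum(N_T * N_C / N**2
                    * (N_C * pi * p_T * (1 - p_T) + N_T * pi * p_C * (1 - p_C)))  # eq. (27)
    delta_i = p_T - p_C
    spread = np.sum(pi * (delta_i - np.sum(pi * delta_i)) ** 2)
    between = (N_T * N_C**4 + N_T**4 * N_C) / N**4 * spread                      # eq. (31)
    return within + between


def simulate_post(N_T, N_C, pi, p_T, p_C, reps=20_000, seed=3):
    """Empirical variance of delta_hat when both arms' stratum sizes are random."""
    rng = np.random.default_rng(seed)
    mT = rng.multinomial(N_T, pi, size=reps)
    mC = rng.multinomial(N_C, pi, size=reps)
    a = rng.binomial(mT, p_T)
    b = rng.binomial(mC, p_C)
    tot = mT + mC
    # w_i * delta_hat_i = (m_Ci a_i - m_Ti b_i) / (m_Ti + m_Ci); 0 for an empty stratum
    contrib = np.divide(mC * a - mT * b, tot, out=np.zeros(tot.shape), where=tot > 0)
    return contrib.sum(axis=1).var()


args = (150, 100, [0.2, 0.3, 0.5], [0.3, 0.5, 0.6], [0.2, 0.5, 0.3])  # hypothetical
print(var_post(*args), simulate_post(*args))
```

For these sizes, we would expect agreement within a few percent. The gap should grow as expected stratum sizes shrink, since the neglected higher-order terms then matter more.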

Published Online: 2017-2-14

© 2017 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 31.12.2025 from https://www.degruyterbrill.com/document/doi/10.1515/ijb-2016-0047/html