Testing Equality of Treatments under an Incomplete Block Crossover Design with Ordinal Responses

Kung-Jong Lui

doi:10.1515/ijb-2016-0069

Published/Copyright: February 3, 2017

From the journal The International Journal of Biostatistics Volume 13 Issue 1

Abstract

The generalized odds ratio (GOR) for paired sample is considered to measure the relative treatment effect on patient responses in ordinal data. Under a three-treatment two-period incomplete block crossover design, both asymptotic and exact procedures are developed for testing equality between treatments with ordinal responses. Monte Carlo simulation is employed to evaluate and compare the finite-sample performance of these test procedures. A discussion on advantages and disadvantages of the proposed test procedures based on the GOR versus those based on Wald’s tests under the normal random effects proportional odds model is provided. The data taken as a part of a crossover trial studying the effects of low and high doses of an analgesic versus a placebo for the relief of pain in primary dysmenorrhea over the first two periods are applied to illustrate the use of these test procedures.

Keywords: generalized odds ratio; incomplete block; crossover trial; ordinal data; Mantel-Haenszel test

1 Introduction

When studying non-curable chronic diseases, such as angina pectoris, epilepsy, hypertension and asthma, we may often consider using a crossover design to reduce the number of patients needed relative to a parallel group design [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]. When more than two treatments are under comparison, the trial duration for a crossover design can be much longer than that for a parallel group design if each patient is to receive every treatment. The longer the duration of a trial, the higher the risk of patients being lost to follow-up. Furthermore, a lengthy trial duration can make it difficult to recruit patients into a trial and to ensure that patients closely follow the study protocol. To alleviate these concerns, we may consider using an incomplete block crossover design, in which each patient is to receive only a subset of treatments [1]. For example, consider the double-blind placebo controlled crossover trial comparing 12 μg and 24 μg of formoterol solution aerosol with a placebo [1]. For practical reasons, it was decided that each patient could receive only two of the three treatments: the placebo, 12 μg and 24 μg of formoterol solution. Although there are some publications [1, 12, 15, 16, 17, 18] on the incomplete block crossover design, all of these focus on either continuous or binary data. The discussion on testing equality of treatments with ordinal responses under an incomplete block crossover design is limited [1, 7, 8].

Because ordinal responses are not on an interval scale, it is generally not appropriate to apply arithmetic operation to ordinal data [19]. In practice, we may commonly assign arbitrary scores to ordinal responses and do hypothesis testing with use of the t-test. Since the relative distances between consecutive categories in ordinal data are not truly comparable, converting ordinal responses into a universally agreeable score scale is difficult. Also, how to provide a meaningful and easily-understood summary measure based on these arbitrarily assigned scores to quantify the relative treatment effect can be challenging. On the other hand, if we dichotomize the ordinal responses into binary outcomes, the test procedures for binary data will probably lose efficiency.

In this paper, we propose use of the generalized odds ratio (GOR) for paired samples [20, 21, 22] to measure the relative treatment effect on patient responses in ordinal data. We focus our discussion on an incomplete block two-period crossover trial comparing three treatments with ordinal responses. We derive asymptotic test procedures based on the weighted-least-squares (WLS) and Mantel-Haenszel (MH) estimators [23] for testing equality between treatments. We further derive the exact test procedures for testing equality of treatments for small-sample cases. We employ Monte Carlo simulation to evaluate the finite-sample performance of these test procedures in a variety of situations. We use the data taken as a part of the crossover trial [24] comparing the low and high doses of an analgesic with a placebo for the relief of pain in primary dysmenorrhea over the first two periods to illustrate the use of these test procedures.

2 Notation, assumption and methods

Consider comparing two experimental treatments A and B with a placebo P under an incomplete block crossover design with two periods. We let X-Y denote the treatment-receipt sequence of receiving treatment X at period 1 and then crossing over to receive treatment Y at period 2. Suppose that we randomly assign $n_g$ patients to group g (= 1, 2, 3, 4, 5, 6), where g = 1 denotes the group with the P-A treatment-receipt sequence; g = 2 denotes the group with the A-P sequence; g = 3 denotes the group with the P-B sequence; g = 4 denotes the group with the B-P sequence; g = 5 denotes the group with the A-B sequence; and g = 6 denotes the group with the B-A sequence. For patient i (= 1, 2, …, $n_g$) assigned to group g, we let $Y_{iz}^{(g)}$ denote the ordinal outcome of the patient at period z (= 1, 2), which takes one of the possible ordinal values $C_j$, where $C_1<C_2<C_3<\cdots<C_L$. We let $X_{iz1}^{(g)}$ denote the indicator function of treatment-receipt for treatment A, with $X_{iz1}^{(g)}=1$ if patient i assigned to group g receives treatment A at period z, and = 0 otherwise. Similarly, we let $X_{iz2}^{(g)}$ denote the indicator function of treatment-receipt for treatment B, with $X_{iz2}^{(g)}=1$ if the corresponding patient receives treatment B at period z, and = 0 otherwise. We let $1_{iz}^{(g)}$ represent the indicator function of period, setting $1_{iz}^{(g)}=1$ for period z = 2, and = 0 otherwise. We further let $\mu_i^{(g)}$ denote the random effect due to the ith subject in group g, and assume the $\mu_i^{(g)}$ to independently follow an unspecified probability density $f_g(\mu)$. We assume that one can apply an adequate wash-out period on the basis of our subjective knowledge to nullify the carry-over effect. As noted elsewhere [1, 4, 11, 12, 25, 26, 27], if we cannot ensure that this assumption holds, we may not wish to employ the crossover design.
For patient i (= 1, 2, …, $n_g$) in group g (= 1, 2, 3, 4, 5, 6), we assume that the joint conditional probability of $(Y_{i1}^{(g)},Y_{i2}^{(g)})$ between periods 1 and 2, given the random effect $\mu_i^{(g)}$ fixed, satisfies

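$$
\begin{aligned}
P(Y_{i1}^{(g)}<Y_{i2}^{(g)}\mid\mu_i^{(g)}) &= \frac{1}{1+\exp(\mu_i^{(g)}+\eta_{AP}X_{i11}^{(g)}+\eta_{BP}X_{i12}^{(g)}+\gamma 1_{i1}^{(g)})}\times\frac{\exp(\mu_i^{(g)}+\eta_{AP}X_{i21}^{(g)}+\eta_{BP}X_{i22}^{(g)}+\gamma 1_{i2}^{(g)})}{1+\exp(\mu_i^{(g)}+\eta_{AP}X_{i21}^{(g)}+\eta_{BP}X_{i22}^{(g)}+\gamma 1_{i2}^{(g)})},\\
P(Y_{i1}^{(g)}>Y_{i2}^{(g)}\mid\mu_i^{(g)}) &= \frac{1}{1+\exp(\mu_i^{(g)}+\eta_{AP}X_{i21}^{(g)}+\eta_{BP}X_{i22}^{(g)}+\gamma 1_{i2}^{(g)})}\times\frac{\exp(\mu_i^{(g)}+\eta_{AP}X_{i11}^{(g)}+\eta_{BP}X_{i12}^{(g)}+\gamma 1_{i1}^{(g)})}{1+\exp(\mu_i^{(g)}+\eta_{AP}X_{i11}^{(g)}+\eta_{BP}X_{i12}^{(g)}+\gamma 1_{i1}^{(g)})},\ \text{and}\\
P(Y_{i1}^{(g)}=Y_{i2}^{(g)}\mid\mu_i^{(g)}) &= 1-P(Y_{i1}^{(g)}>Y_{i2}^{(g)}\mid\mu_i^{(g)})-P(Y_{i1}^{(g)}<Y_{i2}^{(g)}\mid\mu_i^{(g)}),
\end{aligned}
\tag{1}
$$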

where $\eta_{AP}$ and $\eta_{BP}$ denote the respective effects of treatments A and B relative to placebo P, and $\gamma$ denotes the effect of period 2 versus period 1. Based on model (1), the GOR of responses [20, 21, 22] for a given patient i in group g when he/she has covariates $(X_{i21}^{(g)},X_{i22}^{(g)},1_{i2}^{(g)})$ at period 2 versus when he/she has covariates $(X_{i11}^{(g)},X_{i12}^{(g)},1_{i1}^{(g)})$ at period 1 is, by definition, equal to

$$
\frac{P(Y_{i1}^{(g)}<Y_{i2}^{(g)}\mid\mu_i^{(g)})}{P(Y_{i1}^{(g)}>Y_{i2}^{(g)}\mid\mu_i^{(g)})}=\exp\left(\eta_{AP}(X_{i21}^{(g)}-X_{i11}^{(g)})+\eta_{BP}(X_{i22}^{(g)}-X_{i12}^{(g)})+\gamma(1_{i2}^{(g)}-1_{i1}^{(g)})\right).
\tag{2}
$$

When ηAP = 0, we can see from eq. (2) that the GOR of responses remains unchanged regardless of whether the patient receives treatment A or placebo P. When ηAP > 0, taking treatment A tends to increase the patient response as compared with taking placebo P, given all the other covariates fixed. When ηAP < 0, taking treatment A tends to decrease the patient response as compared with taking placebo P. Similar interpretations apply to the parameters ηBP and γ. We define the GOR of responses for treatment A versus placebo P and that for treatment B versus placebo P as GORAP = exp(ηAP) and GORBP = exp(ηBP), respectively. Also, we define the GOR of responses for treatment B versus treatment A as GORBA = exp(ηBP − ηAP).

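On the basis of model (1), for a randomly selected patient i from group g the probability that the patient response $Y_{i1}^{(g)}$ at period 1 is less than his/her response $Y_{i2}^{(g)}$ at period 2 is

$$
P(Y_{i1}^{(g)}<Y_{i2}^{(g)})=\int\frac{1}{1+\exp(\mu+\eta_{AP}X_{i11}^{(g)}+\eta_{BP}X_{i12}^{(g)}+\gamma 1_{i1}^{(g)})}\times\frac{\exp(\mu+\eta_{AP}X_{i21}^{(g)}+\eta_{BP}X_{i22}^{(g)}+\gamma 1_{i2}^{(g)})}{1+\exp(\mu+\eta_{AP}X_{i21}^{(g)}+\eta_{BP}X_{i22}^{(g)}+\gamma 1_{i2}^{(g)})}\,f_g(\mu)\,d\mu.
\tag{3}
$$

Similarly, for a randomly selected patient i from group g the probability that the patient response $Y_{i1}^{(g)}$ at period 1 is larger than his/her response $Y_{i2}^{(g)}$ at period 2 is

$$
P(Y_{i1}^{(g)}>Y_{i2}^{(g)})=\int\frac{1}{1+\exp(\mu+\eta_{AP}X_{i21}^{(g)}+\eta_{BP}X_{i22}^{(g)}+\gamma 1_{i2}^{(g)})}\times\frac{\exp(\mu+\eta_{AP}X_{i11}^{(g)}+\eta_{BP}X_{i12}^{(g)}+\gamma 1_{i1}^{(g)})}{1+\exp(\mu+\eta_{AP}X_{i11}^{(g)}+\eta_{BP}X_{i12}^{(g)}+\gamma 1_{i1}^{(g)})}\,f_g(\mu)\,d\mu.
\tag{4}
$$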

Note that because we do not assume any parametric p.d.f. for fg(μ) in the following discussion, our approach is semi-parametric.

For simplicity in notation, we define $\Pi_C^{(g)}=P(Y_{i1}^{(g)}<Y_{i2}^{(g)})$ and $\Pi_D^{(g)}=P(Y_{i1}^{(g)}>Y_{i2}^{(g)})$. From eqs (3) and (4), we can see that for a randomly selected patient i from group g the GOR of patient responses between periods 2 and 1 is

$$
GOR^{(g)}=\Pi_C^{(g)}/\Pi_D^{(g)}=\exp\left(\eta_{AP}(X_{i21}^{(g)}-X_{i11}^{(g)})+\eta_{BP}(X_{i22}^{(g)}-X_{i12}^{(g)})+\gamma(1_{i2}^{(g)}-1_{i1}^{(g)})\right).
\tag{5}
$$

For a randomly selected patient i from group g, we denote the probability $P(Y_{i1}^{(g)}=C_r,Y_{i2}^{(g)}=C_s)$ by $\pi_{rs}^{(g)}$, where r = 1, 2, …, L and s = 1, 2, …, L. Thus, we have

$$
\Pi_C^{(g)}=P(Y_{i1}^{(g)}<Y_{i2}^{(g)})=\sum_{r=1}^{L-1}\sum_{s=r+1}^{L}\pi_{rs}^{(g)},\quad\text{and}\quad\Pi_D^{(g)}=P(Y_{i1}^{(g)}>Y_{i2}^{(g)})=\sum_{r=2}^{L}\sum_{s=1}^{r-1}\pi_{rs}^{(g)}.
\tag{6}
$$

These represent the probability that a randomly selected patient i from group g has a response at period 2 higher than his/her response at period 1, and the probability that a randomly selected patient has a response at period 1 higher than his/her response at period 2, respectively. When L = 2, $GOR^{(g)}$ reduces to $\pi_{12}^{(g)}/\pi_{21}^{(g)}$, the OR of responses between periods 2 and 1 in binary data with matched pairs. On the basis of model (5), we can express the GOR of responses for treatment A versus placebo as (Appendix I)

$$
GOR_{AP}=\exp(\eta_{AP})=\left(GOR^{(1)}/GOR^{(2)}\right)^{1/2}=GOR^{(3)}/GOR^{(5)}=GOR^{(6)}/GOR^{(4)}.
\tag{7}
$$

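Let $n_{rs}^{(g)}$ denote the number of patients in group g (= 1, 2, 3, 4, 5, 6) with the vector of patient responses $(Y_{i1}^{(g)}=C_r,Y_{i2}^{(g)}=C_s)$ among $n_g$ patients. The random cell frequencies $\{n_{rs}^{(g)}\mid r=1,2,\ldots,L,\ s=1,2,\ldots,L\}$ then follow the multinomial distribution with parameters $n_g$ and $\{\pi_{rs}^{(g)}\mid r=1,2,\ldots,L,\ s=1,2,\ldots,L\}$. Note that we can estimate $\pi_{rs}^{(g)}$ by the unbiased consistent estimator $\hat{\pi}_{rs}^{(g)}=n_{rs}^{(g)}/n_g$. We define $n_C^{(g)}=\sum_{r=1}^{L-1}\sum_{s=r+1}^{L}n_{rs}^{(g)}$ and $n_D^{(g)}=\sum_{r=2}^{L}\sum_{s=1}^{r-1}n_{rs}^{(g)}$. When substituting $\hat{\pi}_{rs}^{(g)}$ for $\pi_{rs}^{(g)}$ in $\Pi_C^{(g)}$ and $\Pi_D^{(g)}$, we obtain $\hat{\Pi}_C^{(g)}=n_C^{(g)}/n_g$ and $\hat{\Pi}_D^{(g)}=n_D^{(g)}/n_g$. These lead us to the estimator $\widehat{GOR}^{(g)}=\hat{\Pi}_C^{(g)}/\hat{\Pi}_D^{(g)}$. Using the delta method [22, 28], we obtain the estimated asymptotic variance $\widehat{Var}(\log(\widehat{GOR}^{(g)}))=(\hat{\Pi}_C^{(g)}+\hat{\Pi}_D^{(g)})/(n_g\hat{\Pi}_C^{(g)}\hat{\Pi}_D^{(g)})$. When substituting $\widehat{GOR}^{(g)}$ for $GOR^{(g)}$ in eq. (7), we obtain the following three consistent estimators for $GOR_{AP}$ (= $\exp(\eta_{AP})$):

$$
\begin{aligned}
\widehat{GOR}_{AP}&=\left(\widehat{GOR}^{(1)}/\widehat{GOR}^{(2)}\right)^{1/2}=\left[(n_C^{(1)}n_D^{(2)})/(n_D^{(1)}n_C^{(2)})\right]^{1/2},\\
\widehat{GOR}_{AP}&=\widehat{GOR}^{(3)}/\widehat{GOR}^{(5)}=(n_C^{(3)}n_D^{(5)})/(n_D^{(3)}n_C^{(5)}),\\
\widehat{GOR}_{AP}&=\widehat{GOR}^{(6)}/\widehat{GOR}^{(4)}=(n_C^{(6)}n_D^{(4)})/(n_D^{(6)}n_C^{(4)}).
\end{aligned}
\tag{8}
$$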
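The counts and estimators defined above can be illustrated with a minimal Python sketch. The function names `discordant_counts` and `gor_estimate` are hypothetical and introduced only for this illustration; the 3 × 3 table is the L-P group (g = 2) of Table 2. Note that the delta-method variance simplifies algebraically to 1/nC + 1/nD.

```python
def discordant_counts(table):
    """Return (n_C, n_D) for an L x L table of counts.

    table[r][s] is the number of patients with period-1 response C_{r+1}
    and period-2 response C_{s+1} (0-based indices).
    """
    levels = len(table)
    n_c = sum(table[r][s] for r in range(levels) for s in range(r + 1, levels))
    n_d = sum(table[r][s] for r in range(levels) for s in range(r))
    return n_c, n_d


def gor_estimate(n_c, n_d, n):
    """Point estimate of GOR^(g) and the delta-method variance of its log."""
    pi_c, pi_d = n_c / n, n_d / n
    gor = pi_c / pi_d                      # undefined when n_d == 0
    var_log = (pi_c + pi_d) / (n * pi_c * pi_d)
    return gor, var_log


# Group g = 2 (L-P) of Table 2; rows index the period-1 response.
table_g2 = [[1, 2, 2],
            [4, 1, 0],
            [5, 0, 0]]
n_c, n_d = discordant_counts(table_g2)
gor, var_log = gor_estimate(n_c, n_d, sum(map(sum, table_g2)))
print(n_c, n_d, round(gor, 4), round(var_log, 4))
```

A group with nD = 0 (for example, g = 1 in Table 2) has no finite point estimate under this sketch, which is one reason the test procedures below work with the 2 × 2 tables of counts instead.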

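For convenience, we define three 2 × 2 tables consisting of cell frequencies $(f_{11k},f_{12k},f_{21k},f_{22k})$ (for k = 1, 2, 3) corresponding to eq. (8) as

$$
\begin{aligned}
&(f_{111}=n_C^{(1)},\ f_{121}=n_C^{(2)},\ f_{211}=n_D^{(1)},\ f_{221}=n_D^{(2)}),\\
&(f_{112}=n_C^{(3)},\ f_{122}=n_C^{(5)},\ f_{212}=n_D^{(3)},\ f_{222}=n_D^{(5)}),\ \text{and}\\
&(f_{113}=n_C^{(6)},\ f_{123}=n_C^{(4)},\ f_{213}=n_D^{(6)},\ f_{223}=n_D^{(4)}).
\end{aligned}
\tag{9}
$$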

When testing $H_0:GOR_{AP}=1$ versus $H_a:GOR_{AP}\neq 1$, we first consider use of the WLS summary test procedure [23] based on eq. (9). We will reject $H_0:GOR_{AP}=1$ at the α-level if

$$
\left(\sum_{k=1}^{3}W_k\,\widehat{LGOR}_{AP}^{(k)}\Big/\sum_{k=1}^{3}W_k\right)^2\left(\sum_{k=1}^{3}W_k\right)>Z_{\alpha/2}^2,
\tag{10}
$$

where

$$
W_1=4/(1/f_{111}+1/f_{121}+1/f_{211}+1/f_{221}),
$$

$$
W_k=1/(1/f_{11k}+1/f_{12k}+1/f_{21k}+1/f_{22k})
$$

for k = 2, 3, $\widehat{LGOR}_{AP}^{(1)}=\log\{[(f_{111}f_{221})/(f_{121}f_{211})]^{1/2}\}$, $\widehat{LGOR}_{AP}^{(k)}=\log((f_{11k}f_{22k})/(f_{12k}f_{21k}))$ for k = 2, 3, and $Z_\alpha$ is the upper 100(α)th percentile of the standard normal distribution. Note that if $f_{ijk}=0$ for some observed frequency in a 2 × 2 table k, we cannot employ the test procedure (10). We may apply the commonly-used ad hoc adjustment for sparse data by adding 0.50 to each observed frequency $f_{ijk}$ in this particular table k.
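As an illustration of test procedure (10), the following Python sketch computes the WLS statistic and a two-sided p-value from three 2 × 2 tables. The function name `wls_test` is hypothetical; the tables are the counts nC and nD obtained from Table 2 and arranged as in eq. (9), and the 0.50 adjustment is applied only to a table that contains a zero cell, as described above. Under these assumptions the p-value should come out close to the value 0.0119 reported in Section 5.

```python
import math


def wls_test(tables):
    """WLS summary test of eq. (10).

    tables: three (f11, f12, f21, f22) tuples; the first table is the one
    whose log odds ratio is halved (square-root estimator in eq. (8)).
    Returns the chi-square statistic (1 df) and the two-sided p-value.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for k, cells in enumerate(tables):
        if any(c == 0 for c in cells):
            cells = tuple(c + 0.5 for c in cells)   # ad hoc sparse-data fix
        f11, f12, f21, f22 = cells
        recip = 1 / f11 + 1 / f12 + 1 / f21 + 1 / f22
        log_or = math.log((f11 * f22) / (f12 * f21))
        if k == 0:
            lgor, weight = 0.5 * log_or, 4.0 / recip
        else:
            lgor, weight = log_or, 1.0 / recip
        weighted_sum += weight * lgor
        weight_total += weight
    stat = weighted_sum ** 2 / weight_total
    p_value = math.erfc(math.sqrt(stat / 2.0))      # P(|Z| > sqrt(stat))
    return stat, p_value


# (f11k, f12k, f21k, f22k) for k = 1, 2, 3, from Table 2 via eq. (9).
AP_TABLES = [(11, 4, 0, 9), (11, 2, 1, 3), (3, 3, 8, 7)]
stat, p = wls_test(AP_TABLES)
print(round(stat, 3), round(p, 4))
```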

When the observed marginal frequencies $n_C^{(g)}$ and $n_D^{(g)}$ are not large, the WLS test procedure may lose accuracy because the weights $W_k$ in eq. (10) can be subject to a large variation. This may lead us to consider use of the MH summary test procedure [28, 29]. When comparing treatment A with placebo, we will reject the null hypothesis $H_0:GOR_{AP}=1$ at the α-level if the test statistic

$$
\left(\sum_k f_{11k}-\sum_k f_{1+k}f_{+1k}/f_{++k}\right)^2\Big/\left\{\sum_k f_{1+k}f_{2+k}f_{+1k}f_{+2k}/\left[f_{++k}^2(f_{++k}-1)\right]\right\}>Z_{\alpha/2}^2.
\tag{11}
$$
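A minimal Python sketch of the MH statistic (11) follows. The function name `mh_test` is hypothetical, and the three tables are again the Table 2 counts arranged as in eq. (9). No continuity correction is applied, which matches eq. (11) as written; under these assumptions the p-value should be close to the value 0.0013 reported in Section 5.

```python
import math


def mh_test(tables):
    """Mantel-Haenszel summary test of eq. (11) for a series of 2 x 2 tables.

    tables: iterable of (f11, f12, f21, f22) tuples.
    Returns the chi-square statistic (1 df) and the two-sided p-value.
    """
    observed = expected = variance = 0.0
    for f11, f12, f21, f22 in tables:
        row1, row2 = f11 + f12, f21 + f22       # f_{1+k}, f_{2+k}
        col1, col2 = f11 + f21, f12 + f22       # f_{+1k}, f_{+2k}
        total = row1 + row2                     # f_{++k}
        observed += f11
        expected += row1 * col1 / total
        variance += row1 * row2 * col1 * col2 / (total ** 2 * (total - 1))
    stat = (observed - expected) ** 2 / variance
    return stat, math.erfc(math.sqrt(stat / 2.0))


AP_TABLES = [(11, 4, 0, 9), (11, 2, 1, 3), (3, 3, 8, 7)]
stat, p = mh_test(AP_TABLES)
print(round(stat, 3), round(p, 4))
```

Unlike the WLS statistic, this one needs no adjustment for the zero cell in the first table, because it uses only the marginal totals.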

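When both the numbers of patients $n_C^{(g)}$ and $n_D^{(g)}$ are small, the asymptotic WLS and MH test procedures may lose accuracy. Thus, we may consider use of the following exact test procedure. Define $n_{dis}^{(g)}=n_C^{(g)}+n_D^{(g)}$. Given $n_{dis}^{(g)}$ fixed, we can show that $n_C^{(g)}$ follows the binomial distribution with parameters $n_{dis}^{(g)}$ and $\Pi_C^{(g)}/(\Pi_C^{(g)}+\Pi_D^{(g)})$ (= $GOR^{(g)}/(1+GOR^{(g)})$). Under $GOR_{AP}=1$, the conditional probability distribution of $f_{11k}$, given $f_{+1k}$, $f_{+2k}$, $f_{1+k}$ and $f_{2+k}$ fixed, is given by the hypergeometric distribution:

$$
P(f_{11k}\mid f_{+1k},f_{+2k},f_{1+k},f_{2+k})=\binom{f_{+1k}}{f_{11k}}\binom{f_{+2k}}{f_{1+k}-f_{11k}}\Big/\binom{f_{++k}}{f_{1+k}},
\tag{12}
$$

where $a_k\le f_{11k}\le b_k$, $a_k=\max\{0,f_{1+k}-f_{+2k}\}$ and $b_k=\min\{f_{+1k},f_{1+k}\}$ for k = 1, 2, 3.

Thus, the joint conditional probability distribution of $f_{111}$, $f_{112}$ and $f_{113}$ is simply [30]

$$
P(\mathbf{f}_{11}\mid\mathbf{f}_{+1},\mathbf{f}_{+2},\mathbf{f}_{1+},\mathbf{f}_{2+})=\prod_{k=1}^{3}\binom{f_{+1k}}{f_{11k}}\binom{f_{+2k}}{f_{1+k}-f_{11k}}\Big/\binom{f_{++k}}{f_{1+k}},
\tag{13}
$$

where $\mathbf{f}_{11}=(f_{111},f_{112},f_{113})'$, $\mathbf{f}_{u+}=(f_{u+1},f_{u+2},f_{u+3})'$ for u = 1, 2, and $\mathbf{f}_{+v}=(f_{+v1},f_{+v2},f_{+v3})'$ for v = 1, 2. Given an observed value $\mathbf{f}_{11}^{o}=(f_{111}^{o},f_{112}^{o},f_{113}^{o})'$, if the following p-value, calculated as [23, 30]

$$
\sum_{\mathbf{f}_{11}\in C}\ \prod_{k=1}^{3}\binom{f_{+1k}}{f_{11k}}\binom{f_{+2k}}{f_{1+k}-f_{11k}}\Big/\binom{f_{++k}}{f_{1+k}},
\tag{14}
$$

where $C=\{\mathbf{f}_{11}\mid P(\mathbf{f}_{11}\mid\mathbf{f}_{+1},\mathbf{f}_{+2},\mathbf{f}_{1+},\mathbf{f}_{2+})\le P(\mathbf{f}_{11}^{o}\mid\mathbf{f}_{+1},\mathbf{f}_{+2},\mathbf{f}_{1+},\mathbf{f}_{2+})\}$, is less than a small given α-level, we will reject $H_0:GOR_{AP}=1$. Note that the exact test (14) is actually a direct extension of Fisher’s exact test to a series of 2 × 2 tables [23, 28, 30] with modifications to accommodate ordinal responses.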
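The exact p-value (14) can be obtained by direct enumeration, as the following Python sketch suggests. The function names are hypothetical, the tables are the Table 2 counts arranged as in eq. (9), and the small relative tolerance used when comparing probabilities is an implementation assumption to guard against floating-point ties, not part of the procedure itself. Under these assumptions the result should be close to the value 0.0002 reported in Section 5.

```python
from itertools import product
from math import comb, prod


def hypergeom_pmf(f11, row1, col1, col2):
    """Hypergeometric probability of eq. (12) with fixed margins."""
    return comb(col1, f11) * comb(col2, row1 - f11) / comb(col1 + col2, row1)


def exact_p_value(tables, rel_tol=1e-9):
    """Exact p-value of eq. (14) for a series of 2 x 2 tables.

    tables: iterable of (f11, f12, f21, f22) tuples (observed frequencies).
    """
    pmfs, observed_index = [], []
    for f11, f12, f21, f22 in tables:
        row1, col1, col2 = f11 + f12, f11 + f21, f12 + f22
        low, high = max(0, row1 - col2), min(col1, row1)   # a_k, b_k
        pmfs.append([hypergeom_pmf(x, row1, col1, col2)
                     for x in range(low, high + 1)])
        observed_index.append(f11 - low)
    p_observed = prod(pmf[i] for pmf, i in zip(pmfs, observed_index))
    # Sum the joint probabilities (13) of all configurations that are no
    # more likely than the observed one.
    return sum(p for p in map(prod, product(*pmfs))
               if p <= p_observed * (1 + rel_tol))


AP_TABLES = [(11, 4, 0, 9), (11, 2, 1, 3), (3, 3, 8, 7)]
p_exact = exact_p_value(AP_TABLES)
print(round(p_exact, 4))
```

With three tables the enumeration covers only the product of the three support sizes (here 10 × 5 × 7 configurations), so brute force is adequate for sample sizes typical of crossover trials.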

As shown in Appendix I, we can easily modify the asymptotic test procedures (10) and (11) and the exact test procedure (14) to test H0:GORBP=1 (or H0:GORBA=1) by replacing fijk with fijk∗ (or fijk∗∗), where fijk∗ and fijk∗∗ are defined in (26) and (28) (Appendix I), respectively.

3 Monte Carlo simulation

To evaluate and compare the performance of the WLS, MH and exact procedures for testing equality between treatments, we employ Monte Carlo simulation. By use of the conditional arguments, we do not need to estimate the nuisance period effect γ in use of these test procedures. We arbitrarily set γ equal to 0.10 in the simulation. Furthermore, our approach is valid for any assumed distribution for the random effects $\mu_i^{(g)}$. We consider the cases in which the random effects $\mu_i^{(g)}$ are independent and identically distributed (i.i.d.) as the normal distribution with mean 0 and standard deviation σ = 0.5, 1, as well as the cases in which the $\mu_i^{(g)}$ are i.i.d. from a gamma distribution with shape parameter α = ½ and scale parameter β = 1, 2. We cover the situations in which the relative effect of treatment A versus the placebo is ηAP = 0.0, 0.50; the relative effect of treatment B versus treatment A is ηBA = 0.0, 1.0 (such that the relative effect of treatment B versus the placebo is ηBP = ηAP + ηBA); and the number of patients per group is n (= n1 = n2 = … = n6) = 10, 15, 25. Note that when n = 10, these include the cases in which the expected number of patients with discordant responses $n_{dis}^{(g)}$ (= $n_C^{(g)}+n_D^{(g)}$) is as small as approximately 3.71–4.84 patients. Note also that all test procedures proposed here depend only on the marginal totals $n_C^{(g)}$ and $n_D^{(g)}$ rather than on the individual cell frequencies $n_{rs}^{(g)}$. Thus, there is no need to specify the number of ordinal levels L or the individual frequencies $n_{rs}^{(g)}$ in the simulation. For each configuration determined by a combination of the above parameter values, we write programs in SAS [31] and generate 10,000 simulated samples of n patients per group, each having the bivariate responses $(Y_{i1}^{(g)},Y_{i2}^{(g)})'$ with probabilities $P(Y_{i1}^{(g)}<Y_{i2}^{(g)}\mid\mu_i^{(g)})$ and $P(Y_{i1}^{(g)}>Y_{i2}^{(g)}\mid\mu_i^{(g)})$ given by model (1), to calculate the simulated Type I error and power at the 0.05-level for a given test procedure.
Recall that the power function of a test procedure with a given rejection region is, by definition, the probability that the sample points fall into the rejection region. The power function gives the Type I error when H0 is true, and gives the power when H0 is false. Therefore, the simulated Type I error for a given test procedure can be calculated as the proportion of the 10,000 simulated samples for which we reject H0 when H0 is true. Similarly, the simulated power for a given test procedure can be calculated as the proportion of the 10,000 simulated samples for which we reject H0 when H0 is false. For readers’ information, the SAS program for our simulation is accessible at http://edoras.sdsu.edu/~kjl/exactso.htm.
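The simulation design described above can be sketched in Python as follows. This is not the author's SAS program; it is a minimal illustration, under stated assumptions, of how discordant counts can be drawn from model (1) with normal random effects and how a rejection proportion is computed for the MH test of H0:ηAP=0. All function names, the seed, and the reduced number of replications (2,000 rather than 10,000) are choices made only for this sketch.

```python
import math
import random

# Treatment received at (period 1, period 2) for groups g = 1, ..., 6.
SEQUENCES = [("P", "A"), ("A", "P"), ("P", "B"),
             ("B", "P"), ("A", "B"), ("B", "A")]
Z_CRIT_SQ = 1.959964 ** 2   # squared upper 2.5% point of N(0, 1)


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_counts(n, eta_ap, eta_bp, gamma, sigma, rng):
    """Draw (n_C, n_D) for each of the six groups under model (1)."""
    effect = {"P": 0.0, "A": eta_ap, "B": eta_bp}
    counts = []
    for first, second in SEQUENCES:
        n_c = n_d = 0
        for _ in range(n):
            mu = rng.gauss(0.0, sigma)               # patient random effect
            p1 = expit(mu + effect[first])           # period 1 (no gamma)
            p2 = expit(mu + effect[second] + gamma)  # period 2
            prob_c = (1.0 - p1) * p2                 # P(Y1 < Y2 | mu)
            prob_d = p1 * (1.0 - p2)                 # P(Y1 > Y2 | mu)
            u = rng.random()
            if u < prob_c:
                n_c += 1
            elif u < prob_c + prob_d:
                n_d += 1
        counts.append((n_c, n_d))
    return counts


def mh_rejects_ap(counts):
    """Apply the MH test (11) to the tables of eq. (9) at the 0.05 level."""
    (c1, d1), (c2, d2), (c3, d3), (c4, d4), (c5, d5), (c6, d6) = counts
    tables = [(c1, c2, d1, d2), (c3, c5, d3, d5), (c6, c4, d6, d4)]
    diff = var = 0.0
    for f11, f12, f21, f22 in tables:
        total = f11 + f12 + f21 + f22
        if total < 2:
            continue                 # table carries no information
        row1, row2 = f11 + f12, f21 + f22
        col1, col2 = f11 + f21, f12 + f22
        diff += f11 - row1 * col1 / total
        var += row1 * row2 * col1 * col2 / (total ** 2 * (total - 1))
    return var > 0 and diff ** 2 / var > Z_CRIT_SQ


def rejection_rate(n, eta_ap, eta_bp, gamma=0.10, sigma=0.5,
                   reps=2000, seed=1):
    """Proportion of simulated trials in which H0: eta_AP = 0 is rejected."""
    rng = random.Random(seed)
    hits = sum(mh_rejects_ap(simulate_counts(n, eta_ap, eta_bp,
                                             gamma, sigma, rng))
               for _ in range(reps))
    return hits / reps


type_one = rejection_rate(n=15, eta_ap=0.0, eta_bp=0.0)
power = rejection_rate(n=25, eta_ap=0.5, eta_bp=0.5)
print(type_one, power)
```

Only the three-way outcome (Y1 < Y2, Y1 > Y2, Y1 = Y2) is drawn for each patient, which is all that the test procedures use. Setting ηAP = 0 gives a simulated Type I error and ηAP ≠ 0 a simulated power; with so few replications the estimates are noticeably noisier than those in the tables of the article.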

4 Results

We summarize in Table 1 the estimated Type I error (in boldface) and power of using the WLS, MH and exact procedures for testing H0:ηAP=0 and testing H0:ηBP=0 at the 0.05-level when the random effects μi(g) are i.i.d. as the normal distribution with mean 0 and standard deviation σ = 0.5, 1. For example, when ηAP = 0 and ηBP = 1.0, the entries corresponding to procedures for testing H0:ηAP=0 are Type I errors, while those corresponding to procedures for testing H0:ηBP=0 are powers (Table 1). We can see that both the MH and exact tests perform well, while the WLS test can be conservative, especially when n is small (say, 10). We note that the MH test is more powerful than the WLS and exact tests in almost all the situations considered in Table 1. For example, when σ = 0.50, ηAP = 0.50, ηBP = 1.50 and n = 15, the powers for testing H0:ηAP=0 and H0:ηBP=0 are 0.175 and 0.825 for the MH test, while these powers are 0.124 and 0.781 for the WLS test, and 0.132 and 0.731 for the exact test. We also note that the power of all test procedures increases as the number of subjects n increases, but decreases as the variation σ of responses between patients increases. Since all the findings on the performance of the WLS, MH and exact procedures with respect to Type I error, as well as the relative order of powers between these test procedures, also hold when the μi(g) are i.i.d. from the gamma distribution, we do not present these results for brevity. These results are, however, available to readers upon request.

Table 1:

The estimated Type I error (in boldface) and power of using the WLS, MH and Exact tests for testing H0:ηAP=0 and testing H0:ηBP=0 at the 0.05-level in situations in which the random effects μi(g)are i.i.d. as the normal distribution with mean 0 and standard deviation σ=0.5,1; the relative effect of treatment A versus the placebo, ηAP=0.0,0.50; and the relative effect of treatment B versus treatment A, ηBA=0.0,1.0 (such that the relative effect of treatment B versus the placebo, ηBP=ηAP+ηBA), and the number of patients n (=n1=n2=…=n6) per group n=10,15,25.

				Testing			Testing
				H0:ηAP=0			H0:ηBP=0
σ	ηAP	ηBP	n	WLS	MH	Exact	WLS	MH	Exact
0.5	0.00	0.00	10	0.020	0.050	0.046	0.022	0.051	0.048
			15	0.028	0.048	0.049	0.031	0.051	0.049
			25	0.037	0.048	0.049	0.034	0.046	0.050
		1.00	10	0.016	0.047	0.040	0.259	0.369	0.253
			15	0.029	0.051	0.051	0.458	0.520	0.401
			25	0.034	0.048	0.047	0.750	0.749	0.636
	0.50	0.50	10	0.073	0.129	0.089	0.073	0.131	0.090
			15	0.131	0.174	0.122	0.134	0.178	0.125
			25	0.251	0.269	0.195	0.255	0.271	0.199
		1.50	10	0.058	0.125	0.093	0.528	0.649	0.501
			15	0.124	0.175	0.132	0.781	0.825	0.731
			25	0.226	0.250	0.195	0.970	0.973	0.938
1.0	0.00	0.00	10	0.018	0.052	0.043	0.017	0.046	0.044
			15	0.025		0.048	0.030	0.053	0.053
			25	0.036	0.051	0.051	0.033	0.047	0.049
		1.00	10	0.013	0.050	0.037	0.208	0.330	0.222
			15	0.021	0.049	0.047	0.401	0.477	0.361
			25	0.035	0.052	0.047	0.681	0.697	0.583
	0.50	0.50	10	0.054	0.117	0.079	0.059	0.121	0.083
			15	0.114	0.159	0.112	0.115	0.160	0.109
			25	0.219	0.247	0.169	0.213	0.240	0.169
		1.50	10	0.043	0.112	0.074	0.457	0.597	0.447
			15	0.093	0.150	0.114	0.732	0.789	0.672
			25	0.198	0.230	0.174	0.947	0.951	0.904

5 An example

Consider the data (Table 2) taken as a part of a crossover trial comparing an analgesic at low (L) and high (H) doses with a placebo (P) for the relief of pain in primary dysmenorrhea patients over the first two periods [24]. Here, we refer to the low and high doses as treatments A and B, respectively. There were 86 patients randomly assigned to the six groups: P-L (g = 1); L-P (g = 2); P-H (g = 3); H-P (g = 4); L-H (g = 5); and H-L (g = 6). At the end of each treatment period, the extent of relief for each patient was assessed on the ordinal scale: none (coded as 1), moderate (coded as 2) and complete (coded as 3). When applying the WLS, MH and exact procedures to test H0:GORAP=1, we obtain p-values of 0.0119, 0.0013 and 0.0002, respectively. Similarly, when applying these procedures with fijk replaced by fijk∗ to test H0:GORBP=1, we obtain p-values of 0.0017, 0.0000 and 0.0000. Thus, there is evidence that taking either the low or the high dose of the analgesic can help the relief of pain in primary dysmenorrhea as compared with the placebo. Furthermore, when applying the corresponding test procedures with fijk replaced by fijk∗∗ to test H0:GORBA=1, we obtain p-values of 0.7994, 0.9986 and 1.0000. There is no evidence that taking the high dose can help, as compared with the low dose, the relief of pain among these patients with primary dysmenorrhea.

Table 2:

The frequency of patients for the relief of pain (1: none or minimal; 2: moderate; 3: complete) in primary dysmenorrhea at the first two periods versus the groups determined by the treatment-receipt sequence (P: placebo; L: low dose; H: high dose).

	Group of Treatment-Receipt Sequence
g =	1	2	3	4	5	6
Responses	P-L	L-P	P-H	H-P	L-H	H-L
(1,1)	2	1	2	3	1	1
(1,2)	9	2	3	2	0	0
(1,3)	2	2	6	0	1	1
(2,1)	0	4	1	1	0	4
(2,2)	2	1	2	1	6	1
(2,3)	0	0	2	1	1	2
(3,1)	0	5	0	6	1	4
(3,2)	0	0	0	0	2	0
(3,3)	0	0	0	0	0	1
ng=	15	15	16	14	12	14

Note that if we assume the proportional odds model using the cumulative logit [28, 31] with normal random effects [19] due to patients, we can apply Proc GLIMMIX [31] to analyze the data in Table 2. When employing this SAS procedure (http://edoras.sdsu.edu/~kjl/exaordex1.htm), we obtain the following parameter estimates (with their estimated standard errors, denoted SD) of the relative effect for the low dose versus the placebo, the high dose versus the placebo, and the high dose versus the low dose: –1.7656 (SD = 0.3898), –2.3404 (SD = 0.4011), and –0.5748 (SD = 0.3537). On the basis of these estimates (all of which are negative), we may conclude that both the low dose and the high dose significantly improve, as compared with the placebo, the relief of pain at the 5 % level; both p-values are < 0.0001. Furthermore, though the high dose appears to improve the outcome of patients as compared with the low dose, this improvement is not significant at the 5 % level. All the above results are essentially similar to those reported previously for the complete block crossover design over a three-period trial [32].

6 Discussion

We do not recommend, as noted previously [1, 4, 11, 12, 25, 26, 27], use of the crossover design if we cannot ensure that the carry-over effects are nullified by an adequate wash-out period. On the other hand, if there are carry-over effects due to earlier treatments, we note that the test procedures proposed here can still be valid for use under the simple carry-over model (Appendix II). Also, we note that although one may apply the estimator given in Appendix II for the difference in carry-over effects to test whether there are differential carry-over effects, we do not recommend using this test to determine whether the assumption of no carry-over effects holds. This is because of the concerns raised by Freeman [33] and Senn [34] about using the two-stage test procedure suggested by Grizzle [2].

When employing the test procedures developed here, we do not need to assume any parametric distribution for the random effects due to patients. Thus, our approach is, as noted before, semi-parametric. Also, because the number of patients in a crossover trial is often small, the exact test procedure (14) can be of use in practice. By contrast, one needs to assume that the random effects due to patients independently follow the normal distribution when using Proc GLIMMIX [35]. Also, note that the proportional odds model can be badly violated by many bivariate distributions [20, 36, 37]. Furthermore, note that Wald’s test can be invalid for use if the number of patients in a trial is small and the data are sparse.

We note that when the number of subjects per group n is small (say, 10), the WLS procedure can be conservative, while the MH and exact test procedures perform well (Table 1). We further note that the MH test procedure is generally more powerful than the other two procedures in the situations considered here. Because the MH procedure does not involve any sophisticated numerical procedure and can be calculated with a hand calculator, we may recommend the MH test procedure for general use when n is not large. We may use the exact test procedure if there is concern about the normal approximation for a small n. When the number of subjects n per group is large (say, 40), however, we note that the WLS procedure can be more powerful than the MH and exact procedures on the basis of Monte Carlo simulation (not shown here).

If we wish to study the relative period effect, we may apply similar ideas as above to derive the corresponding procedures for testing H0:γ=0. For example, we can easily see from eqs (16)–(21) (in Appendix I) that the GOR of responses for period 2 versus period 1 is

$$
GOR_{21}=\exp(\gamma)=\left(GOR^{(1)}GOR^{(2)}\right)^{1/2}=\left(GOR^{(3)}GOR^{(4)}\right)^{1/2}=\left(GOR^{(5)}GOR^{(6)}\right)^{1/2}.
\tag{15}
$$

Following similar arguments as for comparing the treatment effect, we can derive on the basis of (15) the WLS, MH and exact procedures for testing H0:GOR21=1. Finally, note that when the patient response is dichotomous, the GOR reduces to the common OR for paired sample data. Thus, the MH and exact test procedures include those for testing equality of treatments in binary data [18] under the random effects logistic regression model as special cases.

In summary, we have derived the WLS, MH and exact procedures for testing equality between treatments under an incomplete block crossover design with ordinal responses. We have evaluated and compared their performance in a variety of situations based on Monte Carlo simulation. We have noted that the proposed test procedures are valid for use in the presence of simple carry-over effects. We have compared the proposed test procedures with Wald’s test procedures assuming the normal random effects proportional odds model. We have noted that the proposed test procedures include those for testing equality of treatments in binary data as special cases. The results, findings and discussions should be of use to biostatisticians and clinicians when they employ a two-period crossover design to compare three treatments in ordinal data.

Acknowledgements

The author wishes to thank the reviewer for valuable and useful comments and suggestions to improve the contents and clarity of this article.

Appendix I

On the basis of the assumed model (5) for $GOR^{(g)}$, we may obtain the GOR of responses between periods 2 and 1 in group g (= 1, 2, 3, 4, 5, 6) as follows:

$$GOR^{(1)}=\exp(\eta_{AP}+\gamma),\tag{16}$$

$$GOR^{(2)}=\exp(-\eta_{AP}+\gamma),\tag{17}$$

$$GOR^{(3)}=\exp(\eta_{BP}+\gamma),\tag{18}$$

$$GOR^{(4)}=\exp(-\eta_{BP}+\gamma),\tag{19}$$

$$GOR^{(5)}=\exp(\eta_{BP}-\eta_{AP}+\gamma),\tag{20}$$

$$GOR^{(6)}=\exp(-\eta_{BP}+\eta_{AP}+\gamma).\tag{21}$$

On the basis of eqs (16) and (17), we have

$$GOR_{AP}=\exp(\eta_{AP})=\left(GOR^{(1)}/GOR^{(2)}\right)^{1/2},\tag{22}$$

which is free from the period effect. Similarly, we can see from eqs (18) and (20) that

$$GOR_{AP}=GOR^{(3)}/GOR^{(5)}.\tag{23}$$

Furthermore, we can see from eqs (21) and (19) that

$$GOR_{AP}=GOR^{(6)}/GOR^{(4)}.\tag{24}$$

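Following similar arguments as for deriving eqs (22)–(24), we obtain the following three consistent estimators for the GOR of responses between treatment B and placebo P:

$$
\widehat{GOR}_{BP}=\left(\widehat{GOR}^{(3)}/\widehat{GOR}^{(4)}\right)^{1/2},\quad
\widehat{GOR}_{BP}=\widehat{GOR}^{(1)}/\widehat{GOR}^{(6)},\quad
\widehat{GOR}_{BP}=\widehat{GOR}^{(5)}/\widehat{GOR}^{(2)}.
\tag{25}
$$

Again, for convenience in the following discussion we define three 2 × 2 tables consisting of frequencies $(f_{11k}^{*},f_{12k}^{*},f_{21k}^{*},f_{22k}^{*})$ (for k = 1, 2, 3) corresponding to equation (25) as

$$
\begin{aligned}
&(f_{111}^{*}=n_C^{(3)},\ f_{121}^{*}=n_C^{(4)},\ f_{211}^{*}=n_D^{(3)},\ f_{221}^{*}=n_D^{(4)}),\\
&(f_{112}^{*}=n_C^{(1)},\ f_{122}^{*}=n_C^{(6)},\ f_{212}^{*}=n_D^{(1)},\ f_{222}^{*}=n_D^{(6)}),\ \text{and}\\
&(f_{113}^{*}=n_C^{(5)},\ f_{123}^{*}=n_C^{(2)},\ f_{213}^{*}=n_D^{(5)},\ f_{223}^{*}=n_D^{(2)}).
\end{aligned}
\tag{26}
$$

Again, using the same arguments as above, we may further obtain the following three consistent estimators for $GOR_{BA}$ (= $\exp(\eta_{BP}-\eta_{AP})$):

$$
\widehat{GOR}_{BA}=\left(\widehat{GOR}^{(5)}/\widehat{GOR}^{(6)}\right)^{1/2},\quad
\widehat{GOR}_{BA}=\widehat{GOR}^{(3)}/\widehat{GOR}^{(1)},\quad
\widehat{GOR}_{BA}=\widehat{GOR}^{(2)}/\widehat{GOR}^{(4)}.
\tag{27}
$$

Also, we define the following three 2 × 2 tables consisting of frequencies $(f_{11k}^{**},f_{12k}^{**},f_{21k}^{**},f_{22k}^{**})$ (for k = 1, 2, 3) corresponding to equation (27) as

$$
\begin{aligned}
&(f_{111}^{**}=n_C^{(5)},\ f_{121}^{**}=n_C^{(6)},\ f_{211}^{**}=n_D^{(5)},\ f_{221}^{**}=n_D^{(6)}),\\
&(f_{112}^{**}=n_C^{(3)},\ f_{122}^{**}=n_C^{(1)},\ f_{212}^{**}=n_D^{(3)},\ f_{222}^{**}=n_D^{(1)}),\ \text{and}\\
&(f_{113}^{**}=n_C^{(2)},\ f_{123}^{**}=n_C^{(4)},\ f_{213}^{**}=n_D^{(2)},\ f_{223}^{**}=n_D^{(4)}).
\end{aligned}
\tag{28}
$$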
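The three sets of 2 × 2 tables in eqs (9), (26) and (28) differ only in which pairs of groups are compared, which the following Python sketch makes explicit. The names `CONTRASTS`, `N_C`, `N_D` and `build_tables` are hypothetical; the counts are the values of nC and nD obtained from Table 2.

```python
# For each contrast, the (first-column group, second-column group) pairs
# used for tables k = 1, 2, 3, following eqs (9), (26) and (28).
CONTRASTS = {
    "AP": [(1, 2), (3, 5), (6, 4)],
    "BP": [(3, 4), (1, 6), (5, 2)],
    "BA": [(5, 6), (3, 1), (2, 4)],
}

# Discordant counts n_C^(g) and n_D^(g) for g = 1, ..., 6 from Table 2.
N_C = {1: 11, 2: 4, 3: 11, 4: 3, 5: 2, 6: 3}
N_D = {1: 0, 2: 9, 3: 1, 4: 7, 5: 3, 6: 8}


def build_tables(contrast, n_c=N_C, n_d=N_D):
    """Return the three (f11, f12, f21, f22) tables for a given contrast."""
    return [(n_c[g], n_c[h], n_d[g], n_d[h]) for g, h in CONTRASTS[contrast]]


for name in CONTRASTS:
    print(name, build_tables(name))
```

In each case the first table corresponds to the square-root estimator, so it is the one that receives the weight 4/(·) and the halved log odds ratio in the WLS procedure (10).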

Appendix II

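Using the simple carry-over model, we assume that

$$
\begin{aligned}
P(Y_{i1}^{(g)}<Y_{i2}^{(g)}\mid\mu_i^{(g)}) &= \frac{1}{1+\exp(\mu_i^{(g)}+\eta_{AP}X_{i11}^{(g)}+\eta_{BP}X_{i12}^{(g)}+\gamma 1_{i1}^{(g)})}\\
&\quad\times\frac{\exp(\mu_i^{(g)}+\eta_{AP}X_{i21}^{(g)}+\eta_{BP}X_{i22}^{(g)}+\gamma 1_{i2}^{(g)}+\rho_P 1_i(g=1,3)+\rho_A 1_i(g=2,5)+\rho_B 1_i(g=4,6))}{1+\exp(\mu_i^{(g)}+\eta_{AP}X_{i21}^{(g)}+\eta_{BP}X_{i22}^{(g)}+\gamma 1_{i2}^{(g)}+\rho_P 1_i(g=1,3)+\rho_A 1_i(g=2,5)+\rho_B 1_i(g=4,6))},\\
P(Y_{i1}^{(g)}>Y_{i2}^{(g)}\mid\mu_i^{(g)}) &= \frac{1}{1+\exp(\mu_i^{(g)}+\eta_{AP}X_{i21}^{(g)}+\eta_{BP}X_{i22}^{(g)}+\gamma 1_{i2}^{(g)}+\rho_P 1_i(g=1,3)+\rho_A 1_i(g=2,5)+\rho_B 1_i(g=4,6))}\\
&\quad\times\frac{\exp(\mu_i^{(g)}+\eta_{AP}X_{i11}^{(g)}+\eta_{BP}X_{i12}^{(g)}+\gamma 1_{i1}^{(g)})}{1+\exp(\mu_i^{(g)}+\eta_{AP}X_{i11}^{(g)}+\eta_{BP}X_{i12}^{(g)}+\gamma 1_{i1}^{(g)})},\ \text{and}\\
P(Y_{i1}^{(g)}=Y_{i2}^{(g)}\mid\mu_i^{(g)}) &= 1-P(Y_{i1}^{(g)}>Y_{i2}^{(g)}\mid\mu_i^{(g)})-P(Y_{i1}^{(g)}<Y_{i2}^{(g)}\mid\mu_i^{(g)}),
\end{aligned}
\tag{29}
$$

where $1_i(g=g_1,g_2)=1$ for $g=g_1,g_2$ at period z = 2, and = 0 otherwise, and $\rho_P$, $\rho_A$ and $\rho_B$ represent the carry-over effects due to placebo, treatment A and treatment B, respectively.

On the basis of the assumed model (29), we obtain

$$
\frac{P(Y_{i1}^{(g)}<Y_{i2}^{(g)}\mid\mu_i^{(g)})}{P(Y_{i1}^{(g)}>Y_{i2}^{(g)}\mid\mu_i^{(g)})}=\exp\left(\eta_{AP}(X_{i21}^{(g)}-X_{i11}^{(g)})+\eta_{BP}(X_{i22}^{(g)}-X_{i12}^{(g)})+\gamma(1_{i2}^{(g)}-1_{i1}^{(g)})+\rho_P 1_i(g=1,3)+\rho_A 1_i(g=2,5)+\rho_B 1_i(g=4,6)\right).
\tag{30}
$$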

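On the basis of model (30), we may obtain the GOR of responses between periods 2 and 1 in group g (= 1, 2, 3, 4, 5, 6) as

$$GOR^{(1)}=\exp(\eta_{AP}+\gamma+\rho_P),\tag{31}$$

$$GOR^{(2)}=\exp(-\eta_{AP}+\gamma+\rho_A),\tag{32}$$

$$GOR^{(3)}=\exp(\eta_{BP}+\gamma+\rho_P),\tag{33}$$

$$GOR^{(4)}=\exp(-\eta_{BP}+\gamma+\rho_B),\tag{34}$$

$$GOR^{(5)}=\exp(\eta_{BP}-\eta_{AP}+\gamma+\rho_A),\tag{35}$$

$$GOR^{(6)}=\exp(-\eta_{BP}+\eta_{AP}+\gamma+\rho_B).\tag{36}$$

From eqs (31) and (32), we have

$$GOR^{(1)}/GOR^{(2)}=\exp(2\eta_{AP}+\rho_P-\rho_A).\tag{37}$$

Similarly, from eqs (33)–(36), we obtain

$$GOR^{(3)}/GOR^{(5)}=\exp(\eta_{AP}+\rho_P-\rho_A),\tag{38}$$

and

$$GOR^{(6)}/GOR^{(4)}=\exp(\eta_{AP}).\tag{39}$$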

Under H0:GORAP=1 (i.e., there is no difference in effects between treatment A and placebo), we can reasonably assume that the carry-over effect ρA due to treatment A and the carry-over effect ρP due to placebo are equal. In that case, the right-hand sides of eqs (37)–(39) all equal 1 under H0, and thus procedures (10), (11) and (14) can still preserve the Type I error. The same arguments explain why the corresponding procedures for testing H0:GORBP=1 or H0:GORBA=1 can preserve the Type I error as well.

We define $\rho_{PA}=\rho_P-\rho_A$. To estimate $\rho_{PA}$, we consider the following linear combination of estimators:

$$
\hat{\rho}_{PA}=l_1\log(\widehat{GOR}^{(1)}/\widehat{GOR}^{(2)})+l_2\log(\widehat{GOR}^{(3)}/\widehat{GOR}^{(5)})+l_3\log(\widehat{GOR}^{(6)}/\widehat{GOR}^{(4)}),
\tag{40}
$$

where the $l_k$ are constants. We let $V_1$, $V_2$ and $V_3$ denote the variances $Var(\log(\widehat{GOR}^{(1)}/\widehat{GOR}^{(2)}))$, $Var(\log(\widehat{GOR}^{(3)}/\widehat{GOR}^{(5)}))$ and $Var(\log(\widehat{GOR}^{(6)}/\widehat{GOR}^{(4)}))$, respectively. On the basis of eqs (37)–(39), we want to find constants $l_k$ that minimize the variance $Var(\hat{\rho}_{PA})=l_1^2V_1+l_2^2V_2+l_3^2V_3$, subject to the constraints $2l_1+l_2+l_3=0$ and $l_1+l_2=1$, so that $\hat{\rho}_{PA}$ is a consistent estimator for $\rho_{PA}$ (= $\rho_P-\rho_A$). Using Lagrange multipliers, we obtain

$$
\begin{aligned}
l_1&=(V_2-V_3)/(V_1+V_2+V_3),\\
l_2&=(V_1+2V_3)/(V_1+V_2+V_3),\ \text{and}\\
l_3&=-(V_1+2V_2)/(V_1+V_2+V_3).
\end{aligned}
\tag{41}
$$

Note that the weights (41) are functions of the unknown variances $V_k$. We may simply substitute $\hat{V}_k$ for $V_k$ in (41) to obtain the estimated optimal weights $\hat{l}_k$, where $\hat{V}_k=1/f_{11k}+1/f_{12k}+1/f_{21k}+1/f_{22k}$ for k = 1, 2, 3. We can then employ the resulting consistent estimator for $\rho_{PA}$ together with its estimated variance $\widehat{Var}(\hat{\rho}_{PA})=\hat{l}_1^2\hat{V}_1+\hat{l}_2^2\hat{V}_2+\hat{l}_3^2\hat{V}_3$ to test $H_0:\rho_{PA}=0$ if one should decide to do so. Similar discussions can be given for studying $\rho_{PB}=\rho_P-\rho_B$ and $\rho_{AB}=\rho_A-\rho_B$.
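The weights (41) can be checked numerically with a short Python sketch. The function names and the three variance values are hypothetical and chosen only for illustration; the check confirms that the weights satisfy both constraints and that moving along the feasible direction (1, −1, −1) does not reduce the variance.

```python
def carryover_weights(v1, v2, v3):
    """Weights l1, l2, l3 of eq. (41) for given variances V1, V2, V3."""
    total = v1 + v2 + v3
    return (v2 - v3) / total, (v1 + 2 * v3) / total, -(v1 + 2 * v2) / total


def combined_variance(weights, variances):
    """Var(rho_hat_PA) = l1^2 V1 + l2^2 V2 + l3^2 V3."""
    return sum(l * l * v for l, v in zip(weights, variances))


variances = (2.0, 1.5, 1.0)            # hypothetical V1, V2, V3
l1, l2, l3 = carryover_weights(*variances)
best = combined_variance((l1, l2, l3), variances)

# Any other feasible choice has the form (l1 + t, l2 - t, l3 - t).
for t in (-0.2, -0.05, 0.05, 0.2):
    other = combined_variance((l1 + t, l2 - t, l3 - t), variances)
    print(t, round(other - best, 6))   # differences should be non-negative
```

Because the two constraints leave a single free parameter, the minimization reduces to a one-dimensional quadratic, which is why the perturbation check along one direction is sufficient here.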

References

1. Senn S. Cross-over trials in clinical research, 2nd ed. Chichester: Wiley; 2002. doi:10.1002/0470854596.

2. Grizzle JE. The two-period change-over design and its use in clinical trials. Biometrics. 1965;21:467–480. doi:10.2307/2528104.

3. Hills M, Armitage P. The two-period cross-over clinical trial. Br J Clin Pharmacol. 1979;8:7–20. doi:10.1111/j.1365-2125.1979.tb05903.x.

4. Fleiss JL. The design and analysis of clinical experiments. New York: Wiley; 1986.

5. Lui K-J, Chang K-C. Hypothesis testing and estimation in ordinal data under a simple crossover design. J Biopharm Stat. 2012;22:1137–1147. doi:10.1080/10543406.2011.574326.

6. Jones B, Kenward MG. Modelling binary data from a three-period cross-over trial. Stat Med. 1987;6:555–564. doi:10.1002/sim.4780060504.

7. Jones B, Kenward MG. Design and analysis of cross-over trials, 3rd ed. London: Chapman and Hall, Taylor & Francis Group; 2014. doi:10.1201/b17537.

8. Senn S. Cross-over trials in Statistics in Medicine: The first ‘25’ years. Stat Med. 2006;25:3430–3442. doi:10.1002/sim.2706.

9. Gart JJ. An exact test for comparing matched proportions in crossover designs. Biometrika. 1969;56:75–80. doi:10.1093/biomet/56.1.75.

10. Ezzet F, Whitehead J. A random effects model for binary data from cross-over trials. Appl Stat. 1992;41:117–126. doi:10.2307/2347622.

11. Schouten H, Kester A. A simple analysis of a simple crossover trial with a dichotomous outcome measure. Stat Med. 2010;29:193–198. doi:10.1002/sim.3771.

12. Lui K-J. Crossover designs: Testing, estimation, and sample size. New York: Wiley; 2016. doi:10.1002/9781119114710.

13. Lui K-J, Chang K-C. Test non-inferiority (and equivalence) based on the odds ratio under a simple crossover trial. Stat Med. 2011;30:1230–1242. doi:10.1002/sim.4166.

14. Lui K-J, Chang K-C. Exact sample size determination in testing non-inferiority under a simple crossover design. Pharm Stat. 2012;11:129–134. doi:10.1002/pst.506.

15. Lui K-J. Notes on testing carry-over effects in continuous data under an incomplete block crossover design. Commun Stat Simul Comput. 2016. doi:10.1080/03610918.2016.1146764.

16. Lui K-J, Chang K-C. Test and estimation in binary data analysis under an incomplete block crossover design. Comput Stat Data Anal. 2015;81:130–138. doi:10.1016/j.csda.2014.07.017.

17. Lui K-J. Estimation of the treatment effect under an incomplete block crossover design in binary data – a conditional likelihood approach. Stat Methods Med Res. 2015. doi:10.1177/0962280215595434.

18. Lui K-J, Chang K-C. Exact tests in binary data under an incomplete block crossover design. Stat Methods Med Res. 2016. doi:10.1177/0962280216638382.

19. Ezzet F, Whitehead J. A random effects model for ordinal responses from a crossover trial. Stat Med. 1991;10:901–907. doi:10.1002/sim.4780100611.

20. Agresti A. Generalized odds ratios for ordinal data. Biometrics. 1980;36:59–67. doi:10.2307/2530495.

21. Lui K-J. Notes on estimation of the general odds ratio and the general risk difference for paired-sample data. Biometrical J. 2002;44:957–968. doi:10.1002/bimj.200290007.

22. Lui K-J. Statistical estimation of epidemiological risk. New York: Wiley; 2004. doi:10.1002/0470094087.

23. Fleiss JL, Levin B, Paik MC. Statistical methods for rates and proportions, 3rd ed. New York: Wiley; 2003. doi:10.1002/0471445428.

24. Kenward M, Jones B. The analysis of categorical data from cross-over trials using a latent variable model. Stat Med. 1991;10:1607–1619. doi:10.1002/sim.4780101012.

25. Senn SJ. Is the simple carry-over model useful? Stat Med. 1992;11:715–726. doi:10.1002/sim.4780110603.

26. Fleiss JL. On multiperiod crossover studies. Biometrics (Correspondence). 1986;42:449–450.

27. Fleiss JL. A critique of recent research on the two-treatment crossover design. Control Clin Trials. 1989;10:237–243. doi:10.1016/0197-2456(89)90065-2.

28. Agresti A. Categorical data analysis. New York: Wiley; 1990.

29. Breslow NE, Day NE. Statistical methods in cancer research, volume 1. The analysis of case-control studies. Lyon: International Agency for Research on Cancer; 1980.

30. Gart JJ. Point and interval estimation of the common odds ratio in the combination of 2x2 tables with fixed margins. Biometrika. 1970;57:471–475. doi:10.1093/biomet/57.3.471.

31. SAS Institute, Inc. User’s Guide, 2nd ed. Cary, NC: SAS Institute; 2009.

32. Lui K-J, Chang K-C, Lin C-D. Testing equality and interval estimation of the generalized odds ratio in ordinal data under a three-period crossover design. Stat Methods Med Res. 2015. doi:10.1177/0962280215569623.

33. Freeman P. The performance of the two-stage analysis of two-treatment, two-period cross-over trials. Stat Med. 1989;8:1421–1432. doi:10.1002/sim.4780081202.

34. Senn SJ. Cross-over trials, carry-over effects and the art of self-delusion. Stat Med. 1988;7:1099–1101. doi:10.1002/sim.4780071010.

35. Clayton D, Cuzick J. Multivariate generalizations of the proportional hazards model. J R Stat Soc Ser A. 1985;148:82–117. doi:10.2307/2981943.

36. Fleiss JL. On the asserted invariance of the odds ratio. Br J Prev Soc Med. 1970;24:45–46. doi:10.1136/jech.24.1.45.

37. Mosteller F. Association and estimation in contingency tables. J Am Stat Assoc. 1968;63:1–28. doi:10.1080/01621459.1968.11009219.

Published Online: 2017-2-3

https://doi.org/10.1515/ijb-2016-0069
