Adaptive Design for Staggered-Start Clinical Trial

Ao Yuan; Qizhai Li; Ming Xiong; Ming T. Tan

doi:10.1515/ijb-2015-0011

Published/Copyright: December 15, 2015

From the journal The International Journal of Biostatistics Volume 12 Issue 2

Abstract

In a phase II and/or III clinical trial with several competing treatments, the goal is to assess the performance of the treatments at the end of the study, and the trial design aims to minimize the risks to the patients in the trial according to some given allocation optimality criterion. Recently, a new type of clinical trial, the staggered-start trial, has been proposed in some studies, in which different treatments enter the same trial at different times. Some basic questions for this type of trial are whether optimality can still be maintained, under what conditions, and, if so, how to allocate the incoming patients to the treatments to achieve such optimality. Here we propose and study a class of adaptive designs for staggered-start clinical trials. For a given optimality criterion, we show that as long as the initial sample sizes at the beginning of the successive stages are not too large relative to the total sample size, the proposed design still achieves the optimality criterion asymptotically for the allocation proportions, as in ordinary trials; if these initial sample sizes are of about the same magnitude as the total sample size, full optimality cannot be achieved. The proposed method is simple to use and is illustrated with several examples and a simulation study.

Keywords: adaptive design; allocation proportion; optimality; sequential design; staggered-start

1 Introduction

In a phase II and/or III clinical trial, several competing treatments are assessed during the trial process. During the trial, each incoming patient is allocated to one of the treatments according to some pre-specified rule, which is based on the current knowledge of the treatment performances in the previously allocated patients. The aims are to assess the treatment performances well and to minimize the overall treatment losses to the patients. The allocation rule is called the design of a clinical trial, and the minimal loss is often characterized by some optimality criterion. If the design involves unknown parameter(s) that are estimated and updated as the trial progresses, it is called an adaptive design. Optimal adaptive clinical trial designs and their properties have been studied extensively in the literature, for example by Rosenberger and Lachin [1], Eisele [2], Melfi and Page [3], Rosenberger et al. [4], Bai et al. [5], Bai and Hu [6], Zhang et al. [7], Yuan and Chai [8], Zhu and Hu [9], and Yuan et al. [10], among others. In all these existing trials, the treatments under investigation enter the trial at the same time. Recently, a new type of phase II (or phase III) trial, the staggered-start clinical trial, has been proposed and studied by a number of researchers and experimenters ([11–13], etc.). In this type of trial the treatments enter the trial at different times, as in the studies by Kublin [14], Cummings et al. [15], Hendrix et al. [16], and the experimental studies at the Fred Hutchinson Cancer Research Center. Staggered-start and delayed-withdrawal designs were originally proposed by Leber and others as a way of providing evidence of disease modification without relying on biomarkers. In such a study, patients are randomized to drug or placebo at baseline, and after a pre-specified period the placebo group is treated with the active agent. If the staggered-start group “catches up” with the group receiving treatment from the onset of the trial, no disease-modifying effect has been observed and the agent is deemed to have symptomatic benefits only. If the staggered-start group does not catch up with the group receiving treatment from the onset, then it is concluded that disease modification has occurred [14].

As an example of such a trial, in the vaccine study [14], vaccine 1 and placebo are in the trial; after 12 months a new treatment, vaccine 2, becomes available and joins the ongoing trial, and a third vaccine joins at a later time. Thus, three vaccine regimens are compared with a shared placebo group. The trial aims to address the question: how should a lag in the availability of a vaccine product be accommodated in the research trial design? There are a variety of other reasons for this type of trial. For example, combining the new treatment(s) with the one(s) already under study can reduce the sample size and costs compared with running separate trials. Also, it is sometimes desirable to compare the behavior of the newly available treatment(s) with the ongoing one(s) in the same trial.

A staggered-start clinical trial raises some basic questions. How should the incoming patients be allocated in a reasonable way? How does this type of trial differ from the existing ones in its basic properties? Which properties are preserved in the new trial, and under what conditions? A simple approach is to allocate all incoming patients to the new treatment(s) and let the existing one(s) wait until all the treatments have roughly the same number of patients, and then study the treatments as in a standard trial. But this method is known to be undesirable: since the subjects under study are human beings, any design should aim to allocate more patients to the treatment with the best efficacy and to keep the sample size as small as possible; moreover, the deterministic assignment in the waiting period violates the basic principle of clinical trials. Since the existing treatments have been under study for a relatively long time, their observations cannot be treated as initial observations in relation to the newly added treatments. In this type of trial, we want to know whether it is still possible to allocate the incoming patients to the treatments so as to achieve a given optimality criterion, or to minimize the treatment losses at least to some extent. These questions pose new problems in the study of clinical trials.

For a standard clinical trial, an adaptive design uses accumulating data to update aspects of the study as it continues, without undermining the validity and integrity of the trial ([3, 17, 18], among others). An optimal design aims to achieve some target criterion for the allocation proportions [2, 4, 7, 8, 19, 20]. These methods have various merits in application, but none of the existing methods appears to address the case of the staggered-start trial.

Here we propose and study a class of adaptive designs for this type of clinical trial, in which weights are used in the assignment of patients to the treatments to balance the current allocation proportions, in such a way that the allocation proportions of the treatments asymptotically achieve, under suitable conditions, a given optimality criterion as in the standard trial. The design is applicable to any type of response and is simple to use. Our initial findings are that if the sample sizes of the ongoing treatments are relatively small compared with those of the newly joined treatments, then asymptotically the trial can achieve the given optimality criterion as in the standard trial with all treatments starting at the same time; otherwise the optimality can only be partially achieved in some sense. In Section 2, we give a brief review of some commonly used existing methods for standard adaptive clinical trials. Section 3 describes the proposed allocation rule for the staggered-start trial. Section 4 investigates its asymptotic optimality by distinguishing two cases: when the initial sample sizes are not too large relative to the total sample size, asymptotic optimality can be achieved; when the initial sample sizes have roughly the same magnitude as the total sample size, asymptotic optimality cannot be achieved. Lastly, in Section 5 we illustrate these methods with some examples and a simulation study. A short discussion is given at the end. The relevant technical proofs are given in the Appendix.

2 Brief review of existing methods for standard clinical trials

Assume a standard clinical trial with $k$ treatments. The patients enter the trial sequentially. Let $r_n$ be the response (possibly multivariate) of the $n$-th patient under the assigned treatment, and let $f(r_n)$ be the summary score for this response. Without loss of generality we assume $0\le f(\cdot)<\infty$, and that a larger value of $f(\cdot)$ corresponds to a better treatment effect. Let $A_i$ be the event that the trial is under treatment $i$, $\mu_i=E(f(r_n)\mid A_i)$ be the expected performance, or success rate, of the $i$-th treatment, and $\sigma_i^2:=\mathrm{Var}(f(r_1)\mid A_i)<\infty$ be its variance $(i=1,\dots,k)$. Denote $\mu=(\mu_1,\dots,\mu_k)$ and $\sigma^2=(\sigma_1^2,\dots,\sigma_k^2)$. Denote the number of patients assigned to treatment $i$ at time $n$ by $N_i(n)$, and let $N(n)=(N_1(n),\dots,N_k(n))$. The vector of allocation proportions $N(n)/n$ is the main focus in phase III clinical trial studies, and various methods have been studied that target proportions achieving, asymptotically, some specified optimality criterion.

For a vector $x=(x_1,\dots,x_k)$, define $|x|=x_1+\cdots+x_k$, and for $x$ and $y$ of the same length, define $x-y=(x_1-y_1,\dots,x_k-y_k)$. The optimality criterion can be characterized by a specified objective functional $G(\cdot,\cdot)$ of the response distributions under the treatments, or of their means $\mu$. The target allocation proportion is the vector $v=(v_1,\dots,v_k)$ that achieves the given criterion

$$v=\arg\sup_{v\in\nabla}G(\mu,v(\mu)),$$

where $\nabla=\{v(\cdot)=(v_1(\cdot),\dots,v_k(\cdot)):\ v_j(\cdot)>0\ (1\le j\le k),\ |v|=1\}$ is the set of all possible vectors of proportions. In the optimal adaptive design, a special randomization rule is constructed such that the design achieves the asymptotic optimality

$$\frac{N(n)}{n}\to v\quad\text{(a.s.)}.\tag{1}$$

Some examples of optimality criterion G(μ,v(μ)) are given below.

Let $I_{ji}$ be the indicator that the $j$-th patient is assigned to treatment $i$. Since $\mu$ is unknown, we plug in its current estimate $\mu_n=(\mu_{n1},\dots,\mu_{nk})$, with

$$\mu_{ni}=\frac{\sum_{j=1}^{n}f(r_j)I_{ji}}{\sum_{j=1}^{n}I_{ji}},\qquad(1\le i\le k).$$

More generally, we may consider the conditional distributions $F_i(\cdot)$ of the $f(r_n)\mid A_i$'s. Let $F(\cdot)=(F_1(\cdot),\dots,F_k(\cdot))$. Their empirical versions are

$$F_{n,i}(x_i)=\frac{\sum_{j=1}^{n}\chi(f(r_j)\le x_i)I_{ji}}{\sum_{j=1}^{n}I_{ji}},\qquad(i=1,\dots,k),$$

where $x=(x_1,\dots,x_k)$, $F_n(x)=(F_{n,1}(x_1),\dots,F_{n,k}(x_k))$, and $\chi(\cdot)$ is the indicator function.
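As a rough illustration of the plug-in estimates above, the following Python sketch computes the per-treatment sample means $\mu_{ni}$ and the empirical distribution functions $F_{n,i}$ from a list of summary scores and a list of assigned treatments. The function name `running_estimates`, the zero-based treatment labels and the toy data are assumptions made for this example only.

```python
from bisect import bisect_right

def running_estimates(responses, assignments, k):
    """Return per-treatment sample means and an empirical-CDF function.

    responses[j]   : summary score f(r_j) of patient j
    assignments[j] : treatment index in 0..k-1 given to patient j
    """
    scores = [[] for _ in range(k)]
    for score, arm in zip(responses, assignments):
        scores[arm].append(score)

    # mu_{n,i}: average score among the patients assigned to treatment i.
    means = [sum(s) / len(s) if s else float("nan") for s in scores]
    sorted_scores = [sorted(s) for s in scores]

    def ecdf(arm, x):
        # F_{n,arm}(x): fraction of the arm's scores that are <= x.
        s = sorted_scores[arm]
        return bisect_right(s, x) / len(s) if s else float("nan")

    return means, ecdf

# Toy data (hypothetical): four patients, two treatments.
means, ecdf = running_estimates([1.0, 3.0, 2.0, 4.0], [0, 1, 0, 1], k=2)
```

In an adaptive design these quantities would be recomputed (or updated incrementally) after each new response, since the allocation rule for the next patient depends on them.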

Example 1

The functional $G(\mu,v(\mu))=\sum_{j=1}^{k}\mu_jv_j(\mu)=\mu'v(\mu)$ to be optimized is the average performance of all the treatments.

Example 2

Consider the case of integer response, let q be the allocation proportions and qn be its empirical version, so μ=q and μn=qn. Let ci>0 be the (known) cost for the failure of the i-th treatment, and I={Iij:1≤i≤n;1≤j≤k}, a=(1,...,1)′. The average weighted loss after the n-th split is

$$L_n(I,v)=\sum_{i=1}^{n}\sum_{j=1}^{k}v_j^2c_j(1-q_j)I_{ij}/n,\qquad v\in\nabla.$$

Here we use a quadratic loss in $v$ instead of a linear one, since the latter would assign all the weight to the smallest component. The objective functional is the negative asymptotic risk

$$G(q,v(q))=-\lim_{n}E_PL_n(I,v)=-\lim_{n}\sum_{j=1}^{k}v_j^2c_j(1-q_j)E_P\big(N_j(n)/n\big)=-\sum_{j=1}^{k}v_j^2c_j(1-q_j)v_{1,j}.$$

Example 3.

Minimax design. Consider the integer response case ($k>1$) and, for computational convenience, the weighted loss function

$$L(I,w)=\sum_{i=1}^{n}\sum_{j=1}^{k}w_j(1-q_j)^2I_{ij}/n,\qquad w\in W,$$

where $W=\{w:\ 0<w_i\ (1\le i\le k);\ \sum_{i=1}^{k}w_i=1\}$. The risk is asymptotically

$$R(q,w)=\sum_{i=1}^{k}w_i(1-q_i)^2v_{1,i}.$$

For fixed $w$, by Lagrange's multiplier method, the maximum asymptotic risk is attained at $q^*$ with

$$1-q_i^*=\frac{k-1}{w_iv_{1,i}\sum_{j=1}^{k}(w_jv_{1,j})^{-1}},\qquad(i=1,\dots,k).$$

Plugging in this value we get the maximum asymptotic risk

$$R(q^*,w)=(k-1)^2\Big(\sum_{i=1}^{k}(w_iv_{1,i})^{-1}\Big)^{-1}.$$

To minimize this maximum asymptotic risk, we use the Lagrange method again and get

$$w_i^*=\frac{v_{1,i}^{-1/2}}{\sum_{j=1}^{k}v_{1,j}^{-1/2}},\qquad(i=1,\dots,k).$$

Now we choose

$$G(q,w(q))=-R(q^*,w^*)=-(k-1)^2\Big(\sum_{i=1}^{k}v_{1,i}^{-1/2}\Big)^{-2}.$$

However, in many situations we are also interested in optimizing the evaluations of the treatments along with the assignment proportions. Let $\xi_i=(\xi_{i1},\dots,\xi_{ik})$ be the current evaluation of the $k$ treatments if the incoming patient is assigned to treatment $i$, and let $X_n=(x_{n1},\dots,x_{nk})$ be the cumulative evaluations of the $\xi_i$'s for the $k$ treatments up to time $n$; $X_n$ is called the urn composition at time $n$. Denote $|X_n|=\sum_{j=1}^{k}x_{nj}$. For example, at time $m$ we can choose the $\xi_i$'s to be just $v_{m-1}$, independent of which treatment $i$ is assigned at this time; see the application examples in Section 5 for the construction of the $v_{m-1}$'s. More generally, we may only require the $\xi_i$ to be random with mean $v$, or to be of some more general nature. A joint optimal assignment for $N(n)/n$ and $X_n/n$ to achieve the same criterion $v$ is a version of the generalized Pólya urn (GPU) design (for example [7, 8]). In this design, to assign the incoming $n$-th patient to one of the treatments, a random variable is drawn from the multinomial distribution with probabilities $X_n/|X_n|$. If it is of type $i$, the patient is assigned to the $i$-th treatment, a random vector of masses $\xi_i$ is added to the current urn composition $X_n$, and the response $r_n$ is used to update the estimate of $\mu$ in the next step. Let $I_\xi=(\xi_{ij})_{i,j=1,\dots,k}$ be the matrix representation of the $\xi_{ij}$'s, and for each $n$ let $I_{\xi n}$ be an i.i.d. version of $I_\xi$. To simplify the expressions of the asymptotic variances to be derived later, we assume throughout this article that $I_\xi$ is independent of the response observations. The random vector $\xi_i$ is termed the adding rule, and $M=E(I_\xi)=(m_{ij})$ the design matrix, with $m_{ij}=E(\xi_{ij})$ (known). Recall that a left eigenvector of a matrix $M$ corresponding to an eigenvalue $\lambda$ is a row vector $v$ such that $vM=\lambda v$ (in contrast, a right eigenvector $u$ is a column vector such that $Mu=\lambda u$). The vector is normalized if $|v|=1$. The first eigenvalue $\lambda$ of the design matrix and its normalized first left (row) eigenvector $v$ play a key role in the asymptotic properties of the GPU design. Many authors (for instance [21–23]) have studied the asymptotic properties of $X_n$ and $N(n)$ and proved, under suitable conditions, that

$$\Big(\frac{X_n}{|X_n|},\frac{N(n)}{n}\Big)\to(v,v)\ \text{(a.s.)},\quad\text{or}\quad\Big(\frac{X_n}{n},\frac{N(n)}{n}\Big)\to(\lambda v,v)\ \text{(a.s.)}.$$

This gives a joint optimality of $(X_n/n,N(n)/n)$. For a comprehensive review of this field, see Rosenberger and Lachin [1] and other related recent papers.
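To make the urn mechanism concrete, here is a minimal Python sketch of a GPU design in the simplest setting. It assumes a constant adding rule $\xi_i=v$ for every $i$ with a fixed, known target $v$ (so every row of the design matrix equals $v$, the first eigenvalue is 1 and the normalized left eigenvector is $v$), and a hypothetical initial urn composition $(1,\dots,1)$. A real adaptive design would replace $v$ by $v(\mu_{n-1})$ estimated from the responses; that step is omitted here.

```python
import random

def simulate_gpu(v, n_patients, seed=0):
    """Generalized Polya urn with the constant adding rule xi_i = v."""
    rng = random.Random(seed)
    k = len(v)
    urn = [1.0] * k            # hypothetical initial urn composition
    counts = [0] * k           # N(n): patients per treatment
    for _ in range(n_patients):
        # Draw a treatment with probabilities X_n / |X_n|.
        arm = rng.choices(range(k), weights=urn)[0]
        counts[arm] += 1
        # Add the masses xi_arm = v to the urn.
        urn = [x + vi for x, vi in zip(urn, v)]
    return [c / n_patients for c in counts], [x / n_patients for x in urn]

props, urn_scaled = simulate_gpu([0.3, 0.7], n_patients=5000)
```

Under these assumptions both `props` (the allocation proportions $N(n)/n$) and `urn_scaled` (the scaled urn $X_n/n$) should be close to the target $(0.3, 0.7)$ for large `n_patients`, in line with the limit displayed above with $\lambda=1$.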

However, the existing GPU design can only achieve joint optimality of $(X_n/n,N(n)/n)$ for the same criterion $v$. More generally, we may optimize $X_n/n$ and $N(n)/n$ by different criteria, which gives more flexibility. The compound adaptive GPU design [10] can achieve this goal; we do not pursue it here.

3 The staggered-start trial and the proposed design

In this case the $k$ treatments enter the trial at different times. After treatment 1 has been under study with $n_1$ patients, a new treatment 2 is added to the trial, ..., and after treatment $(k-1)$ has been under study with $n_{k-1}$ patients, the newest $k$-th treatment enters the trial with some initial size $n_{k,0}$. The previous treatments have already been under way for a long time, and the sample sizes $n_1,\dots,n_{k-1}$ cannot be viewed as initial sample sizes. The questions are: can optimality be achieved in this trial, under what conditions, and how should the incoming patients be allocated to achieve the overall optimality? It is known that a deterministic assignment is not desirable, so we need a randomized design for this problem. Below we describe the GPU designs for two treatments, three treatments, ..., $k$ treatments in turn. The general description may seem involved; the simulation example in Section 5 makes it concrete with two treatments. For ease of understanding, we first describe the design for two treatments, and then for the general case of $k$ treatments.

3.1 Two treatments

At the first stage, only one treatment is in the trial, with sample size $n_1^{(1)}$. When the second stage begins, the second treatment enters the trial with an initial sample size $n_{2,0}$. The optimality under consideration can be summarized by a 2-dimensional vector $v_2$, the urn composition at time $n\ (\ge n_1^{(1)}+n_{2,0})$ is the vector $\mathbf{X}_{n,2}=(X_{n,1},X_{n,2})$, and the adding matrix is $M_2$ ($2\times2$ dimensional). Clearly we need to allocate more patients to the second treatment to allow its number of patients to catch up quickly. For this, at time $n=n_1+n_{2,0}$ (writing $n_1=n_1^{(1)}$), define the weight vector $\mathbf{W}_{n,2}=(W_{n,1},W_{n,2})$ with $W_{n,1}=n_{2,0}$ and $W_{n,2}=n_1$. Define the componentwise product $\mathbf{W}_{n,2}\mathbf{X}_{n,2}=(W_{n,1}X_{n,1},W_{n,2}X_{n,2})$, and allocate the next, $(n+1)$-th, patient to one of the two treatments with probabilities $\mathbf{W}_{n,2}\mathbf{X}_{n,2}/|\mathbf{W}_{n,2}\mathbf{X}_{n,2}|$. Then update the weight $\mathbf{W}_{n,2}$ to $\mathbf{W}_{n+1,2}$ by setting $W_{n+1,1}=n_{2,0}+1$ and $W_{n+1,2}=W_{n,2}$, and update the urn composition in the same way as in the standard GPU design. In general, the weight $\mathbf{W}_{n+m-1,2}$ is updated to $\mathbf{W}_{n+m,2}$ as follows: if $W_{n+m-1,1}<W_{n+m-1,2}$, increase $W_{n+m-1,1}$ by 1 and keep $W_{n+m-1,2}$ unchanged; if $W_{n+m-1,1}>W_{n+m-1,2}$, increase $W_{n+m-1,2}$ by 1 and keep $W_{n+m-1,1}$ unchanged. The urn composition $\mathbf{X}_{n+m-1,2}$ is updated to $\mathbf{X}_{n+m,2}$ in the same way as before, and the next patient is allocated with probabilities $\mathbf{W}_{n+m,2}\mathbf{X}_{n+m,2}/|\mathbf{W}_{n+m,2}\mathbf{X}_{n+m,2}|$. Once the two components of the weight are equal, the weight vector is uniform; we stop updating it and it stays uniform from then on. From that point each subsequent patient is allocated with probabilities $\mathbf{X}_{n+m,2}/|\mathbf{X}_{n+m,2}|$, as in the standard design, until the end of the trial.

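Note that in the above we may equivalently define the weights in normalized form, $\mathbf{W}_{n+1,2}=\big(\frac{n_{2,0}+1}{n},\frac{n_1}{n}\big)$. Only the ratio $\mathbf{W}_{n,2}\mathbf{X}_{n,2}/|\mathbf{W}_{n,2}\mathbf{X}_{n,2}|$ (the componentwise product of weight vector and urn composition, divided by its sum) is needed for the allocation probability, and it is the same as with the weights defined above; the normalized form will also be easier to use in our later computations.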
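The two-treatment rule can be sketched in a few lines of Python. The helper names `allocation_probs` and `update_weights_two` and the numerical urn composition are illustrative assumptions; the sketch only shows how the weighted allocation probability is formed, how the smaller weight is raised by one per allocation, and that rescaling the weights by $1/n$ leaves the probability unchanged.

```python
def allocation_probs(weights, urn):
    """Return W X / |W X| using the componentwise product."""
    wx = [w * x for w, x in zip(weights, urn)]
    total = sum(wx)
    return [t / total for t in wx]

def update_weights_two(weights):
    """Raise the smaller weight by 1; leave equal weights untouched."""
    w1, w2 = weights
    if w1 < w2:
        return (w1 + 1, w2)
    if w2 < w1:
        return (w1, w2 + 1)
    return (w1, w2)

n1, n20 = 100, 15
urn = (54.7, 60.3)      # hypothetical urn composition at time n = n1 + n20
w = (n20, n1)           # treatment 1 gets weight n20, treatment 2 gets n1
q = allocation_probs(w, urn)
# Same probabilities with the normalized weights (n20/n, n1/n).
q_scaled = allocation_probs([wi / (n1 + n20) for wi in w], urn)
```

With these numbers the new treatment 2 receives the next patient with probability of roughly 0.88, which is how the design lets it catch up.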

3.2 k Treatments

For the first two treatments in the trial, the procedure is the same as in the above case. But to accommodate $k$ treatments at later stages, we notationally extend the vectors $v_2$, $X_{n,2}$ and the matrix $M_2$ to their $k$-dimensional versions $v^{(2)}$, $X_n^{(2)}$ and $M^{(2)}$ by adding zero components.

In the $j$-th stage ($j\ge2$), the $j$-th treatment enters the trial. At this time the previous $j-1$ treatments have already been allocated $n_1^{(j)},\dots,n_{j-1}^{(j)}$ patients. The optimality under consideration can be summarized by a $j$-dimensional vector $v_j$, with urn composition at time $n$ given by $X_{n,j}=(X_{n,1},\dots,X_{n,j})$ and a $j\times j$ adding matrix $M_j$. Notationally we extend $v_j$, $X_{n,j}$ and $M_j$ to their $k$-dimensional versions $v^{(j)}$, $X_n^{(j)}$ and $M^{(j)}$ by adding zero components for $j<k$. The urn composition at time $n$ is denoted by the $k$-vector $X_n^{(j)}$, which is generated in the way described before. Let $n_1^{[j]}\le\cdots\le n_{j-1}^{[j]}$ be the ordered version of $n_1^{(j)},\dots,n_{j-1}^{(j)}$. At the beginning of the $j$-th stage, let $W_n^{[j]}=(W_{n,1}^{[j]},\dots,W_{n,j}^{[j]},0,\dots,0)=(n_1^{[j]},\dots,n_{j-1}^{[j]},n_1+\cdots+n_{j-1},0,\dots,0)$, and let $W_n^{(j)}$ be the concomitant of $W_n^{[j]}$ with respect to the correspondence between $(n_1^{(j)},\dots,n_{j-1}^{(j)},n_1+\cdots+n_{j-1},0,\dots,0)$ and $(n_1^{[j]},\dots,n_{j-1}^{[j]},n_1+\cdots+n_{j-1},0,\dots,0)$. Allocate the next patient to one of the $k$ treatments with probabilities $W_n^{(j)}X_n/|W_n^{(j)}X_n|$, increase $n_1^{[j]}$ by 1, and re-define the ordering $n_1^{[j]}\le\cdots\le n_{j-1}^{[j]}$ after this step, and so on. As $n$ increases, $W_n$ tends to the uniform weight, and when $W_n/|W_n|=(1/k,\dots,1/k)$ we stop updating it.

At the $k$-th stage, there are already $(k-1)$ treatments under study with sample sizes $n_1^{(k)},\dots,n_{k-1}^{(k)}$ when the $k$-th treatment enters the trial. The optimality criterion is described by the $k$-vector $v=v(\mu)$. Let $n=n_1+\cdots+n_{k-1}$. The urn composition at time $n$ is denoted by the $k$-vector $X_n$, which is generated in the way described before. Let $n_1^{[k]}\le\cdots\le n_{k-1}^{[k]}$ be the ordered version of $n_1^{(k)},\dots,n_{k-1}^{(k)}$. At the beginning of the $k$-th stage, let $W_n^{[k]}=(W_{n,1}^{[k]},\dots,W_{n,k}^{[k]})=(n_1^{[k]},\dots,n_{k-1}^{[k]},n_1+\cdots+n_{k-1})$, and let $W_n^{(k)}$ be the concomitant of $W_n^{[k]}$ with respect to the correspondence between $(n_1^{(k)},\dots,n_{k-1}^{(k)},n_1+\cdots+n_{k-1})$ and $(n_1^{[k]},\dots,n_{k-1}^{[k]},n_1+\cdots+n_{k-1})$. Allocate the next patient to one of the $k$ treatments with probabilities $W_n^{(k)}X_n/|W_n^{(k)}X_n|$, increase $n_1^{[k]}$ by 1, and re-define the ordering $n_1^{[k]}\le\cdots\le n_{k-1}^{[k]}$ after this step, and so on. As $n$ increases, $W_n$ tends to the uniform weight, and when $W_n/|W_n|=(1/k,\dots,1/k)$ we stop updating it.
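The weight update for several treatments amounts to repeatedly raising the currently smallest weight by one until all weights agree. The following Python sketch illustrates only this update step; the starting vector `[20, 35, 55]` and the function names are hypothetical, ties are broken by taking the first smallest component, and the re-ordering (concomitant) bookkeeping of the text is replaced by simply locating the minimum at each step.

```python
def step_weights(weights):
    """Add 1 to the smallest weight unless all weights are already equal."""
    w = list(weights)
    if min(w) == max(w):
        return w
    w[w.index(min(w))] += 1
    return w

def steps_to_uniform(weights):
    """Count the allocations made before the weights become uniform."""
    w, steps = list(weights), 0
    while min(w) != max(w):
        w = step_weights(w)
        steps += 1
    return steps, w

steps, final = steps_to_uniform([20, 35, 55])
```

In this toy case the weights become uniform after 55 allocations, the total gap between the largest weight and the other two; after that point the allocation probabilities reduce to the unweighted urn probabilities.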

In this way, the newly added treatments have a larger chance of receiving incoming patients, and as more patients enter the trial the weights gradually even out. The rule described above thus allows the currently under-assigned treatment(s) to catch up quickly with the over-assigned treatment(s). We therefore expect the allocation proportions $N(n)/n$ to converge eventually to the optimality target $v(\mu)$; essentially the allocation is determined by the adding matrix $M$, and the optimality is eventually achieved in the same way as in the standard trial.

4 Asymptotic properties

Now we study the asymptotic properties of this design for quantitative responses; the case of qualitative responses should be parallel. We distinguish two cases: first, the case where $n_1,\dots,n_{k-1}$ are large but $n_j/n\to0$ as $n\to\infty$; and second, the case where $\lim n_j/n>0$ $(j=1,\dots,k-1)$. The asymptotic results are different for the two cases.

Case I. We first consider the case where the initial sample sizes $n_j$ are not too large relative to the total sample size $n$, and show that optimality can be achieved as in the corresponding standard clinical trials.

Let $\to_D$ denote convergence in distribution. For vectors $a=(a_1,\dots,a_k)$ and $b=(b_1,\dots,b_k)$ with $b_i\ne0$ $(i=1,\dots,k)$, define $a/b=(a_1/b_1,\dots,a_k/b_k)$, and denote by $\{a\}$ the diagonal matrix with diagonal $a$. We impose the following conditions.

(B0) $0<\nu_{ij}<\infty$ $(1\le i,j\le k)$.
(B1) $M\mathbf{1}'=\lambda_1\mathbf{1}'$.
(B2) $\varsigma_{ij}(\cdot,\cdot)$ is continuous at $(\mu,y)$ for any $y$, and there is an $r>2$ such that $E\|\xi_1\|^r<\infty$ and $\sup_nE(\|\zeta_n\|^r\mid\mathcal{F}_{n-1})<\infty$, a.s.
(B3) $M(\mu)\ge0\ (\ne0)$ is differentiable, and there is a $\delta>0$ such that $M(y)-M(\mu)=\sum_{j=1}^{k}\frac{\partial M(y)}{\partial y_j}\Big|_{y=\mu}(y_j-\mu_j)+O(\|y-\mu\|^{1+\delta})$, as $y\to\mu$.
(B4) $\prod_{j=1}^{k-1}n_j=o(n^{1/2})$.

The following results show that under assumption (B4), the staggered-start trial achieves the optimality criterion, with the asymptotic behavior of $(X_n/n,N(n)/n)$ the same as for the standard trial.

Theorem 1

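Assume (B0)-(B2) (for some $r>1$) and (B4). Then we have

(i) $\mu_n\to\mu$, a.s.; $\sup_x\|F_n(x)-F(x)\|\to0$, a.s.
(ii) $\Big(\frac{X_n(\mu_{n-1})}{n},\frac{N(n)}{n}\Big)\to(\lambda_1v(\mu),v(\mu))$, a.s.
(iii) $\frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n}f(r_j)I_{ji}\to v(\mu)\mu'$, a.s.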

Theorem 2

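(i) Assume (B0)-(B2) and (B4). Then

$$\sqrt{n}(\mu_n-\mu)\to_DN(0,\Omega_\mu),\qquad\sqrt{n}(F_n(x)-F(x))\to_DN(0,\Omega_F),\quad x\in S(F),$$

where $\Omega_\mu=\{\sigma^2/v\}$ and $\Omega_F=\{F(x)(1-F(x))/v\}$.

(ii) Assume (B0)-(B4). If $\lambda_1>2\mathrm{Re}(\lambda_2)$, then

$$\sqrt{n}\Big(\frac{X_n(\mu_{n-1})}{n}-\lambda_1v(\mu),\ \frac{N(n)}{n}-v(\mu)\Big)'\to_DN(0,\Omega),$$

where $\Omega$ is given in the proof.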

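Remark. 1) When $\lambda_1\le2\mathrm{Re}(\lambda_2)$, the convergence rate is generally slower than $n^{1/2}$.

2) The asymptotic covariance matrix $\Omega$ can be made very simple by the choice of $M$, as in the Corollary in Yuan and Chai [8], as follows. Let $v_1(\cdot)$ be twice differentiable with $|v_1(\cdot)|=1$. Set $M(\cdot)=\mathbf{1}'v_1(\cdot)$ and let $\Xi$ be a constant matrix. Then the conditions of Theorem 2(ii) are satisfied with $\lambda_1=1$ and $\lambda_2=0$, and $\Omega$ simplifies to

$$\Omega_{11}=2\Sigma_v,\qquad\Omega_{12}=3\Sigma_v(I-\mathbf{1}'v_1),$$

$$\Omega_{22}=\{v_1\}-v_1'v_1+6(I-v_1'\mathbf{1})\Sigma_v(I-\mathbf{1}'v_1),$$

where $I$ is the $k$-dimensional identity matrix and $\Sigma_v=(\partial v_1(\mu)/\partial\mu)'\,\mathrm{diag}(\sigma_1^2/v_{11},\dots,\sigma_k^2/v_{1k})\,(\partial v_1(\mu)/\partial\mu)$. In particular, when we take $\zeta=\text{Const.}$, then $E(\zeta'\{v_1\}\zeta)-v_1'v_1=0$ and the $\Omega_{ij}$'s can be simplified further.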

Case II. In this case, $n_i/n\to\alpha_i>0$ $(i=1,\dots,k-1)$. Since the stopping time $n$ is unknown in advance, the proportions $\alpha_1,\dots,\alpha_{k-1}$ are known only when the trial terminates; otherwise the design could be made simpler by using the $\alpha_j$'s. We will see that in this case the optimality criterion can only be partially achieved, in that the target limits are only partially preserved, as seen by comparing the results of Theorems 3 and 4 with those of Theorems 1 and 2.

Theorem 3

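Assume $M_j=\mathbf{1}_j'v^{(j)}$. Then

(i) $\mu_n\to\mu$, a.s.; $\sup_x\|F_n(x)-F(x)\|\to0$, a.s.
(ii) with $(y,u)$ given in the proof, $\Big(\frac{X_n(\mu_{n-1})}{n},\frac{N(n)}{n}\Big)\to(y,u)$, a.s.
(iii) $\frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n}f(r_j)I_{ji}\to v(\mu)\mu'$, a.s.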

Theorem 4

Assume $M_j=\mathbf{1}_j'v^{(j)}$ and $n_j/n=\alpha_j+o(n^{-1/2})$. Then

$$\sqrt{n}\Big(\frac{X_n(\mu_{n-1})}{n}-y,\ \frac{N(n)}{n}-u\Big)'\to_DN(0,\bar\Omega),$$

where $\bar\Omega$ is given in the proof.

In Theorem 3 (ii), $(y,u)\ne(v,v)$. Thus, if the sample sizes $n_j$ are not small in relation to the total sample size $n$, full optimality cannot be achieved.

5 Application examples and simulation study

In practice $v(\cdot)$ should be chosen to be differentiable around $\mu\ne0$, so that it satisfies the required conditions. We continue the three examples given in Section 2 and add two more examples common in applications, with a simulation study for the last example.

Example 1

(continued). The functional is $G(\mu,v(\mu))=\sum_{j=1}^{k}\mu_jv_j(\mu)=\mu'v(\mu)$. Typically, $v$ assigns larger values to the larger components of $\mu$. By Theorem 3, the “success rate” is asymptotically

$$G(\mu_n,v(\mu_n))=\mu_n'v(\mu_n)\to\mu'v(\mu),\quad\text{a.s.}$$

To maximize this quantity over $v(\cdot)$, one may wish to assign all the mass of $v$ to the largest component (or components, in case of ties) of $\mu$, but this generally violates the conditions for consistency, which usually require $v_i(\cdot)>0$. So we assign $v_i(\cdot)>0$ by some fixed rule, let the first left eigenvector be $v_{1i}(\mu_{n-1})=v_i(\mu_{n-1})$ $(1\le i\le k)$, normalize it, and still denote it by $v(\cdot)$.

Example 2

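(continued). The objective functional is the negative asymptotic risk

$$G(q,v(q))=-\sum_{j=1}^{k}v_j^2c_j(1-q_j)v_{1,j}.$$

We are to maximize the objective functional over $v\in\nabla$ and $v_1\in\nabla$. Using Lagrange's multiplier, we get the theoretical optimal $(v(q),v_1(q))$ with components

$$v_{1,i}(q)=v_i(q)=\frac{1}{\sqrt{c_i(1-q_i)}}\Big(\sum_{j=1}^{k}\frac{1}{\sqrt{c_j(1-q_j)}}\Big)^{-1},\qquad(i=1,\dots,k).$$

Now, given $q_{n-1}=(q_{n-1,1},\dots,q_{n-1,k})'$, we get the optimal design with

$$v_{n-1,1,i}=\frac{1}{\sqrt{c_i(1-q_{n-1,i})}}\Big(\sum_{j=1}^{k}\frac{1}{\sqrt{c_j(1-q_{n-1,j})}}\Big)^{-1},\qquad(i=1,\dots,k).$$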

Example 3.

(continued). The optimality target functional is

$$G(q,w(q))=-R(q^*,w^*)=-(k-1)^2\Big(\sum_{i=1}^{k}v_{1,i}^{-1/2}\Big)^{-2},$$

and we are to find $v_1$ to minimize this quantity. Using the Lagrange method again, we get $v_1$ for the minimax design, $v_{1,i}=1/k$ $(i=1,\dots,k)$, which is a fixed design.

Example 4

Consider two treatments with qualitative responses. The Neyman allocation [3] allocates $N_1(n)$ patients to treatment 1 by maximizing the power for testing the difference $\Delta=p_1-p_2$, where $p=(p_1,p_2)$ are the success rates of the two treatments. Here $f(\cdot)\equiv1$ and $\mu=p$. This strategy leads to the allocation ratio

$$\frac{N_1(n)}{N_2(n)}\to\frac{\sqrt{p_1(1-p_1)}}{\sqrt{p_2(1-p_2)}}:=\frac{v_{11}}{v_{12}},$$

or $v_1(p)=(v_{11}(p),v_{12}(p))$, with

$$v_{11}(p)=\frac{\sqrt{p_1(1-p_1)}}{\sqrt{p_1(1-p_1)}+\sqrt{p_2(1-p_2)}},\qquad v_{12}(p)=\frac{\sqrt{p_2(1-p_2)}}{\sqrt{p_1(1-p_1)}+\sqrt{p_2(1-p_2)}}.$$

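We have $|v_1|=1$, and the corresponding optimal $v_1(\cdot)=(v_{11}(\cdot),v_{12}(\cdot))'$. The generating matrix $M(\cdot)$ can be chosen as in the Corollary, $M(\cdot)=v_1(\cdot)\mathbf{1}'$. Taking $\lambda_1=1$ and the $\xi_{n,ij}$'s to be constants, the corresponding adding rule is $\varsigma_{ni}=v_1(p_{n-1})$ $(i=1,2)$. It is easily checked that all the conditions in Theorem 3 and the Corollary are satisfied, so we have

$$\Big(\frac{X_{n,1}}{n},\frac{X_{n,2}}{n}\Big)\to(v_{11}(p),v_{12}(p))\leftarrow\Big(\frac{N_{n,1}}{n},\frac{N_{n,2}}{n}\Big),\quad\text{a.s.}$$

Note that $\sigma_1^2=p_1(1-p_1)$ and $\sigma_2^2=p_2(1-p_2)$, so by the Corollary we have $\sqrt{n}\big(X_n/n-v_1(p),\ N(n)/n-v_1(p)\big)\to_DN(0,\Omega)$, with $\Omega=(\Omega_{ij})_{1\le i,j\le2}$, $\Omega_{12}=\Omega_{21}'$,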

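$$\Omega_{11}=2\Sigma_v=\frac{(1-2p_1)^2p_2(1-p_2)+(1-2p_2)^2p_1(1-p_1)}{2\big(\sqrt{p_1(1-p_1)}+\sqrt{p_2(1-p_2)}\big)^3}\begin{pmatrix}1&-1\\-1&1\end{pmatrix},$$

$$\Omega_{12}=3\Sigma_v(I-\mathbf{1}'v_1)=\frac{3}{4}\cdot\frac{(1-2p_1)^2p_2(1-p_2)+(1-2p_2)^2p_1(1-p_1)}{\big(\sqrt{p_1(1-p_1)}+\sqrt{p_2(1-p_2)}\big)^3}\begin{pmatrix}1&-1\\-1&1\end{pmatrix}$$

and

$$\Omega_{22}=\{v_1\}-v_1'v_1+6(I-v_1'\mathbf{1})\Sigma_v(I-\mathbf{1}'v_1)=\Bigg(\frac{\sqrt{p_1p_2(1-p_1)(1-p_2)}}{\big(\sqrt{p_1(1-p_1)}+\sqrt{p_2(1-p_2)}\big)^2}+\frac{3\big[(1-2p_1)^2p_2(1-p_2)+(1-2p_2)^2p_1(1-p_1)\big]}{2\big(\sqrt{p_1(1-p_1)}+\sqrt{p_2(1-p_2)}\big)^3}\Bigg)\begin{pmatrix}1&-1\\-1&1\end{pmatrix}.$$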

Example 5.

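The criterion in Rosenberger et al. [4] is to minimize the expected number of treatment failures, and leads to

$$\frac{N_1(n)}{N_2(n)}\to\frac{\sqrt{p_1}}{\sqrt{p_2}}:=\frac{v_{11}}{v_{12}},\quad\text{or}\quad v_{12}(p)=\Big(1+\sqrt{p_1/p_2}\Big)^{-1},$$

so we take the target $v_1(\cdot)$ as

$$v_{11}(p)=\frac{\sqrt{p_1}}{\sqrt{p_1}+\sqrt{p_2}},\qquad v_{12}(p)=\frac{\sqrt{p_2}}{\sqrt{p_1}+\sqrt{p_2}}.$$

The $f(\cdot)$, $\mu$, the generating matrix and the adding rule are the same as in Example 4, with $v_1$ given here. The asymptotic results are similar, with

$$\Omega_{11}=\frac{(1-p_1)p_2^{3/2}+(1-p_2)p_1^{3/2}}{2\sqrt{p_1p_2}\,(\sqrt{p_1}+\sqrt{p_2})^3}\begin{pmatrix}1&-1\\-1&1\end{pmatrix},$$

$$\Omega_{12}=\frac{3}{4}\cdot\frac{(1-p_1)p_2^{3/2}+(1-p_2)p_1^{3/2}}{\sqrt{p_1p_2}\,(\sqrt{p_1}+\sqrt{p_2})^3}\begin{pmatrix}1&-1\\-1&1\end{pmatrix}$$

and

$$\Omega_{22}=\frac{(3-p_1)p_2^{3/2}+(3-p_2)p_1^{3/2}}{2\sqrt{p_1p_2}\,(\sqrt{p_1}+\sqrt{p_2})^3}\begin{pmatrix}1&-1\\-1&1\end{pmatrix}.$$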
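The closed forms for this example can be checked numerically against the general expressions $\Omega_{11}=2\Sigma_v$, $\Omega_{12}=3\Sigma_v(I-\mathbf{1}'v_1)$ and $\Omega_{22}=\{v_1\}-v_1'v_1+6(I-v_1'\mathbf{1})\Sigma_v(I-\mathbf{1}'v_1)$. The Python sketch below is a sanity check under stated assumptions, not part of the original derivation: it assumes the square-root target $v_{1i}(p)\propto\sqrt{p_i}$, takes $\sigma_i^2=p_i(1-p_i)$, reads the $(a,b)$ entry of $\Sigma_v$ as $\sum_i(\partial v_{1a}/\partial p_i)(\partial v_{1b}/\partial p_i)\sigma_i^2/v_{1i}$, and uses central finite differences for the derivatives. Because $v_{12}=1-v_{11}$, every matrix involved is a scalar multiple of $\begin{pmatrix}1&-1\\-1&1\end{pmatrix}$, so only the scalar coefficients are compared.

```python
from math import sqrt

def target(p1, p2):
    """v_1(p) with components proportional to sqrt(p_i) (assumed form)."""
    s = sqrt(p1) + sqrt(p2)
    return sqrt(p1) / s, sqrt(p2) / s

def sigma_coeff_numeric(p1, p2, h=1e-6):
    """Coefficient c with Sigma_v = c * [[1, -1], [-1, 1]]."""
    v11, v12 = target(p1, p2)
    d1 = (target(p1 + h, p2)[0] - target(p1 - h, p2)[0]) / (2 * h)  # dv11/dp1
    d2 = (target(p1, p2 + h)[0] - target(p1, p2 - h)[0]) / (2 * h)  # dv11/dp2
    return d1**2 * p1 * (1 - p1) / v11 + d2**2 * p2 * (1 - p2) / v12

def omega_coeffs_closed(p1, p2):
    """Closed-form coefficients of Omega_11, Omega_12, Omega_22."""
    num = (1 - p1) * p2**1.5 + (1 - p2) * p1**1.5
    den = sqrt(p1 * p2) * (sqrt(p1) + sqrt(p2)) ** 3
    num22 = (3 - p1) * p2**1.5 + (3 - p2) * p1**1.5
    return num / (2 * den), 0.75 * num / den, num22 / (2 * den)

p1, p2 = 0.6915, 0.8413
c = sigma_coeff_numeric(p1, p2)
v11, v12 = target(p1, p2)
# {v1} - v1'v1 equals v11*v12 times the same 2x2 matrix.
general = (2 * c, 3 * c, v11 * v12 + 6 * c)
closed = omega_coeffs_closed(p1, p2)
```

Under these assumptions `general` and `closed` agree up to the finite-difference error, which supports reading the flattened fractions above with square roots in the denominators.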

5.1 Simulation study

Below we perform a simulation study for Example 5. At the end of the first stage of the trial, treatment 1 has been under study with $n_1^{(1)}=100$ patients, with responses $r_{1,n_1^{(1)}}=(r_{11},\dots,r_{1,n_1^{(1)}})$. In the second stage, treatment 2 enters the trial with initial sample size $n_{2,0}=15$, with responses $r_{2,n_{2,0}}=(r_{21},\dots,r_{2,n_{2,0}})$. The responses for the two treatments are generated from $N(\mu_1,1)$ and $N(\mu_2,1)$ respectively, with $\mu_1=2.5$ and $\mu_2=3$. We define the success of a treatment as a response $\ge2$. Thus the success rates are $p=(p_1,p_2)=(0.6915,0.8413)$, and the target allocation proportion vector is $v_1=(v_{11}(p),v_{12}(p))$ given in Example 5.
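The stated success rates and the target proportions can be reproduced with a few lines of Python. This is a small check rather than part of the original study; it assumes the square-root form $v_{1i}(p)\propto\sqrt{p_i}$ for the target of Example 5, and the helper names are illustrative.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def success_rate(mu, threshold=2.0):
    """P(response >= threshold) for a N(mu, 1) response."""
    return 1.0 - normal_cdf(threshold - mu)

p1, p2 = success_rate(2.5), success_rate(3.0)
s = sqrt(p1) + sqrt(p2)
v = (sqrt(p1) / s, sqrt(p2) / s)
print(round(p1, 4), round(p2, 4))      # 0.6915 0.8413
print(round(v[0], 4), round(v[1], 4))  # 0.4755 0.5245
```

The second line of output matches the target $v(p)$ reported in the last row of Table 1.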

When treatment 2 enters the trial, the total sample size is $n_1^{(1)}+n_{2,0}$. At a general time $n=n_1+n_2$, $n_1$ is the total number of patients allocated to treatment 1 up to time $n$ (including the $n_1^{(1)}$ first-stage patients), and $n_2$ is the total number of patients allocated to treatment 2 up to time $n$ (including the $n_{2,0}$ initial patients). We use the adding matrix with $\lambda_1=1$, as in item 2) of the Remark in Section 4, to keep the design simple. The urn composition at time $n$ is then $X_n=(x_{n1},x_{n2})'$, with $x_{n1}=nv_{11}(p_n)$, $x_{n2}=nv_{12}(p_n)$, $p_n=(p_{1,n_1},p_{2,n_2})$, where

$$p_{1,n_1}=\frac{1}{n_1}\sum_{i=1}^{n_1}I(r_{1i}\ge2),\qquad p_{2,n_2}=\frac{1}{n_2}\sum_{i=1}^{n_2}I(r_{2i}\ge2)$$

are the empirical estimates of $p_1$ and $p_2$, and $I(\cdot)$ is the indicator function. For the weight, when treatment 2 first enters the trial, $W_n=(w_{n1},w_{n2})=(n_{2,0},n_1^{(1)})$. Denote $Q_n=W_nX_n/|W_nX_n|$ (recall that for vectors $a=(a_1,\dots,a_k)$ and $b=(b_1,\dots,b_k)$, $ab=(a_1b_1,\dots,a_kb_k)$ and $|a|=a_1+\cdots+a_k$). Then

$$Q_n=\frac{\big(w_{n1}v_{11}(p_n),\,w_{n2}v_{12}(p_n)\big)}{w_{n1}v_{11}(p_n)+w_{n2}v_{12}(p_n)}.$$

When the next, $(n+1)$-th, patient arrives, allocate this patient to treatment 1 or 2 according to the probabilities $Q_n$.

After this patient has been allocated, record the response $r_{2,n_{2,0}+1}$ if he/she is allocated to treatment 2 (or $r_{1,n_1^{(1)}+1}$ if allocated to treatment 1), and update $p_n$ to $p_{n+1}$ accordingly. Update the weight $W_n$ to $W_{n+1}=(w_{n+1,1},w_{n+1,2})=(n_{2,0}+1,n_1^{(1)})$, and update $Q_n$ to $Q_{n+1}$ with $(W_n,p_n)$ replaced by $(W_{n+1},p_{n+1})$. Then allocate the $(n+2)$-th patient to one of the two treatments with probabilities $Q_{n+1}$, and continue in this way until $w_{n1}=w_{n2}$; from then on $W_n$ is fixed and $Q_n=(v_{11}(p_n),v_{12}(p_n))$. We continue to allocate each incoming patient with probabilities $Q_n$, updating it (along with $p_n$) after each allocation, until the end of the trial.
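The procedure just described can be sketched as a short Python simulation. This is an illustrative reimplementation under stated assumptions, not the code used for the reported results: it assumes the square-root target $v_{1i}(p)\propto\sqrt{p_i}$, uses Python's `random` module with an arbitrary seed, and the function name `simulate_staggered` and its defaults are hypothetical. Since $X_n=nv(p_n)$, the factor $n$ cancels in $Q_n$ and is omitted.

```python
import random
from math import sqrt

def simulate_staggered(n_total, n1_init=100, n2_init=15,
                       mu=(2.5, 3.0), cutoff=2.0, seed=1):
    """Return the final allocation proportions (n1/n, n2/n)."""
    rng = random.Random(seed)
    counts = [n1_init, n2_init]
    # Successes among the patients already on each treatment.
    successes = [
        sum(rng.gauss(mu[i], 1.0) >= cutoff for _ in range(counts[i]))
        for i in range(2)
    ]
    weights = [n2_init, n1_init]          # W_n = (n_{2,0}, n_1^{(1)})

    for _ in range(n_total - n1_init - n2_init):
        p = [successes[i] / counts[i] for i in range(2)]
        root = [sqrt(pi) for pi in p]
        if sum(root) == 0:                # guard: no successes observed yet
            v = [0.5, 0.5]
        else:
            v = [r / sum(root) for r in root]
        wv = [weights[i] * v[i] for i in range(2)]
        q = [x / sum(wv) for x in wv]     # Q_n = W_n X_n / |W_n X_n|
        arm = 0 if rng.random() < q[0] else 1
        counts[arm] += 1
        successes[arm] += rng.gauss(mu[arm], 1.0) >= cutoff
        if weights[0] < weights[1]:       # catch-up phase of the weights
            weights[0] += 1
    return [c / n_total for c in counts]

proportions = simulate_staggered(5000)
```

Calling `simulate_staggered` for a range of total sample sizes gives a sequence of proportions comparable in spirit to Table 1; the exact values depend on the seed and on the random number generator.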

The allocation proportion at time n is N(n)/n=(n1,n2)/n, where n1 is the total number of all the patients allocated to treatment 1 at time n (including the n1(1) patients in the first stage), and n2 is the total number of patients allocated to treatment 2 at time n (including the n2,0 initial patients). We want N(n)/n→v(p) as n increases. The results are presented in Table 1.

We see that when treatment 2 enters the trial, the allocation proportion is highly skewed toward treatment 1. As new patients come in, the weighted allocation design lets the proportion of treatment 2 catch up slowly: when the total sample size surpasses 300, the difference between the two proportions is no longer large; when the sample size reaches 1,000, the two proportions are close; and when it surpasses 2,000, the proportions begin to shift toward treatment 2, as this treatment has the higher success rate. Although in most phase III clinical trials the sample size rarely surpasses 5,000, this simulation experiment gives some sense of how fast the allocation proportions of a staggered-start trial stabilize at the optimal target.

Table 1: Allocation proportion with simulated data.

n	n1/n	n2/n
115	0.86957	0.13043
150	0.70667	0.29333
200	0.62500	0.37500
300	0.56667	0.43333
500	0.55400	0.44600
700	0.51429	0.48571
900	0.50889	0.49111
1,000	0.51900	0.48100
2,000	0.50750	0.49250
5,000	0.48040	0.51960
8,000	0.46825	0.53175
10,000	0.47430	0.52570
20,000	0.47085	0.52915
50,000	0.47708	0.52292
v(p)	0.47550	0.5245

6 Discussion

In this study, we studied adaptive designs for staggered-start clinical trials. We used weighted assignment to accelerate the allocation of patients to the newly entered treatment(s). Our choice of weight is intuitive, but different weights can be used, for example heavier weights on the new treatment(s) to let their sample size(s) catch up more quickly. One extreme is to allocate all incoming patients to the new treatment until it has about the same sample size as the earlier-started one(s). As mentioned before, this type of design, with a deterministic portion, violates the basic principle of clinical trials. Another extreme is to allocate the incoming patients using the classical rule, as if all the treatments had started at the same time. As we pointed out, and as seen from the simulation example, this design cannot achieve, or approximate, a given target allocation proportion. So a natural question is: is there an optimal weight for the staggered-start trial? This question does not seem simple. First, one needs a criterion specifying in what sense the weight is optimal. The answer should also depend on the number of treatments, how many of them start late, the sample sizes of the early-started treatments, etc. This question is a good topic for our future research.

The choice of design criteria is an old topic. We listed five commonly used criteria, and there are many more in the applied and research literature; comprehensive reviews can be found in, for example, Rosenberger and Lachin [1] and Jennison and Turnbull ([18], Chapter 17).

Another interesting question is: what is the cutoff beyond which adding a new treatment to a staggered-start clinical trial is no longer advisable? Again, this question is not simple; the answer will depend on the motivation and plan of the trial. In terms of achieving, or approximating, the targeted allocation proportions, our intuition and simulation results suggest that if the ongoing trial has already passed one third of the planned process, i.e., the number of patients in the trial already exceeds one third of the planned total number of patients, it is not advisable to start a new treatment in the trial. However, this is only a rough guideline; the actual decision should depend on many more technical and ethical factors.

Appendix

Let $N_0=n_1+\cdots+n_{k-1}+1$, $I_n=(I_{n1},\ldots,I_{nk})$, and let $\mathcal{F}_n=\sigma(r_1,\ldots,r_n;\,I_1,\ldots,I_n)$ be the sigma field generated by the indicated variables.

Proof of Theorem 1:

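(i) We show $\mu_{ni}\to\mu_i$ (a.s.) for each $i=1,\ldots,k$. In fact,

$$\mu_{ni}=\frac{\sum_{j=1}^{n}f(r_j)I_{ji}}{\sum_{j=1}^{n}I_{ji}}=\frac{1}{N_i(n)}\sum_{j=1}^{N_i(n)}f(r_j|A_i),$$

thus $\mu_{ni}\to\mu_i$ (a.s.) as long as $N_i(n)\to\infty$ (a.s.), which we show below.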

We only need to consider $n>N_0$, so the $W_n$'s and $X_n$'s are all $k$-dimensional. By the construction of $W_n$, as $n$ increases to $n+1$, only the smallest component of $W_n$ gets one increment; the others stay unchanged. Since $|W_n|=n$, for $n>N_1$, $W_n/n$ will be uniform, i.e., $W_{n,i}/n=1/k$ $(i=1,\ldots,k)$, where $N_1:=\sum_{j=1}^{k-1}\prod_{i=j}^{k-1}n_{(i)}$ and $n_{(1)}\le\cdots\le n_{(k-1)}$ are the ordered values of the sample sizes $n_1,\ldots,n_{k-1}$.
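The weight update described here (only the smallest component gains one unit per step) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the starting vector, the tie-breaking rule (lowest index first) and the function names are assumptions made for the example.

```python
def step_weights(weights):
    """Add one unit to the smallest component (ties broken by lowest index)."""
    updated = list(weights)
    smallest = min(range(len(updated)), key=lambda i: updated[i])
    updated[smallest] += 1
    return updated


def run_weights(initial, steps):
    """Apply the one-unit update `steps` times, starting from `initial`."""
    w = list(initial)
    for _ in range(steps):
        w = step_weights(w)
    return w


if __name__ == "__main__":
    # Hypothetical start: the first arm already has weight 100, the second has 0.
    w = run_weights([100, 0], 300)
    total = sum(w)
    print(w, [c / total for c in w])
```

Once the lagging components have caught up, the components differ by at most one unit, so the normalized weights stay at (or within one unit of) the uniform vector, in line with the statement that $W_n/n$ becomes uniform.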

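Note that the sequence of the adding components to $X_n$ after centering, $\{\zeta_n-E(\zeta_n|\mathcal{F}_{n-1}): n\ge N_0\}$, is a matrix martingale, with

$$\sum_{n=N_0}^{\infty}\frac{E\big(\|\zeta_n-E(\zeta_n|\mathcal{F}_{n-1})\|^r\,\big|\,\mathcal{F}_{n-1}\big)}{(n-N_0)^r}\le\sum_{n=1}^{\infty}\frac{C}{(n-N_0)^r}<\infty$$

by (B2). So by the law of large numbers for martingales,

$$\frac{1}{n-N_0}\sum_{m=N_0}^{n}\zeta_m-\frac{1}{n-N_0}\sum_{m=N_0}^{n}E(\zeta_m|\mathcal{F}_{m-1})\to0\quad(\text{a.s.}).$$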

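Since $E(\zeta_m|\mathcal{F}_{m-1})=M(\mu_{m-1})$, we have $\sum_{m=N_0}^{n}\zeta_m=\sum_{m=N_0}^{n}M(\mu_{m-1})+o(n-N_0)\mathbf{1}'\mathbf{1}=O(n-N_0)\mathbf{1}'\mathbf{1}+o(n-N_0)\mathbf{1}'\mathbf{1}$ (a.s.). Since the components of $M(\mu)$ are positive, $\sum_{m=N_0}^{n}M(\mu_{m-1})=O(n-N_0)\mathbf{1}'\mathbf{1}$ (a.s.), so $|X_n|=O\big(|X_{N_0}|+\sum_{m=N_0+1}^{n}k^{-1}\|\zeta_m\|\big)=O\big(|X_{N_0}|+\sum_{m=N_0+1}^{n}k^{-1}\|M(\mu_{m-1})\|+o(n-N_0)\big)=O(n-N_0)$ (a.s.), where $\|M(\cdot)\|=\sum_{i,j}m_{ij}(\cdot)$. Hence $\sum_{n=N_0}^{\infty}|X_n|^{-1}=\infty$ (a.s.) and so, for some $c>0$,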

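$$\sum_{n=N_0}^{\infty}P(I_{ni}=1|\mathcal{F}_{n-1})=\sum_{n=N_0}^{\infty}\frac{W_{n-1,i}X_{n-1,i}}{|W_{n-1}X_{n-1}|}\ge c\sum_{n=N_0}^{\infty}\frac{X_{n-1,i}\big(\frac{1}{k}+O\big(\frac{1}{n-N_0}\big)\big)}{|X_{n-1}|\big(\frac{1}{k}+O\big(\frac{1}{n-N_0}\big)\big)}\ge cX_{N_0-1,i}\sum_{n=N_0}^{\infty}|X_{n-1}|^{-1}=\infty\quad(\text{a.s.}).$$

This, together with the generalized Borel-Cantelli lemma for martingales (Corollary 2.3 in [24]), implies that $P(I_{n,i}=1,\ \text{i.o.})=1$, or $N_i(n)\to\infty$ (a.s.).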
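The conditional allocation probability used in this argument, $W_{n-1,i}X_{n-1,i}/|W_{n-1}X_{n-1}|$, is a componentwise product of the weight vector and the urn composition, normalized to sum to one. A minimal Python sketch follows; the function name and the example vectors are hypothetical and only illustrate the formula.

```python
def allocation_probs(weights, urn):
    """Return W_i * X_i / sum_l (W_l * X_l) for each arm i.

    `weights` plays the role of W_{n-1} and `urn` the role of X_{n-1};
    both are sequences of non-negative numbers of equal length.
    """
    if len(weights) != len(urn):
        raise ValueError("weights and urn must have the same length")
    products = [w * x for w, x in zip(weights, urn)]
    total = sum(products)
    if total <= 0:
        raise ValueError("the weighted urn composition must have positive mass")
    return [p / total for p in products]


if __name__ == "__main__":
    # Hypothetical two-arm example: the late arm (index 1) carries a larger weight.
    print(allocation_probs([1, 3], [2, 2]))
    # With uniform weights the rule reduces to X / |X|.
    print(allocation_probs([5, 5], [1, 3]))
```

The second call illustrates the remark above: once the weights are uniform, they cancel and the allocation probability depends on the urn composition alone.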

For the second result, since $F_{n,i}(x)=\frac{1}{N_i(n)}\sum_{j=1}^{N_i(n)}\chi\big(f(r_j|A_i)\le x\big)$ and $N_i(n)\to\infty$ (a.s.), the result follows directly from the uniform almost sure convergence of the empirical distribution.

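(ii). Define $S_n=\sum_{m=1}^{n}\Delta S_m$ with $\Delta S_k=I_k\zeta_k-E(I_k\zeta_k|\mathcal{F}_{k-1})$. Then $\{S_n,\mathcal{F}_{n-1}: n\ge N_0\}$ is a martingale. The proof is modified from that in Zhang et al. [7]. Denote $M_k=M(\mu_k)$, and note $E(I_k\zeta_k|\mathcal{F}_{k-1})=\frac{W_{k-1}X_{k-1}}{|W_{k-1}X_{k-1}|}M_k$. We have

$$\begin{aligned}X_n&=X_{n-1}+I_n\zeta_n=X_{N_0}+\sum_{m=N_0+1}^{n}I_m\zeta_m=X_{N_0}+S_n+\sum_{m=N_0}^{n-1}\frac{W_mX_m}{|W_mX_m|}M_{m+1}\\&:=A_n+\sum_{m=N_0}^{n-1}o(m^{-1})\frac{X_m}{|X_m|}M_{m+1}\quad(\text{a.s.}).\end{aligned}\tag{2}$$

Similarly as in Zhang et al. [7], $A_n=o(n-N_0)\mathbf{1}$ (a.s.). As $\frac{X_m}{|X_m|}M_{m+1}$ is bounded, the last term in eq. (2) is also $o(n-N_0)\mathbf{1}$ (a.s.), and so the claimed result is true.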

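Now we prove $N(n)/n\to v$ (a.s.). Define $m_n=\sum_{k=1}^{n}\Delta m_k$, with $\Delta m_k=I_k-E(I_k|\mathcal{F}_{k-1})$. Then $\{m_n,\mathcal{F}_{n-1}: n\ge N_0\}$ is a matrix martingale, and $E(I_k|\mathcal{F}_{k-1})=\frac{W_{k-1}X_{k-1}}{|W_{k-1}X_{k-1}|}$. Since $N(n)=\sum_{m=N_0}^{n}I_m$, we have

$$\begin{aligned}N(n)-(n-N_0)v&=\sum_{m=N_0}^{n}\Delta m_m+\sum_{m=N_0}^{n}\frac{W_mX_m}{|W_mX_m|}-(n-N_0)v\\&=\sum_{m=N_0}^{n}\Delta m_m+\sum_{m=N_0}^{n}\frac{X_m}{|X_m|}-(n-N_0)v+\sum_{m=N_0}^{n-1}o(m^{-1})\frac{X_m}{|X_m|}\quad(\text{a.s.})\\&:=D_n+\sum_{m=N_0}^{n-1}o(m^{-1})\frac{X_m}{|X_m|}\quad(\text{a.s.}),\end{aligned}\tag{3}$$

and similarly as in Zhang et al. [7] we have $D_n=o(n-N_0)\mathbf{1}$ (a.s.). With the fact that $\sum_{m=N_0}^{n-1}o(m^{-1})\frac{X_m}{|X_m|}=o(n-N_0)\mathbf{1}$ (a.s.), we get the desired result.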

(iii) Since

$$\frac{\sum_{i=1}^{k}\sum_{j=1}^{n}f(r_j)I_{ji}}{n}=\sum_{i=1}^{k}\frac{N_i(n)}{n}\cdot\frac{\sum_{j=1}^{n}f(r_j)I_{ji}}{N_i(n)},$$

the conclusion follows directly from those in (i) and (ii).
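The identity in (iii) says that the overall average response is a weighted average of the arm-wise means, with weights equal to the allocation proportions. The short Python sketch below checks this numerically on made-up data; the response values, arm labels and function names are hypothetical, and $f$ is taken to be the identity for simplicity.

```python
def arm_means(responses, arms, k):
    """Return (means, counts): the mean response and the number of patients per arm.

    `arms[j]` is the 0-based arm index of patient j; an arm with no patients
    gets mean None.
    """
    sums = [0.0] * k
    counts = [0] * k
    for r, a in zip(responses, arms):
        sums[a] += r
        counts[a] += 1
    means = [sums[i] / counts[i] if counts[i] > 0 else None for i in range(k)]
    return means, counts


def overall_mean_from_arms(means, counts):
    """Recombine arm-wise means using the allocation proportions N_i(n)/n."""
    n = sum(counts)
    return sum((c / n) * m for m, c in zip(means, counts) if c > 0)


if __name__ == "__main__":
    responses = [1, 0, 1, 1, 0]   # hypothetical binary outcomes
    arms = [0, 0, 1, 1, 1]        # hypothetical allocation sequence
    means, counts = arm_means(responses, arms, 2)
    print(means, counts, overall_mean_from_arms(means, counts))
    print(sum(responses) / len(responses))  # direct overall mean, for comparison
```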

Proof of Theorem 2:

(i). With the result in Theorem 1 (ii), the proof is the same as that in Theorem 2 (i) in Yuan and Chai [8].

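(ii). By eq. (2), we have

$$\sqrt{n}\Big(\frac{X_n}{n}-\lambda_1v\Big)=n^{-1/2}A_n+N_0n^{-1/2}\lambda_1v+n^{-1/2}\sum_{m=N_0}^{n-1}o_p(m^{-1})\frac{X_m}{|X_m|}M_{m+1}.$$

Since $N_0$ is fixed, $N_0n^{-1/2}\lambda_1v\to0$ as $n\to\infty$. Also, with $N_1$ given in the proof of Theorem 1 (i), for $n<N_1$, $W_nX_n/|W_nX_n|=(X_n/|X_n|)(1+o_p(n^{-1}))$; for $n\ge N_1$, $W_n$ is uniform, so $W_nX_n/|W_nX_n|=X_n/|X_n|$, and so in the third term of the previous expression the $o_p(m^{-1})$ is actually 0 for such $n$'s. Thus $n^{-1/2}\sum_{m=N_0}^{n-1}o_p(m^{-1})\frac{X_m}{|X_m|}M_{m+1}=n^{-1/2}\sum_{m=N_0}^{N_1}o_p(m^{-1})\frac{X_m}{|X_m|}M_{m+1}\xrightarrow{P}0$, since by assumption (B4), $N_1n^{-1/2}\to0$.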

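Similarly, by eq. (3),

$$\sqrt{n}\Big(\frac{N(n)}{n}-v\Big)=n^{-1/2}D_n+N_0n^{-1/2}v+n^{-1/2}\sum_{m=N_0}^{n-1}o_p(m^{-1})\frac{X_m}{|X_m|}=n^{-1/2}D_n+o_p(1).$$

So we get

$$\sqrt{n}\Big(\frac{X_n}{n}-\lambda_1v,\ \frac{N(n)}{n}-v\Big)=n^{-1/2}(A_n,D_n)+o_P(1).$$

Note that $A_n$ is the asymptotic expansion of the urn composition corresponding to $X_n$ without the weighting $W_n$, and $D_n$ is that for the allocation counts $N(n)$ without the weighting $W_n$, so the result follows from Theorem 4 in Yuan and Chai [8].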

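To describe the asymptotic covariance matrix $\Omega$, we need the following notation. Let $\bar{M}=M-\mathbf{1}'v$, $\zeta=\zeta(\mu,\Xi_1)$, $\Sigma_1=\{v\}-v'v$, $\Sigma_2=E[(\zeta-M)'\{v\}(\zeta-M)]$, $\Sigma_3=\mathrm{diag}(v_1\sigma_1^2,\ldots,v_k\sigma_k^2)$, with $\sigma_j^2=\mathrm{Var}(\mu_{n,i})$, and $\Sigma_{23}=E[(\zeta-M)'\{v_1\}(f(r_1)-\mu)]$, where $f(r_1)=(f(r_1)|A_1,\ldots,f(r_1)|A_k)$. Define $t_j=v\,\partial M(x)/\partial x_j|_{x=\mu}$ and $T=(t_1'/v_1,\ldots,t_k'/v_k)'$. Recall that for a real number $a$, a positive number $x$ and a matrix $A$, the following definitions are adopted:

$$e^{aA}=\sum_{i=0}^{\infty}\frac{a^i}{i!}A^i,\qquad x^{A}=\sum_{i=0}^{\infty}\frac{\log^i(x)}{i!}A^i.$$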
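As a numerical illustration of the definition $x^A=\sum_i \log^i(x)A^i/i!$, the following Python sketch evaluates a truncated version of the series with plain lists. It is only a sketch under stated assumptions: the truncation at 60 terms, the helper names and the test matrices are choices made for this example and are not part of the paper.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def scalar_to_matrix_power(x, A, terms=60):
    """Approximate x^A = sum_{i>=0} (log x)^i / i! * A^i by a truncated series."""
    if x <= 0:
        raise ValueError("x must be positive")
    n = len(A)
    log_x = math.log(x)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term (log x)^i A^i / i!, starting at i = 0
    for i in range(1, terms):
        term = [[entry * log_x / i for entry in row] for row in mat_mul(term, A)]
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result


if __name__ == "__main__":
    # For a diagonal matrix, x^A is diagonal with entries x ** a_ii.
    print(scalar_to_matrix_power(3.0, [[1.0, 0.0], [0.0, 2.0]]))
    # For a nilpotent matrix the series stops after the linear term: I + log(x) * A.
    print(scalar_to_matrix_power(2.0, [[0.0, 1.0], [0.0, 0.0]]))
```

The truncated series is adequate for small, moderately scaled matrices such as these; for larger norms of $\log(x)A$ a dedicated matrix-exponential routine would be the safer choice.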

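Denote $\Omega=(\Omega_{ij})_{1\le i,j\le2}$, with

$$\Omega_{11}=M'\Lambda_1^{\dagger}M+\Lambda_2^{\dagger}+\Lambda_3^{\dagger}+\Lambda_{23}^{\dagger}+(\Lambda_{23}^{\dagger})',$$

$$\Omega_{12}=M'\Lambda_1^{\dagger}+(\Lambda_2^{\diamond}+\Lambda_3^{\diamond}+\Lambda_{23}^{\diamond}+\Lambda_{32}^{\diamond})(I-\mathbf{1}'v),$$

and

$$\Omega_{22}=\Lambda_1^{\dagger}+(I-v'\mathbf{1})\big(\Lambda_2^{\#}+\Lambda_3^{\#}+\Lambda_{23}^{\#}+(\Lambda_{23}^{\#})'\big)(I-\mathbf{1}'v),$$

where

$$\Lambda_1^{\dagger}=\int_0^1\Big(\frac{1}{x}\Big)^{\bar{M}'}\Sigma_1\Big(\frac{1}{x}\Big)^{\bar{M}}dx,\qquad\Lambda_2^{\dagger}=\int_0^1\Big(\frac{1}{x}\Big)^{\bar{M}'}\Sigma_2\Big(\frac{1}{x}\Big)^{\bar{M}}dx,$$

$$\Lambda_3^{\dagger}=\int_0^1\Big[\int_x^1\frac{1}{y}\Big(\frac{1}{y}\Big)^{\bar{M}}dy\Big]'T'\Sigma_1T\Big[\int_x^1\frac{1}{y}\Big(\frac{1}{y}\Big)^{\bar{M}}dy\Big]dx,$$

$$\Lambda_{23}^{\dagger}=\int_0^1\Big(\frac{1}{x}\Big)^{\bar{M}'}\Sigma_{23}T\Big[\int_x^1\frac{1}{y}\Big(\frac{1}{y}\Big)^{\bar{M}}dy\Big]dx,$$

$$\Lambda_2^{\#}=\int_0^1\Big[\int_x^1\frac{1}{y}\Big(\frac{y}{x}\Big)^{\bar{M}}dy\Big]'\Sigma_2\Big[\int_x^1\frac{1}{y}\Big(\frac{y}{x}\Big)^{\bar{M}}dy\Big]dx,$$

$$\Lambda_{23}^{\#}=\int_0^1\Big[\int_x^1\frac{1}{y}\Big(\frac{y}{x}\Big)^{\bar{M}}dy\Big]'T'\Sigma_{23}T\Big[\int_x^1\frac{1}{y}\Big(\frac{y}{x}\Big)^{\bar{M}}dy\Big]dx,$$

$$\Lambda_3^{\#}=\int_0^1\Big[\int_x^1\int_x^y\frac{1}{yu}\Big(\frac{y}{u}\Big)^{\bar{M}}du\,dy\Big]'T'\Sigma_3T\Big[\int_x^1\int_x^y\frac{1}{yu}\Big(\frac{y}{u}\Big)^{\bar{M}}du\,dy\Big]dx,$$

$$\Lambda_2^{\diamond}=\int_0^1\Big(\frac{1}{x}\Big)^{\bar{M}'}\Sigma_2\int_x^1\frac{1}{y}\Big(\frac{y}{x}\Big)^{\bar{M}}dy\,dx,$$

$$\Lambda_{23}^{\diamond}=\int_0^1\Big(\frac{1}{x}\Big)^{\bar{M}'}\Sigma_{23}T\int_x^1\int_x^y\frac{1}{yu}\Big(\frac{y}{u}\Big)^{\bar{M}}du\,dy\,dx,$$

$$\Lambda_{32}^{\diamond}=\int_0^1\Big[\int_x^1\frac{1}{y}\Big(\frac{1}{y}\Big)^{\bar{M}}dy\Big]'T'\Sigma_{23}'\int_x^1\frac{1}{y}\Big(\frac{y}{x}\Big)^{\bar{M}}dy\,dx,$$

and

$$\Lambda_3^{\diamond}=\int_0^1\Big[\int_x^1\frac{1}{y}\Big(\frac{1}{y}\Big)^{\bar{M}}dy\Big]'T'\Sigma_3T\int_x^1\int_x^y\frac{1}{yu}\Big(\frac{y}{u}\Big)^{\bar{M}}du\,dy\,dx.$$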

Proof of Theorem 3

(i). Let $v^{(1)}=(1,0,\ldots,0)$, $v^{(2)}=(v_1^{(2)},v_2^{(2)},0,\ldots,0)$, ..., $v^{(k-1)}=(v_1^{(k-1)},\ldots,v_{k-1}^{(k-1)},0)$. Let $n_1^{(j)},\ldots,n_j^{(j)}$ be the allocated numbers for the first $j$ treatments at the start of the $j$-th stage, and denote the weight at stage $j$ by $W_n^{(j)}$. The $v^{(j)}$'s are all $k$-dimensional vectors; $v^{(j)}$ has $j$ non-zero components, which correspond to the optimality criterion for the first $j$ treatments. Let $M^{(j)}$ be the adding matrix for the first $j$ treatments, $\lambda^{(j)}$ be its first eigenvalue, and $\alpha_l^{(j)}=\lim n_l^{(j)}/n_j^{(j)}$. Denote $X_0=(x_1,0,0,\ldots,0)$. Let $g_1(v^{(1)})=(1,0,\ldots,0)$; for a $k$-vector $x=(x_1,\ldots,x_k)$, let $g_2(x)$ be the $k$-vector given by

$$g_2(x)=\Big(\int_0^{\alpha_1^{(2)}\wedge1}\frac{(wx_1,\alpha_1^{(2)}x_2)}{wx_1+\alpha_1^{(2)}x_2}\,dw+\big[1-(\alpha_1^{(2)}\wedge1)\big]\frac{(x_1,x_2)}{x_1+x_2},\ 0,\ldots,0\Big).$$

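To define $g_j(x)$, we first re-arrange the first $j$ components of $W_n^{(j)}$ in increasing order as $W_n^{[j]}=(n_{[1]},\ldots,n_{[j-1]},n_1+\cdots+n_{j-1},0,\ldots,0)$, with $n_{[1]}\le\cdots\le n_{[j-1]}$. Define

$$\begin{aligned}g_{[3]}(x)=\Big(&\int_{\alpha_1^{(3)}}^{\alpha_1^{(3)}+(1/2)\wedge\alpha_2^{(3)}}\int_{\alpha_2^{(3)}}^{\alpha_2^{(3)}+(1/2)\wedge\alpha_1^{(3)}}\frac{\big(w_1x_1,w_2x_2,(\alpha_1^{(3)}+\alpha_2^{(3)})x_3\big)}{w_1x_1+w_2x_2+(\alpha_1^{(3)}+\alpha_2^{(3)})x_3}\,dw_1\,dw_2\\&+\Big[1-\Big(\frac{\alpha_1^{(3)}}{2}\vee\frac{\alpha_2^{(3)}}{2}\Big)\wedge1\Big]\frac{(x_1,x_2,x_3)}{x_1+x_2+x_3},\ 0,\ldots,0\Big),\end{aligned}$$

and then define $g_3(x)$ as $g_{[3]}(x)$ with components switched back to the original order of $W_n^{(3)}$, i.e., $g_3(x)$ is the concomitant of $g_{[3]}(x)$ in the relationship of $W_n^{(3)}$ to $W_n^{[3]}$. More generally, define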

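$$\begin{aligned}g_{[j]}(x)=\Big(&\int_{\alpha_1^{(j)}}^{\alpha_1^{(j)}+(j-1)^{-1}\wedge\alpha_{j-1}^{(j)}}\cdots\int_{\alpha_{j-1}^{(j)}}^{\alpha_2^{(j)}+(j-1)^{-1}\wedge\alpha_1^{(j)}}\frac{\big(w_1x_1,\ldots,w_{j-1}x_{j-1},(\alpha_1^{(j)}+\cdots+\alpha_{j-1}^{(j)})x_j\big)}{w_1x_1+\cdots+w_{j-1}x_{j-1}+(\alpha_1^{(j)}+\cdots+\alpha_{j-1}^{(j)})x_j}\,dw_1\cdots dw_{j-1}\\&+\Big[1-\Big(\frac{\alpha_1^{(j)}}{j-1}\vee\cdots\vee\frac{\alpha_{j-1}^{(j)}}{j-1}\Big)\wedge1\Big]\frac{(x_1,\ldots,x_j)}{x_1+\cdots+x_j},\ 0,\ldots,0\Big),\quad(j=2,\ldots,k)\end{aligned}$$

and define $g_j(x)$ as the concomitant of $g_{[j]}(x)$.

Lastly, set $u=\sum_{j=1}^{k}\alpha_jg_j(v^{(j)})$ and $y=\sum_{j=1}^{k}\alpha_jg_j(v^{(j)})M^{(j)}$.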

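(ii). Let $\Delta S_m$ and $\Delta m_m$ be as in the proof of Theorem 1 (ii). Denote $S_n^{(1)}=\sum_{m=1}^{n_1}\Delta S_m:=\sum_{(1)}\Delta S_m$, $S_n^{(2)}=\sum_{m=n_1+1}^{n_1+n_2}\Delta S_m:=\sum_{(2)}\Delta S_m$, ..., $S_n^{(k-1)}=\sum_{m=n_1+\cdots+n_{k-2}+1}^{n_1+\cdots+n_{k-1}}\Delta S_m:=\sum_{(k-1)}\Delta S_m$, and $S_n^{(k)}=\sum_{m=n_1+\cdots+n_{k-1}+1}^{n}\Delta S_m:=\sum_{(k)}\Delta S_m$.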

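Similarly as in the proof of Theorem 1 (ii), we have

$$\frac{X_n}{n}-y=\sum_{j=1}^{k}\frac{n_j}{n}\Big(\frac{1}{n_j}S_n^{(j)}+\frac{1}{n_j}\sum_{(j)}\Big(\frac{W_m^{(j)}X_m}{|W_m^{(j)}X_m|}-g_j(v^{(j)})\Big)M_{m+1}^{(j)}+\frac{1}{n_j}\sum_{(j)}g_j(v^{(j)})\big(M_{m+1}^{(j)}-M^{(j)}\big)\Big)+\sum_{j=1}^{k}g_j(v^{(j)})\Big(\frac{n_j}{n}-\alpha_j\Big)+n^{-1}X_0.$$

Here $n_j^{-1}S_n^{(j)}\to0$ (a.s.), $M_{m+1}^{(j)}-M^{(j)}\to0$ (a.s.), $n^{-1}X_0\to0$ (a.s.), $n_j/n\to\alpha_j$, and, for $m$ with $n_1+\cdots+n_{j-1}\le m\le n_1+\cdots+n_j$, $X_m/|X_m|=(X_m/m)/|X_m/m|\to v^{(j)}/|v^{(j)}|=v^{(j)}$ (a.s.). Take $j=2$ for example: when $m$ varies from $n_1^{(2)}$ to $n_2^{(2)}$, the first two components of $W_m^{(2)}$ vary from $(n_{2,0},n_1^{(2)})$ to $(n_1^{(2)}\wedge n_2^{(2)},n_1^{(2)})$, where $n_{2,0}$ is fixed and the other components of $W_m$ are zeros, so for any fixed $k$-vector $x$,

$$\frac{1}{n_2}\sum_{(2)}\frac{W_m^{(2)}x}{|W_m^{(2)}x|}=\frac{n_2^{(2)}}{n_2}\cdot\frac{1}{n_2^{(2)}}\sum_{(2)}\frac{W_m^{(2)}x}{|W_m^{(2)}x|}\to g_2(x)\quad(\text{a.s.}).$$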

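To see how the form of $g_j(x)$ is identified, we first re-arrange the first $j$ components of $W_n^{(j)}$ in increasing order as $W_n^{[j]}=(n_{[1]},\ldots,n_{[j-1]},n_1+\cdots+n_{j-1},0,\ldots,0)$, with $n_{[1]}\le\cdots\le n_{[j-1]}$. For example, for $W_n^{[3]}$, at the start of the 3rd stage, when $m$ varies from $n_1+n_2$ to $n_1+n_2+n_3$, the first 3 components of $W_m^{[3]}$ vary from $(n_{[1]},n_{[2]},n_1+n_2)$ to $(n_{[1]}+[n_3/2]\wedge n_{[2]},\ n_{[2]}+[n_3/2]\wedge n_{[1]},\ n_1+n_2)$, and this gives $\frac{1}{n_{[2]}}\sum_{(3)}\frac{W_m^{[3]}x}{|W_m^{[3]}x|}\to g_{[3]}(x)$. Now it is seen that

$$\frac{1}{n_j}\sum_{(j)}\frac{W_m^{(j)}X_m}{|W_m^{(j)}X_m|}-g_j(v^{(j)})=\frac{1}{n_j}\sum_{(j)}\frac{W_m^{(j)}(X_m/|X_m|)}{|W_m^{(j)}(X_m/|X_m|)|}-g_j(v^{(j)})\to0\quad(\text{a.s.}).$$

Together with the expansion of $X_n/n-y$ above, these give $X_n/n\to y$ (a.s.).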

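Similarly,

$$\frac{N(n)}{n}-u=\sum_{j=1}^{k}\frac{n_j}{n}\Big(\frac{1}{n_j}m_n^{(j)}+\frac{1}{n_j}\sum_{(j)}\Big(\frac{W_m^{(j)}X_m}{|W_m^{(j)}X_m|}-g_j(v^{(j)})\Big)\Big)+\sum_{j=1}^{k}g_j(v^{(j)})\Big(\frac{n_j}{n}-\alpha_j\Big)\to0\quad(\text{a.s.}).$$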

Proof of Theorem 4:

(i) For the first part, see the proof of Theorem 2.

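(ii). Let $h_{[j]}(w,x)$ be the integrand of $g_{[j]}(x)$, let

$$\begin{aligned}G_{[j]}(x)=\Big(&\int_{\alpha_1^{(j)}}^{\alpha_1^{(j)}+(j-1)^{-1}\wedge\alpha_{j-1}^{(j)}}\cdots\int_{\alpha_{j-1}^{(j)}}^{\alpha_2^{(j)}+(j-1)^{-1}\wedge\alpha_1^{(j)}}\big(h_{[j]}(w,x)\big)'h_{[j]}(w,x)\,dw_1\cdots dw_{j-1}\\&+\Big[1-\Big(\frac{\alpha_1^{(j)}}{j-1}\vee\cdots\vee\frac{\alpha_{j-1}^{(j)}}{j-1}\Big)\wedge1\Big]\frac{(x_1,\ldots,x_j)'(x_1,\ldots,x_j)}{(x_1+\cdots+x_j)^2},\ 0,\ldots,0\Big),\quad(j=2,\ldots,k)\end{aligned}$$

and let $G_j(x)$ be the concomitant of $G_{[j]}(x)$.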

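As in the proof of Theorem 3 (ii),

$$\begin{aligned}\sqrt{n}\Big(\frac{X_n}{n}-y\Big)&=\sum_{j=1}^{k}\Big(\frac{n_j}{n}\Big)^{1/2}\Big(n_j^{-1/2}S_n^{(j)}+n_j^{-1/2}\sum_{(j)}\Big(\frac{W_m^{(j)}X_m}{|W_m^{(j)}X_m|}-g_j(v^{(j)})\Big)M_{m+1}^{(j)}+n_j^{-1/2}\sum_{(j)}g_j(v^{(j)})\big(M_{m+1}^{(j)}-M^{(j)}\big)\Big)\\&\quad+\sum_{j=1}^{k}g_j(v^{(j)})\frac{n_j-\alpha_jn}{\sqrt{n}}+n^{-1/2}X_0\\&=\sum_{j=1}^{k}\Big(\frac{n_j}{n}\Big)^{1/2}\Big(n_j^{-1/2}S_n^{(j)}+\frac{1}{n_j}\sum_{(j)}\Big(\frac{m}{n_j}\Big)^{-1/2}\sqrt{m}\Big(\frac{W_m^{(j)}(X_m/m)}{|W_m^{(j)}(X_m/m)|}-g_j(v^{(j)})\Big)M_{m+1}^{(j)}\\&\qquad+n_j^{-1/2}\sum_{(j)}g_j(v^{(j)})\big(M_{m+1}^{(j)}-M^{(j)}\big)\Big)+O_p(n^{-1/2})\\&=\sum_{j=1}^{k}\Big(\frac{n_j}{n}\Big)^{1/2}\Big(n_j^{-1/2}S_n^{(j)}+\frac{1}{n_j}\sum_{(j)}\Big(\frac{m}{n_j}\Big)^{-1/2}\sqrt{m}\big(g_j(X_m/m)-g_j(v^{(j)})\big)M_{m+1}^{(j)}\\&\qquad+n_j^{-1}\sum_{(j)}\Big(\frac{m}{n_j}\Big)^{-1/2}g_j(v^{(j)})\dot{M}^{(j)}\sqrt{m}(\mu_{m+1}-\mu)\Big)+O_p(n^{-1/2}),\end{aligned}$$

where $\dot{M}^{(j)}$ is the derivative of $M^{(j)}$ with respect to $\mu$, evaluated at $\mu$, which is a $k\times k\times k$ array.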

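As in Theorem 2 (ii), $\sqrt{m}(\mu_{m+1}-\mu)\xrightarrow{D}N(0,\{\sigma^2/v^{(j)}\})$. With the form of $M^{(j)}$, by the Corollary in Yuan and Chai [8], for each $m$ in the summation $\sum_{(j)}$ we have $\sqrt{m}\big(\frac{X_m}{m}-v^{(j)}\big)\xrightarrow{D}N(0,\Omega^{(j)})$, so by the delta method,

$$\sqrt{m}\big(g_j(X_m/m)-g_j(v^{(j)})\big)\xrightarrow{D}N(0,\dot{g}_j\Omega^{(j)}\dot{g}_j'),$$

where $\dot{g}_j(v)=\partial g_j(v)/\partial v'$ and $\dot{g}_j=\dot{g}_j(v^{(j)})$, and so

$$\frac{1}{n_j}\sum_{(j)}\Big(\frac{m}{n_j}\Big)^{-1/2}\sqrt{m}\big(g_j(X_m/m)-g_j(v^{(j)})\big)M_{m+1}^{(j)}\xrightarrow{D}N\big(0,C_j^2M^{(j)\prime}\dot{g}_j\Omega^{(j)}\dot{g}_j'M^{(j)}\big),$$

with $C_j=\int_{(\alpha_1+\cdots+\alpha_{j-1})/\alpha_j}^{(\alpha_1+\cdots+\alpha_j)/\alpha_j}w^{-1/2}\,dw$. Similarly, with $\Omega_\mu^{(j)}=\mathrm{diag}\{\sigma_1^2/v_1^{(j)},\ldots,\sigma_j^2/v_j^{(j)}\}$,

$$n_j^{-1}\sum_{(j)}\Big(\frac{m}{n_j}\Big)^{-1/2}g_j(v^{(j)})\dot{M}^{(j)}\sqrt{m}(\mu_{m+1}-\mu)\xrightarrow{D}N\big(0,C_j^2(g_j\dot{M}^{(j)})'\Omega_\mu^{(j)}g_j\dot{M}^{(j)}\big).$$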
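The constant $C_j$ has a closed form, since $\int_a^b w^{-1/2}\,dw=2(\sqrt{b}-\sqrt{a})$. The Python sketch below evaluates it and cross-checks the value with a midpoint rule. The stage proportions passed in are hypothetical, and the function names and the 1-based stage index are conventions chosen for this example only.

```python
import math


def c_j(alphas, j):
    """Closed form of C_j = integral of w**-0.5 over [lower, upper].

    `alphas` holds (alpha_1, ..., alpha_k); `j` is the 1-based stage index.
    lower = (alpha_1 + ... + alpha_{j-1}) / alpha_j,
    upper = (alpha_1 + ... + alpha_j) / alpha_j.
    """
    alpha_j = alphas[j - 1]
    lower = sum(alphas[:j - 1]) / alpha_j
    upper = sum(alphas[:j]) / alpha_j
    return 2.0 * (math.sqrt(upper) - math.sqrt(lower))


def c_j_midpoint(alphas, j, grid=20000):
    """Midpoint-rule approximation of the same integral, used as a sanity check."""
    alpha_j = alphas[j - 1]
    lower = sum(alphas[:j - 1]) / alpha_j
    upper = sum(alphas[:j]) / alpha_j
    h = (upper - lower) / grid
    return h * sum((lower + (i + 0.5) * h) ** -0.5 for i in range(grid))


if __name__ == "__main__":
    alphas = [0.25, 0.75]  # hypothetical limiting stage proportions
    print(c_j(alphas, 1), c_j(alphas, 2), c_j_midpoint(alphas, 2))
```

For the first stage the lower limit is zero and the upper limit is one, so $C_1=2$ regardless of the proportions.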

Recall that $\{S_n,\mathcal{F}_n\}$ is a martingale, and for each $m$ in the summation $\sum_{(j)}$, denote $H_m=E(M(\mu_m)|\mathcal{F}_{m-1})$, then

Note that in our case $\lim_mH_m=\lim_mM_m=M^{(j)}$ (a.s.) and $\mathrm{diag}\{(W_{m-1}X_{m-1})/|W_{m-1}X_{m-1}|\}$ is bounded, so the first term above tends to zero. Thus,

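$$\begin{aligned}&\frac{1}{n_j}\sum_{(j)}\mathrm{Cov}\Big(\big(g_j(X_m/m)-g_j(v^{(j)})\big)M_{m+1}^{(j)},\ g_j(v^{(j)})\dot{M}^{(j)}(\mu_{m+1}-\mu)\,\Big|\,\mathcal{F}_m\Big)\\&\quad=\frac{1}{n_j}\sum_{(j)}M^{(j)\prime}\dot{g}_j(\mu_{m+1}-\mu)'(\mu_{m+1}-\mu)g_j\dot{M}^{(j)}+o_p(1)\xrightarrow{P}0,\end{aligned}$$

so we get

$$\sqrt{n}\Big(\frac{X_n}{n}-y\Big)\xrightarrow{D}N(0,\bar{\Omega}_{11}),\qquad\bar{\Omega}_{11}=\sum_{j=1}^{k}\alpha_j\Big(Q_j+C_j^2(\dot{g}_jM^{(j)})'\Omega^{(j)}\dot{g}_jM^{(j)}+C_j^2(g_j\dot{M}^{(j)})'\Omega_\mu^{(j)}g_j\dot{M}^{(j)}\Big).$$

Similarly, we have

$$\sqrt{n}\Big(\frac{N(n)}{n}-u\Big)\xrightarrow{D}N(0,\bar{\Omega}_{22}),\qquad\bar{\Omega}_{22}=\sum_{j=1}^{k}\alpha_j\Big(\mathrm{diag}\big(g_j(v^{(j)})\big)-G_j(v^{(j)})+C_j^2(\dot{g}_j)'\Omega^{(j)}\dot{g}_j\Big),$$

and

$$\sqrt{n}\Big(\frac{X_n}{n}-y,\ \frac{N(n)}{n}-u\Big)\xrightarrow{D}N(0,\bar{\Omega}),\qquad\bar{\Omega}=(\bar{\Omega}_{ij})_{1\le i,j\le2},$$

where $\bar{\Omega}_{12}=\sum_{j=1}^{k}\alpha_jM^{(j)\prime}\Big(\mathrm{diag}\big(g_j(v^{(j)})\big)-G_j(v^{(j)})+C_j^2(\dot{g}_jM^{(j)})'\Omega^{(j)}\Big)$.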

Acknowledgment

We thank Dr. Ainong Zhou of MedStar, Inc. for the topic of staggered-start clinical trials. Yuan, Xiong and Tan are partially supported by the National Cancer Institute (NCI) grants R01CA164717 and P30CA051008. Li is supported in part by the National Science Foundation of China, Grant Nos. 11371353 and 61134013, and the Breakthrough Project of Strategic Priority Program of the Chinese Academy of Sciences, Grant No. XDB13040600.

References

1. Rosenberger WF, Lachin JM. The use of response-adaptive designs in clinical trials. Controlled Clin Trials 1993;14:471–84. doi:10.1016/0197-2456(93)90028-C

2. Eisele JR. The doubly adaptive biased coin design for sequential clinical trials. J Stat Plann Inference 1994;38:249–62. doi:10.1016/0378-3758(94)90038-8

3. Melfi V, Page C. Variability in adaptive designs for estimation of success probabilities. In: Flournoy N, Rosenberger WF, Wong WK, editors. New developments and applications in experimental design. Hayward, CA: IMS, 1998:106–114. doi:10.1214/lnms/1215456190

4. Rosenberger WF, Stallard N, Ivanova A, Harper C, Ricks M. Optimal adaptive designs for binary response trials. Biometrics 2001;57:909–13. doi:10.1111/j.0006-341X.2001.00909.x

5. Bai ZD, Hu F, Rosenberger WF. Asymptotic properties of adaptive designs for clinical trials with delayed response. Ann Stat 2002;30:1–18. doi:10.1142/9789812793096_0017

6. Bai ZD, Hu F. Asymptotics in randomized urn models. Ann Appl Probab 2005;1B:914–40. doi:10.1142/9789812793096_0019

7. Zhang L, Hu F, Cheung S. Asymptotic theorems of sequential estimation-adjusted urn models. Ann Appl Probab 2006;16:340–69. doi:10.1214/105051605000000746

8. Yuan A, Chai GX. Optimal adaptive generalized Pólya urn design for multi-arm clinical trials. J Multivariate Anal 2008;99:1–24. doi:10.1016/j.jmva.2006.12.004

9. Zhu H, Hu F. Sequential monitoring of response-adaptive randomized clinical trials. Ann Stat 2010;38:2218–41. doi:10.1214/10-AOS796

10. Yuan A, Bezandry P, Bonney G. Compound adaptive GPU design for clinical trials. J Stat Plann Inference 2010;140:3505–15. doi:10.1016/j.jspi.2010.05.022

11. Barthel F, Babiker A, Royston P, Parmar M. Evaluation of sample size and power for multi-arm survival trials allowing for non-proportional hazards, loss to follow-up and cross-over. http://www.fm-sbarthel.de/ISCB04.pdf, 2004.

12. Bodick N, Forette F, Hadler D, Harvey RJ, Leber P, McKeith IG, et al. Protocols to demonstrate slowing of Alzheimer disease progression. Position paper from the International Working Group on Harmonization of Dementia Drug Guidelines. The Disease Progression Sub-Group. Alzheimer Dis Assoc Disord 1997;11:50–53.

13. Leber P. Observations and suggestions on antidementia drug development. Alzheimer Dis Assoc Disord 1996;10:31–5. doi:10.1097/00002093-199601031-00009

14. Kublin J. Advancing HIV vaccines to efficacy testing in the HVTN and linking to other prevention research, HIV Vaccine Trials Network Washington, DC, 30 May 2012. http://hvtn.org/meeting/ppt/may12/P1/JimKublinMAY30FINAL.pdf, 2012.

15. Cummings J, Gould H, Zhong K. Advances in designs for Alzheimer’s disease clinical trials. Am J Neurodegener Dis 2012;1:205–16.

16. Hendrix S, Horton S, Orgogozo JM. Modification of the “randomized withdrawal” and “staggered start” clinical trial designs: toward a practical demonstration of disease modification in Alzheimers disease. https://www.myriad.com/downloads/EFNS-2007-Disease-Modification-in-AD-Hendrix.pdf, 2013. doi:10.1016/j.jalz.2007.04.157

17. Hayre LS. Two-population sequential tests with three hypotheses. Biometrika 1979;66:465–74. doi:10.1093/biomet/66.3.465

18. Jennison C, Turnbull BW. Group sequential methods with applications to clinical trials. Boca Raton, FL: Chapman and Hall/CRC Press, 2000. doi:10.1201/9781584888581

19. Eisele JR, Woodroofe M. Central limit theorems for doubly adaptive biased coin designs. Ann Stat 1995;23:234–54. doi:10.1214/aos/1176324465

20. Hu F, Zhang L, He X. Efficient randomized-adaptive designs. Ann Stat 2009;37:2345–560. doi:10.1214/08-AOS655

21. Athreya KB, Karlin S. Limit theorems for the split times of branching processes. J Math Mech 1967;17:257–77. doi:10.1512/iumj.1968.17.17014

22. Gouet R. Strong convergence of proportions in a multicolor Pólya urn. J Appl Probab 1997;34:426–35. doi:10.2307/3215382

23. Janson S. Functional limit theorems for multitype branching processes and generalized Pólya urns. Stochastic Processes Appl 2004;110:177–245. doi:10.1016/j.spa.2003.12.002

24. Hall P, Heyde CC. Martingale limit theory and its application. New York: Academic Press, 1980.

Published Online: 2015-12-15

Published in Print: 2016-11-1

https://doi.org/10.1515/ijb-2015-0011
