Structural Nested Mean Models to Estimate the Effects of Time-Varying Treatments on Clustered Outcomes

Jiwei He; Alisa Stephens-Shields; Marshall Joffe

doi:10.1515/ijb-2014-0055

Artikel Öffentlich zugänglich

Structural Nested Mean Models to Estimate the Effects of Time-Varying Treatments on Clustered Outcomes

Jiwei He , Alisa Stephens-Shields und Marshall Joffe

Veröffentlicht/Copyright: 26. Juni 2015

Veröffentlicht von

Veröffentlichen auch Sie bei De Gruyter Brill

Manuskript einreichen Informationen für Autor*innen

Aus der Zeitschrift The International Journal of Biostatistics Band 11 Heft 2

Abstract

In assessing the efficacy of a time-varying treatment structural nested models (SNMs) are useful in dealing with confounding by variables affected by earlier treatments. These models often consider treatment allocation and repeated measures at the individual level. We extend SNMMs to clustered observations with time-varying confounding and treatments. We demonstrate how to formulate models with both cluster- and unit-level treatments and show how to derive semiparametric estimators of parameters in such models. For unit-level treatments, we consider interference, namely the effect of treatment on outcomes in other units of the same cluster. The properties of estimators are evaluated through simulations and compared with the conventional GEE regression method for clustered outcomes. To illustrate our method, we use data from the treatment arm of a glaucoma clinical trial to compare the effectiveness of two commonly used ocular hypertension medications.

Keywords: clustered observations; time-varying confounding; structural nested mean models

1 Introduction

Structural nested models (SNMs) are a class of models that is useful for estimating the causal effect of a time-varying treatment in the presence of time-varying confounding [1]. In some studies, treatments are affected by covariates at earlier time points and subsequently affect covariates at later time points. Confounding that arises in such situations cannot be properly controlled for by conventional regression methods. However, such problems can be solved by several other methods, including marginal structural models (MSMs) and SNMs. Compared to MSMs, SNMs have the advantage of being able to model effect modification by time-varying covariates, whereas MSMs can only model effect modification by baseline covariates. SNMs are also useful for modeling the effect of dynamic treatment regimes, in which the treatment plan changes according to observed patient characteristics during the treatment course, in addition to static treatment regimes, in which the entire treatment schedule is known at the time of initiation [2]. Another advantage of SNMs is that they can work under conditions where certain assumptions fail. For example, SNMs may be fit under instrumental variable assumptions when there is unmeasured confounding [3, 4]. There are two types of SNMs: (1) structural nested mean models (SNMMs) model the joint effect of treatments on the mean of the outcome conditional on the treatment and covariate history, and (2) structural nested distribution models (SNDMs) model the joint effect of treatments on the entire distribution of the outcome conditional on the treatment and covariate history. So far, SNMs have seldom been used for clustered observations. Brumback et al. [5] used SNMs and an instrumental variable approach to estimate the effect of cluster-level treatment on unit-level outcomes in the presence of unmeasured confounding [5]. This paper generalizes previous work by formulating SNMMs for clustered observations with time-varying confounding and treatments. The formulation applies to both unit-level and cluster-level treatments.

A motivating example is the study of eye disease treatment, where two eyes of a given subject form a cluster. Eye-specific outcomes from a given subject tend to be correlated, because they share subject-specific characteristics. Rubin [6] calls a situation where “the observation on one unit is unaffected by the particular treatment assignment to the other units” the “stable unit-treatment value assumption (SUTVA)” [6]. If we consider each eye as a separate unit for treatment, the SUTVA assumption may be violated. Under eye-specific treatment, treatment received by one eye may affect outcomes in the other eye of the same subject. If the treatments are binary, each eye will have not only two but four possible treatments, depending on the treatment received by both eyes. VanderWeele [7] has discussed the SUTVA in the context of neighborhood effects [7]. In the ophthalmology study, treatments are eye-specific drops. Treatments, clinical covariates, and outcomes are measured repeatedly over an extended period of time; treatments are therefore subject to time-dependent confounding. We are interested in modeling the joint effect of a series of treatments on a series of clustered outcomes. Other examples of clustered observations include repeated measurements of disease symptoms on the same individual or an infectious disease among people in the same household.

The structure of the paper is as the following: we first describe the setup for exploring causal effects among clustered observations. We then formulate SNMMs for clustered observations in a single time point setting and describe an efficient estimating equation. We then extend the model and estimation to time-varying treatments and repeatedly measured outcomes. Last, the method is evaluated using simulation studies and applied to an ophthalmology dataset.

2 Notation for clustered observations

We first describe the general structure of data with repeatedly measured treatments and repeatedly measured clustered outcomes and the definition of a causal effect in such a context. The notation is based on Robins [1] with extension for clustered data [1]. Suppose that data is collected on N clusters, i=1,2,...,N at K+1 time points, k=0,1,...,K. At each time point k, outcomes are measured from J units in each cluster, j=1,2,...,J. Let Aimj denote unit-specific treatment in the jth unit at time m for cluster i, and Aim† denote systemic treatment received at time m for cluster i. Let Limj denote unit-specific covariates in the jth unit, and Lim† denote cluster-specific covariates measured at time m for cluster i. Yimj denotes the observed outcome in jth unit measured at time m for cluster i. Yimj is considered as part of Limj. Aim={Aim1,...,AimJ,Aim†} is all treatments received at time m by cluster i. Lim={Lim1,...,LimJ,Lim†} is all covariates measured at time m in cluster i. Yim={Yim1,...,YimJ} is all observed outcomes measured at time m in cluster i. We use overbars to denote the history of a variable; thus, A¯im={Ai0,Ai1,...,Aim} is treatment history through time m in cluster i, and L¯im={Li0,Li1,...,Lim} is covariate history through time m in cluster i. We use underbars to denote the future of a variable; thus, Y_im={Yim,Yi,m+1,...,YiK} is outcomes observed since time m in cluster i. If unstated, the notation in formulation and figures is for one cluster and therefore omits the subscript i for cluster. The subscript m for time is omitted in examples with single time point. Unbold Aim,Lim,Yim represent scalar treatment, covariate and outcome, respectively for subject i at time m in situations without clustered outcomes, which we use to reference previous literature on SNMMs.

To define causal effects and structural models, we will use potential or counterfactual outcomes that are outcomes that would be observed were the subject to receive a certain treatment [8]. We presume that treatments at time m do not affect outcomes prior to time m. Potential outcomes are considered under a regime, which is defined as a sequential set of treatments a¯K={a1,...,aK}. In formulating SNMs, the regime (aˉm,0), the counterfactual treatment regime that agrees with aˉm through time m and is 0 thereafter, is of particular interest. Let Yika¯m,0={Yik1a¯m,0,...,YikJa¯m,0} denote the potential outcome in cluster i that would be observed at time k where k>m were the cluster to receive treatment history a¯m through time m and 0 thereafter. Let Y_i,m+1a¯m,0={Yi,m+1a¯m,0,...,YiKa¯m,0} denote the series of potential outcomes in cluster i that would be observed after time m. Causal effects can be defined as contrasts of potential outcomes for the same group of subjects under different treatment regimes. For instance, Yima¯K−Yima¯K', where a¯K≠a¯K', denotes the causal effect of treatment aˉK versus a¯K' on a subject’s outcome at time m.

3 Structural models and estimation for point treatments in clustered observations

Prior to describing the formulation of SNMMs for clustered observations, we review the general concepts of SNMMs for nonclustered outcomes. SNMMs model the contrast between conditional means of potential outcomes γm∗(lˉm,aˉm)=gEY_m+1aˉm,0|Lˉm=lˉm,Aˉm=aˉm−gEY_m+1aˉm−1,0|Lˉm=lˉm,Aˉm=aˉm, where g can be any monotone link function. Frequently used link functions are the identity g(x)=x, loglinear g(x)=log(x), or logistic g(x)=logit(x) [1, 9]. This contrast represents the sequential removal of the effect of a treatment am at time m on the means of subsequent potential outcomes after having removed the effect of all later treatments, a process called “blipping down” [1]. A function γm∗(lˉm,aˉm) that defines such a contrast is called a blip function. It can be parameterized with finite dimensional parameter ψ.

We first formulate SNMMs for clustered outcomes in a single time point example, where treatments, covariates and outcomes are measured at one time point only. In this example, we assume that each cluster consists of two units, and consider unit-level treatment only. The results can be easily generalized to situations with more than two units or with cluster-level treatment. A directed acyclic graph (DAG) is shown in Figure 1. The subscript here refers to unit number. L={L1,L2,L†} is the set of cluster- and unit-level measured confounder for the effect of joint treatment A={A1,A2} on clustered outcomes Y={Y1,Y2}. We assume that there is no unmeasured confounding between A and Y. Dashed double arrows represent possible unmeasured confounding within L, A or Y between units. U and dotted arrows represent possible unmeasured confounding between L and Y. Let Yja=Yjaj,aj′ denote the potential outcome in jth unit, where aj and aj′ refer to unit-specific treatment to the same unit and the other within-cluster unit respectively, {j,j′}={1,2},{2,1}.

Figure 1:

Causal diagram for an example with single time point.

3.1 Structural nested mean models

For binary treatments, we may consider the joint treatment {A1,A2} across units within a cluster as one combined treatment at the cluster level with 4 levels, namely {0,0}, {1,0}, {0,1}, {1,1}. Then a SNMM with joint treatment can be formulated in the same way as one with a single treatment as a contrast between means of potential outcomes under joint treatment versus means of potential outcomes under no treatment as seen in eq. (1), where g(.) is a monotone link function. If there is an additional binary systemic treatment A†, the combined treatment {A†,A1,A2} will contain eight levels. In the rest of the paper, we focus on the SNMM with identity link such that g(x)=x. The method introduced, however, can be applied to other link functions, which will be discussed in the discussion section:

(1)gEY1a1,a2Y2a2,a1| L=l,A=a−gEY10,0Y20,0| L=l,A=a=γ1∗(a1,a2,l)γ2∗(a2,a1,l).

The vector-valued blip function γ∗(.) models the effect of removing joint treatment a from both unit-level outcomes within the cluster. It can be parameterized with the finite dimensional parameter ψ to form

(2)γ1∗(a1,a2,l;ψ)γ2∗(a2,a1,l;ψ).

As an example, let

(3)γ1∗(a1,a2,l;ψ)γ2∗(a2,a1,l;ψ)=ψ1a1+ψ2a2+ψ3a1a2ψ1′a2+ψ2′a1+ψ3′a2a1.

In eq. (3), ψ∈R6; ψ1 and ψ1′ represent the effect of unit-specific treatment on the outcome in the same unit. ψ2 and ψ2′ represent the effect of treatment on the outcome in the opposite unit, ψ3 and ψ3′ represent the interaction between treatments in different units. We can also interpret ψ1 and ψ1′ as the direct effect of treatment on the same unit. ψ2 represents the spillover effect of unit 2’s treatment on unit 1 when a1=0, and ψ2+ψ3 represents the spillover effect of unit 2’s treatment on unit 1 when a1=1. Similarly, ψ2′ represents the spillover effect of unit 1’s treatment on unit 2 when a2=0, and ψ2′+ψ3′ represents the spillover effect of unit 1’s treatment on unit 2 when a2=1. This example imposes the assumption that effect modification by covariates is absent.

The dimension of causal parameters will increase sharply with increase in the number of units per cluster. Some additional model assumptions are considered to reduce the dimension of causal parameters. For the ophthalmology example, it is reasonable to assume symmetry between two eyes, namely the effect of the joint treatment is the same for each unit. In eq. (3), the symmetry assumption implies ψ1=ψ1′,ψ2=ψ2′,ψ3=ψ3′. In some cases, we may want to assume no interference, namely unit-specific treatment on one unit has no effect on outcomes in the other units of the same cluster. With two units per cluster, it refers to

Yjaj,aj′=Yjaj,0.

In eq. (3), the no interference assumption implies ψ2=ψ3=ψ2′=ψ3′=0.

3.2 Identifiability

A1–A4 below are sufficient assumptions to identify the causal effect of joint treatment a relative to no treatment from observed data and therefore the causal parameter ψ.

Consistency. If A=a for a given cluster, then Ya=Y for that cluster.
Conditional exchangeability. Y0,0⊥ A | L=l.
Positivity. If fL l > 0, then fA|L(a|l)>0 for all treatments a.
Cluster-level SUTVA. The observation on one cluster is unaffected by the particular treatment assignment to the other clusters.

In the presence of interference between units, potential outcomes like Yjaj are not well-defined. However, under no interference, it is reasonable to relax A1 and define consistency under the standard assumption for non-clustered data: if Aj=aj for a given unit j, then Yjaj=Yj for that unit. A2 can be viewed as no unmeasured confounding. It states the independence between treatment-free potential outcomes and the joint treatment to a cluster conditional on the covariates within the cluster. A3 can be relaxed in SNMMs such that the assumption does not need to hold for all L with fL l>0. A4 is similar to the partial interference assumption described in Sobel [10]. Note that our method allows for interference between units within the same cluster, but the SUTVA assumption needs to hold for clusters.

3.3 Alternative modeling

An alternative way of modeling the joint effect of {a1,a2} is to use the concepts in SNMMs with time-varying treatments and sequentially remove the effect of each unit-specific treatment. In the following example, the effect of treatment on the opposite unit is removed first, followed by the effect of treatment on the same unit.

Remove the effect of treatment on the opposite unit aj′:
(4a)EYjaj,aj′|L=l,A=a−EYjaj,0|L=l,A=a=γ1,j∗(aj′,l,aj)
Remove the effect of treatment on the same unit aj:
(4b)EYjaj,0|L=l,Aj=aj−EYj0,0|L=l,Aj=aj=γ2,j∗(aj,l)

For example, the blip functions in eqs (4a) and (4b) can be parameterized as γ1,j∗(aj′,l,aj;ψ)=ψ2aj′+ψ3aj′aj and γ2,j∗(aj,l;ψ)=ψ1aj, respectively.

Assumptions A1 and A3 plus a sequential ignorability assumption are sufficient to identify the causal effect in eqs (4a) and (4b). Specifically, the sequential ignorability assumption states that Yjaj,0⊥Aj′|L,Aj and Yj0,0⊥Aj|L. Testing for no interference under this model is equivalent to testing whether the blip function in eq. (4a) equals 0. In this case, since aj and aj′ are simultaneous treatments, the order of blipping down is arbitrary. Parameters from different orders of blipping down have different interpretations. For this reason, the approach in eq. (1) is used for the rest of the paper. However, the alternative approach may be useful in other settings where the order of blipping down is meaningful. For instance, if clusters are defined by families, it may be reasonable to blip down the treatment on parents first and then that on children or vice versa.

3.4 Estimation

Robins [1] has derived a class of estimating equations for parameters of SNMMs based on semiparametric theory [1]. He defined Hk(ψ)=Yk−∑l=0k−1γl,k∗Lˉl,Aˉl;ψ as the fully blipped down outcome at time k, and H(m)=[Hm+1(ψ),Hm+2(ψ),....,HK(ψ)]T as the vector of blipped down outcomes after time m, where m=0,...,K−1. Under the sequential ignorability assumption, EH(m)|Lˉm,Aˉm=EH(m)|Lˉm,Aˉm−1. H˙(m) is defined as H˙(m)=H(m)−EH(m)|Lˉm,Aˉm−1. The efficient estimating function, known as the efficient score [11], under general settings involves complex recursive expressions, but under the assumption that

(5)EHkHj|Lˉm,Aˉm=EHkHj|Lˉm,Aˉm−1,

for k>m,j>m,m=0,...,K−1, the efficient score can be simplified as:

(6)Sψeff=∑i=1N∑m=0K−1E∂H˙i(m)∂ψT|Lˉim,Aˉim−E∂H˙i(m)∂ψT|Lˉim,Aˉi,m−1⋅VarH˙i(m)|Lˉim,Aˉi,m−1−1⋅H˙i(m).

By setting eq. (6) to 0 and solving the estimating equation, one can obtain an efficient estimator for the causal parameter ψ. If the assumption in eq. (5) does not hold, the estimator from eq. (6) will not achieve semiparametric efficiency but will nonetheless be consistent. Equivalently, we can express eq. (6) in terms of the partially blipped down outcome Um with removal of the effect of treatments from time m onwards [4]. Specifically, Um is a vector with elements Um,k=Yk−∑l=mk−1γl,k∗Lˉl,Aˉl, for k=m+1,...,K.Um mimics the potential outcomes in the sense that E(Um|L¯m,A¯m)=E(Y_m+1a¯m−1,0|L¯m,A¯m). The alternative expression is

Sψeff=∑i=1N∑m=0K−1E∂Uim∂ψT|Lˉim,Aˉim−E∂Uim∂ψT|Lˉim,Aˉi,m−1. VarUim|Lˉim,Aˉi,m−1)−1⋅

(7)Uim−EUim|Lˉim,Aˉi,m−1.

The theory for SNMMs can be directly applied to situations with clustered observations. We first construct blipped down outcomes U. For a single time point example with bivariate outcomes, they can be written as

U=Y1Y2−γ1∗(A1,A2,L;ψ)γ2∗(A2,A1,L;ψ).

The following is a parameterized example for blip function under the symmetry assumption:

(8)γ1∗(a1,a2,l;ψ)γ2∗(a2,a1,l;ψ)=ψ1a1+ψ2a2+ψ3a1a2ψ1a2+ψ2a1+ψ3a2a1.

Based on eq. (7), the efficient score for a single time point example can be written as

(9)Sψeff=∑i=1N∂Ui∂ψT−E∂Ui∂ψT|Li⋅VarUi|Li−1⋅Ui−EUi|Li.

For the example in eq. (8), ∂U∂ψT has the following expression:

∂U∂ψT=A1A2A2A1A1A2A2A1.

E∂U∂ψT|L is based on a model for treatment. If treatment probabilities are unknown, a multinomial logistic regression model can be used to estimate the treatment probability in each category of the joint treatment. The estimating equation in eq. (9) is doubly robust, because the estimator for ψ will be consistent if either the model for E∂U∂ψT|L or the model for EU|L is correctly specified. When both are correctly specified and the conditional variance VarU|L is also correctly specified, the estimator achieves the semiparametric efficiency bound for model (2) if assumption (5) is true. However, correct specification of VarU|L is not required for the consistency of the estimator.

4 Structural models and estimation for time-varying treatments in clustered observations

We next extend the method to time-varying treatments. Suppose that treatments and covariates are measured repeatedly at times m=0,...,K−1, and outcomes are measured only at the last time point K. Figure 2 is a causal diagram that shows the effect of a joint treatment from two arbitrary time points m−1 and m on the clustered outcomes at time m+1. The cluster-specific covariate Lm† is omitted from the graph for simplicity. In this example, Lm={Lm,1,Lm,2} is a time-varying confounder for the effect of Am={Am,1,Am,2} on Y={Y1,Y2}. Therefore, they need to be controlled for in analysis. Lm, however, also mediates the effect of earlier treatments Am−1 on Y, which suggests that analysis should not control for Lm. In addition, Lm is a collider on the pathway from Am−1 to Y. Controlling for Lm will thus induce bias in estimating the causal effect. For these reasons, conventional regression methods will fail in the presence of time-varying confounding [12].

Figure 2:

Causal diagram for an example with time-varying treatments.

4.1 Structural nested mean models

In the SNMM below, the blip function vector γm∗(.) models the effect of removing joint treatment am at time m from both outcomes, after having removed the effect of all subsequent treatments.

(10)EY1aˉm,0Y2aˉm,0|Lˉm=lˉm,Aˉm=aˉm−EY1aˉm−1,0Y2aˉm−1,0|Lˉm=lˉm,Aˉm=aˉm=γm,1∗(aˉm,lˉm;ψ)γm,2∗(aˉm,lˉm;ψ).

For repeatedly measured outcomes, eq. (10) can be further generalized to

(11)EY_m+1aˉm,0|Lˉm=lˉm,Aˉm=aˉm−EY_m+1aˉm−1,0|Lˉm=lˉm,Aˉm=aˉm=γ_m∗(aˉm,lˉm;ψ),

where γ_m∗ is a vector with components γm,k∗(aˉm,lˉm;ψ) that models the effect of removing joint treatment at time m from both outcomes at time k, for k=m+1,...,K. A parameterized example for eq. (10) is shown below:

(12)γm,1∗(aˉm,lˉm;ψ)γm,2∗(aˉm,lˉm;ψ)=ψm,1am,1+ψm,2am,2+ψm,3am,1am,2ψm,1′am,2+ψm,2′am,1+ψm,3′am,2am,1.

The ψ parameters in eq. (12) have similar interpretation as those in example (3), except that they contain subscript m to indicate the effect of joint treatment at time m is time-specific. A more parsimonious example is shown below:

(13)γm,1∗(aˉm,lˉm;ψ)γm,2∗(aˉm,lˉm;ψ)=ψ1am,1+ψ2am,2+ψ3am,1am,2ψ1am,2+ψ2am,1+ψ3am,2am,1.

Model (13) makes the symmetry assumption and also assumes the effect of treatments is the same for each time point. Notice that subscript m is omitted for ψ. The ψ parameters can be interpreted as effect of treatment per unit time. For example, ψ1 represents the effect of unit-specific treatment on the outcome in the same unit per unit time.

The sequential ignorability assumption is required to identify the contrast in eqs (10) and (11). It is a generalization of assumption A2 to a series of time-varying treatments as shown below:

Y_m+1aˉm−1,0⊥Am|Aˉm−1=aˉm−1,Lˉm=lˉm.

It states that there is no unmeasured confounding between the partially blipped down potential outcomes and the joint treatment to a cluster at time m, conditional on the covariate and treatment history within the cluster.

4.2 Estimation

Similar to the single time point case, we first construct a series of partially blipped down outcomes Um for each time m:

Um=Y1Y2−∑l=mK−1γl,1∗(aˉl,lˉl;ψ)γl,2∗(aˉl,lˉl;ψ).

The partially blipped down outcomes mimic the potential outcomes Yaˉm−1,0 in the sense that EUm|Lˉm,Aˉm=EYaˉm−1,0|Lˉm,Aˉm. For repeatedly measured outcomes, Um is a vector with components

Um,k=Yk,1Yk,2−∑l=mk−1γl,k,1∗(aˉl,lˉl;ψ)γl,k,2∗(aˉl,lˉl;ψ),

for k=m+1,...,K.

For time-varying treatments, we suggest using the following score:

(14)Sψ=∑i=1N∑m=0K−1∂γm∗Lˉim,Aˉim;ψ∂ψT−E∂γm∗Lˉim,Aˉim;ψ∂ψT|Lˉim,Aˉi,m−1⋅VarUim|Lˉim,Aˉi,m−1−1⋅ Uim−EUim|Lˉim,Aˉi,m−1.

The efficient score in eq. (6) is usually too complicated to be used for situations with time-varying treatments. The estimator from the score in eq. (14) is not optimal but is reasonably efficient and doubly robust. It is straightforward to show that the estimator is consistent, if for each time m, either the model for treatment, E∂γm∗Lˉm,Aˉm;ψ∂ψT|Lˉm,Aˉm−1, or the model for outcome, EUm|Lˉm,Aˉm−1, is correctly specified. A simpler non-doubly robust estimating equation is

(15)Sψ=∑i=1N∑m=0K−1∂γm∗Lˉim,Aˉim;ψ∂ψT−E∂γm∗Lˉim,Aˉim;ψ∂ψT|Lˉim,Aˉi,m−1⋅Uim.

It does not involve modeling EUm|Lˉm,Aˉm−1 and assumes it to be 0. Therefore, the estimator from eq. (15) is consistent if and only if the model for treatment is correct.

For the parameterized example in eq. (13), we have the following expression for ∂γm∗Lˉm,Aˉm;ψ∂ψT:

∂γm∗Lˉm,Aˉm;ψ∂ψT=Am,1Am,2Am,2Am,1Am,1Am,2Am,2Am,1.

Similarly, E∂γm∗Lˉm,Aˉm;ψ∂ψT|Lˉm,Aˉm−1 is based on a model for treatment. If treatment probabilities are unknown, a multinomial logistic regression model can be used to estimate treatment probability in each category of the combined treatment {Am,1,Am,2} received at time m. We can either fit a separate model for each time m, or fit a single model to pooled data from all person times. For the pooled treatment model, we may allow intercepts to be time-varying with the use of a spline.

4.3 Closed form estimator

If VarUm|Lˉm,Aˉm−1 is given, and a linear working model Dmδ=dmLˉm,Aˉm−1δ is assumed for EUm|Lˉm,Aˉm−1, where δ is the q−dimensional nuisance parameter in the outcome model and dm is a J×q matrix-valued function of Lˉm,Aˉm−1, and the blip function is linear in ψ with γm∗Lˉm,Aˉm=Rmψ=rmLˉm,Aˉm, where ψ is p−dimensional and rm is a J×p matrix-valued function of Lˉm,Aˉm, the estimator for ψ will have the following closed form (based on Robins and Hernán [12]):

(16)ψˆδˆ=∑i=1N∑m=0K−1CimDimT⋅VarUim|Lˉim,Aˉi,m−1−1⋅∑l=mK−1RilDim−1⋅∑i=1N∑m=0K−1CimDimT⋅VarUim|Lˉim,Aˉi,m−1−1⋅Yi,

where Cm=∑l=mK−1Rl−ERl|Lˉm,Aˉm−1.

A simple working model for VarUm|Lˉm,Aˉm−1 is σ2IJ×J for some constant variance σ2 at all times, where IJ×J is an identity matrix of dimension J. However, the independence structure and constant variance over time are often far from truth. A simulation study described in the next section will examine the impact of such misspecification on the efficiency of ψˆ. If one assumes a working model other than independence structure and constant variance, estimation will then require iterative procedures similar to those in generalized estimating equations (GEEs). We can start by assuming working model σ2IJ × J to obtain an initial estimate for ψ, and then iterate between estimating parameters in VarUm|Lˉm,Aˉm−1 and estimating ψ until convergence is achieved. We adapted the method of moments approach by Liang and Zeger [13] in estimating variance parameters [13]. An asymptotic variance estimator for ψˆ may be derived using the sandwich method, with the exact form found in Appendix A. Like GEE, VarUm|Lˉm,Aˉm−1 does not need to be correctly specified for the estimator for ψ to be consistent.

5 Simulation study

5.1 Algorithm

The data generating process is based on the likelihood for SNMMs described in Robins et al. [9]. Details about the likelihood can be found in Appendix B. A simulation study is conducted based on the setting where treatments and covariates are measured repeatedly at times m=0,...,K−1, and outcomes are measured at the last time point K only. The number of time points is K=5, and the cluster size is J=2. The total number of clusters is varied as N=100 or 1,000. The algorithm for generating data for one cluster is shown below. It includes two examples with different covariate effects in the conditional mean of outcomes. Variables with subscript m=−1 assume value 0.

For m=0,...,K−1, we generated covariates Lm from a multivariate normal distribution:

Lm,1Lm,2Lm†|Lˉm−1,Aˉm−1∼N30.3Lm−1,1+0.2Am−1,10.3Lm−1,2+0.2Am−1,20.3Lm−1+0.1Am−1,1+0.1Am−1,2,10.30.20.310.20.20.21.

We generated joint treatment Am from a multinomial logistic regression model with probabilities pm,00, pm,10, pm,01, pm,11:

Am,1Am,2|Lˉm,Aˉm−1∼multinomial(pm,00,pm,10,pm,01,pm,11),

where

pm,10=mlogit−1(0.5Lm,1−0.5Lm†+0.2Am−1,1),

pm,01=mlogit−1(0.5Lm,2−0.5Lm†+0.2Am−1,2),

pm,11=mlogit−1(0.5Lm,1+0.5Lm,2−L1†+0.2Am−1,1+0.2Am−1,2),

pm,00=1−pm,01−pm,10−pm,11.

Last, observed outcomes YK were generated. We first generated the conditional mean of YK under the sequential ignorability assumption:

E(YK|lˉK−1,aˉK−1)=∑m=0K−1γm∗(lˉm,aˉm)+∑m=1K−1E(YKam−1,0|lˉm,aˉm−1)−E(YKam−1,0|lˉm−1,aˉm−1)+E(YK0|l0).

For both simulation examples,

∑m=0K−1γm∗(lˉm,aˉm)=∑m=0K−1ψ1am,1+ψ2am,2+ψ3am,1am,2+ψ4am,1lm,1+ψ5am,1lm†ψ1am,2+ψ2am,1+ψ3am,2am,1+ψ4am,2lm,2+ψ5am,2lm†,

where ψ1=2,ψ2=0.5,ψ3=0.2,ψ4=0.5,ψ5=−0.2, and

E(YK0|l0)=2l0,1−l0†2l0,2−l0†.

The two examples differ in the association between covariates and the potential outcomes. In Example 1, the association is linear:

(17)∑m=1K−1[E(YKam−1,0|l¯m,a¯m−1)−E(YKam−1,0|l¯m−1,a¯m−1)]=∑m=1K−1{(lm,1−lm†lm,2−lm†)−E[(lm,1−lm†lm,2−lm†)|l¯m−1,a¯m−1]}.

In Example 2, the association is quadratic:

(18)∑m=1K−1[E(YKam−1,0|l¯m,a¯m−1)−E(YKam−1,0|l¯m−1,a¯m−1)]=∑m=1K−1{((lm,1−lm†)2(lm,2−lm†)2)−E[((lm,1−lm†)2(lm,2−lm†)2)|l¯m−1,a¯m−1]}.

YK was generated from its conditional mean plus a bivariate normal error term:

YK=E(YK|lˉK−1,aˉK−1)+ε,

where the error term also follows multivariate normal distribution:

ε|LˉK−1,AˉK−1∼N20,10.20.21.

In addition, we explored the impact of varying the correlation between unit-specific treatments in the same cluster on the estimates. The simulation procedure is the same as Example 1 with the number of clusters as N=1,000, except that simpler SNMMs with fewer parameters are used, as shown in eqs (19) and (20), and the correlation between unit-specific treatments in the same cluster is varied between 0,0.9 and –0.9. We tested one example without interference and one with interference respectively. In the example without interference, the SNMM contains the effect of treatment in the same unit and the effect modification by unit-specific covariate in the same unit:

(19)γm∗(lˉm,aˉm)=ψ1am,1+ψ2am,1lm,1ψ1am,2+ψ2am,2lm,2,

where ψ1=2,ψ2=0.5. In the example with interference, the SNMM contains the effect of treatment in the same unit, the effect of treatment in the opposite unit, and the interaction between treatments:

(20)γm∗(lˉm,aˉm)=ψ1am,1+ψ2am,2+ψ3am,1am,2ψ1am,2+ψ2am,1+ψ3am,2am,1,

where ψ1=2,ψ2=0.5,ψ3=0.2.

Last, we explored the impact of a misspecified SNMM on the estimates. For this simulation, we used the SNMM with interference in eq. (20) as the true model, and fit a misspecified SNMM containing the effect of treatment in the same unit only.

5.2 Results

Two doubly robust (DR) estimators eq. (14) and one without the doubly robust property eq. (15) were computed using the simulated data (Example 1). For both DR estimators, we assume a linear working model for EUm|Lˉm,Aˉm−1. For VarUm|Lˉm,Aˉm−1, we either assume an exchangeable working model Σm=σm21ρmρm1 for each time m and denote the estimator as DR1, or a working model with independence structure and constant variance Σ=σ21001 for all time points and denote the estimator as DR2. The DR2 estimator has the closed form described in eq. (16), but not the DR1 estimator. The non-DR estimator is based on the score in eq. (15).

To estimate treatment probabilities, a multinomial logistic regression model is fit to pooled data from all time points. Simulation and estimation procedures were repeated for 1,000 replicates. The results are shown in Table 1. Averaged estimates (Ave EST), empirical standard errors (Emp SE), and averaged asymptotic standard errors (Ave SE) are provided. The square root of mean squared error (MSE) is computed from Ave EST and Emp SE. All three estimators have negligible bias. Coverage of all 95% confidence intervals (CI) at N=1000 show the expected range of 93%−96%. At N=100, all 95% CIs show under-coverage, likely due to the small sample size. The DR1 estimator has the worst coverage probabilities at N=100, but it is the most efficient among the three estimators. The relative MSE of DR2 and non-DR estimators relative to DR1 estimator ranges from 1.3 to 1.4 and 2.4 to 3.4, respectively. As seen in Table 2, for the DR1 estimator, the estimates of the variance parameter σm2 and correlation parameter ρm in the conditional variance of Um decrease over time, since more of the variance is accounted for by the increased treatment and covariate history in the conditioning. Therefore, a flexible working model for VarUm|Lˉm,Aˉm−1 used in DR1 estimator tends to improve efficiency.

Table 1:

Small and large sample performance of g-estimators in a simulation study based on Example 1.

N	Param	True	DR1 estimator					DR2 estimator					Non-DR estimator
			AveEST	EmpSE	AveSE	95% COV	MSE	AveEST	EmpSE	AveSE	95% COV	RelMSE	AveEST	EmpSE	AveSE	95% COV	RelMSE
100	ψ1	2.0	2.00	0.18	0.13	0.83	0.18	1.98	0.24	0.19	0.88	1.36	1.99	0.45	0.39	0.91	2.53
	ψ2	0.5	0.50	0.17	0.13	0.86	0.17	0.48	0.24	0.19	0.88	1.36	0.50	0.44	0.38	0.92	2.52
	ψ3	0.2	0.20	0.26	0.19	0.84	0.26	0.21	0.36	0.28	0.87	1.37	0.18	0.66	0.58	0.92	2.53
	ψ4	0.5	0.50	0.11	0.08	0.84	0.11	0.49	0.16	0.12	0.86	1.41	0.48	0.35	0.31	0.93	3.06
	ψ5	–0.2	–0.20	0.11	0.08	0.85	0.11	–0.18	0.16	0.12	0.88	1.41	–0.17	0.36	0.31	0.90	3.17
1,000	ψ1	2.0	2.00	0.05	0.05	0.95	0.05	2.00	0.07	0.07	0.95	1.35	2.00	0.14	0.13	0.95	2.64
	ψ2	0.5	0.50	0.05	0.05	0.95	0.05	0.50	0.07	0.07	0.95	1.32	0.50	0.13	0.13	0.95	2.64
	ψ3	0.2	0.20	0.07	0.07	0.95	0.07	0.20	0.10	0.10	0.96	1.34	0.20	0.20	0.20	0.94	2.74
	ψ4	0.5	0.50	0.03	0.03	0.94	0.03	0.50	0.04	0.04	0.93	1.32	0.50	0.11	0.11	0.94	3.34
	ψ5	–0.2	–0.20	0.03	0.03	0.95	0.03	–0.20	0.05	0.04	0.94	1.41	–0.20	0.11	0.11	0.94	3.38

^[1]

Table 2:

Estimates for variance parameter σm2 and correlation parameter ρm at each time point for DR1 estimator from Example 1.

Time (m)	σ^m2		ρ^m
	AveEST	EmpSE	AveEST	EmpSE
0	7.76	0.38	0.51	0.03
1	6.06	0.29	0.50	0.03
2	4.37	0.19	0.47	0.03
3	2.69	0.11	0.42	0.03
4	1.03	0.03	0.19	0.03

^[2]

We varied the correlation between unit-specific treatments in the same cluster given predictors in the treatment model and explored its impact on the estimates. Table 3(A) shows the results from the example with no interference. A large positive correlation of 0.9 between unit-specific treatments increased the standard errors of the estimates slightly. The relative MSE compared to no correlation ranges from 1.06 to 1.27. In contrast, a large negative correlation of –0.9 decreased the standard errors of the estimates slightly. The relative MSE compared to no correlation ranges from 0.67 to 0.87. Table 3(B) shows the results from the example with interference. In this case, both a large positive correlation of 0.9 and a large negative correlation of –0.9 between unit-specific treatments greatly increased the standard errors of the estimates. The relative MSE compared to no correlation ranges from 2.00 to 2.65. Therefore, varying the correlation between unit-specific treatments has a greater impact on the efficiency of the estimators in settings with interference. In such cases, when the unit-specific treatments in the same cluster are highly correlated either negatively or positively, the standard errors of the estimates will be inflated because of the multi-collinearity problem. This will affect the power of the study.

Table 3:

Examine the impact of the correlation between unit-specific treatments in the same cluster sample size N = 1,000.

			ρA=0			ρA≈0.9			ρA≈−0.9
	Param	True	Ave EST	Emp SE	MSE	Ave EST	Emp SE	Rel MSE	Ave EST	Emp SE	Rel MSE
A: An example with no interference
DR1	ψ1	2.0	2.00	0.03	0.03	2.00	0.04	1.27	2.00	0.03	0.83
DR1	ψ2	0.5	0.50	0.03	0.03	0.50	0.03	1.06	0.50	0.03	0.85
DR2	ψ1	2.0	2.00	0.04	0.04	2.00	0.05	1.21	2.00	0.03	0.67
DR2	ψ2	0.5	0.50	0.04	0.04	0.50	0.05	1.10	0.50	0.04	0.86
Non-DR	ψ1	2.0	2.00	0.08	0.08	2.00	0.10	1.26	2.00	0.06	0.82
	ψ2	0.5	0.50	0.10	0.10	0.50	0.11	1.09	0.50	0.09	0.87
B: An example with interference
DR1	ψ1	2.0	2.00	0.05	0.05	2.00	0.12	2.40	2.00	0.13	2.55
	ψ2	0.5	0.50	0.05	0.05	0.50	0.12	2.34	0.50	0.13	2.56
	ψ3	0.2	0.20	0.07	0.07	0.20	0.19	2.60	0.20	0.17	2.34
DR2	ψ1	2.0	2.00	0.07	0.07	2.00	0.15	2.16	2.00	0.16	2.44
	ψ2	0.5	0.50	0.07	0.07	0.50	0.15	2.23	0.51	0.16	2.53
	ψ3	0.2	0.20	0.10	0.10	0.21	0.25	2.52	0.19	0.22	2.24
Non-DR	ψ1	2.0	2.00	0.12	0.12	2.00	0.27	2.30	2.02	0.26	2.19
	ψ2	0.5	0.50	0.12	0.12	0.50	0.27	2.22	0.52	0.26	2.18
	ψ3	0.2	0.20	0.19	0.19	0.20	0.49	2.65	0.18	0.37	2.00

^[3]

For the example with interference in Table 3(B), we also fit a misspecified SNMM with no interference and examined its impact on the estimates. The results are shown in Table 4. The estimates from fitting a misspecified SNMM are biased. In this case, the estimates from the DR1 approach that assumes non-independence working correlation have larger bias than the estimates from the DR2 approach that assume independence working correlation and the non-DR estimates. Therefore, it is not recommended to use the DR1 approach when one speculates there is interference within the cluster but cannot fit a SNMM with interference for practical reasons. The MSE of these estimates are also larger than those in Table 3(B) because of bias.

Table 4:

Examine the impact of a misspecified SNMM on the estimates.

	Param	True	Ave EST	Emp SE	Ave SE	95% COV	MSE
DR1	ψ1	2.00	1.85	0.03	0.03	0.01	0.16
DR2	ψ1	2.00	2.09	0.04	0.04	0.41	0.10
Non-DR	ψ1	2.00	2.09	0.07	0.07	0.73	0.12

^[4]

We did a comparison between g-estimation of SNMMs and standard GEE regression, a conventional method to deal with clustered outcomes. We considered two simulation examples with different covariate effects as shown in eqs (17) and (18). Both SNMM and GEE regression approaches were applied to the simulated data. Predictors in restricted mean models for GEE include all factors in the blip functions as well as all covariates used in generating the conditional mean of outcomes. An exchangeable working model is assumed for the correlation structure. The conditional mean models used in the GEE approach in the two examples are

Yj∼∑m=0K−1Amj+∑m=0K−1Amj′+∑m=0K−1AmjAmj′+∑m=0K−1AmjLmj+∑m=0K−1AmjLm+∑m=0K−1Lmj+∑m=0K−1Lm,

and

Yj ∼ ∑m=0K−1Amj+∑m=0K−1Amj′∑m=0K−1AmjAmj′+∑m=0K−1AmjLmj+∑m=0K−1AmjLm+

L0j+L0+∑m=1K−1Lmj2+∑m=1K−1Lm2+∑m=1K−1Lmj⋅Lm

respectively. The models included correctly specified predictors, but did not properly control for time-varying confounding. The coefficients corresponding to treatment and effect modification are shown in Table 5. As expected, the DR1 estimators from SNMMs and g-estimation are unbiased, whereas bias is induced in GEE estimators by controlling for covariates affected by earlier treatments. In the first example, the GEE estimators are in fact more efficient than the DR1 estimators. However, the first two GEE estimators are biased. In the second example, the GEE estimators have larger standard errors compared to the DR1 estimators, and all of them are biased.

Table 5:

Compare the estimators obtained from SNMM and g-estimation to those from GEE linear regression using two simulation examples.

Example	Param	True	DR1 estimator		GEE estimator
			Bias	EmpSE	Bias	EmpSE	RELMSE
1	ψ1	2.0	–0.00	0.05	–0.08	0.03	1.69
	ψ2	0.5	–0.00	0.05	0.08	0.03	1.71
	ψ3	0.2	0.00	0.07	0.00	0.04	0.61
	ψ4	0.5	0.00	0.03	0.00	0.02	0.56
	ψ5	–0.2	–0.00	0.03	0.00	0.02	0.62
2	ψ1	2.0	–0.00	0.06	–0.17	0.10	3.07
	ψ2	0.5	–0.00	0.07	–0.02	0.09	1.46
	ψ3	0.2	0.00	0.09	0.07	0.13	1.58
	ψ4	0.5	0.00	0.04	0.14	0.06	3.56
	ψ5	–0.2	0.00	0.04	–0.31	0.06	7.25

^[5]

6 Analysis of glaucoma data

The Ocular Hypertension Treatment Study (OHTS) is a multi-center randomized trial designed to evaluate the safety and efficacy of topical ocular hypertensive medication in delaying or preventing primary open-angle glaucoma in subjects with elevated intra-ocular pressure (IOP) [14]. In the first part of the study, a total of 1,636 subjects with an IOP between 24 and 32 mm in one eye and between 21 and 32 mm in the other eye were randomized to either observation or treatment with commercially available topical ocular hypertension medication. In the second part of the study, both the observation arm and the treatment arm received medication. The subjects were followed up every 6 months up to 7.5 years median time in the first part of the study and up to 13 years median time in the second part.

We use data from 817 subjects in the treatment arm to illustrate our method. In the treatment arm, the subjects received several classes of topical ocular hypertension medications, including beta blockers, prostaglandin analogues, topical carbonic anhydrase inhibitors and alpha 2 agonists. Sometimes more than one type of medication was prescribed for one eye. We are interested in comparing the effectiveness of the two most common classes, beta blockers and prostaglandin analogues in lowering IOP. The treatment is eye-specific drops, and the outcome is IOP in each eye. Within the treatment arm, different medications were not randomized. At each visit, clinicians made decisions about the type and dose of medication given to each subject, although dose information is not available in the data provided. Treatment decisions at each time may be influenced by the subject’s treatment and covariate history. Therefore, the treatment received by these subjects is time-varying. The subject’s IOPs at each visit are affected by earlier treatment and IOPs, and in turn affect later treatment and IOPs. Therefore, they are likely to be time-varying confounders.

We created a categorical treatment variable containing three categories: beta blocker only (BB), prostaglandin analogue only (PROST) and beta blocker plus prostaglandin analogue (BB + PROST). Since almost all subjects received medication only to one eye at baseline, we did not model baseline treatment but still included relevant baseline covariates in our analysis. In the follow-up visits, once a subject received no medication or medications other than a beta blocker and prostaglandin analogue in one of his eyes, the subsequent person-times were excluded from analysis. Since all remaining subjects received a beta blocker at the first follow-up visit, treatment at the first follow-up visit was also not modeled. In this study, most subjects did not receive different treatments in each eye. Therefore, we decide not to model the effect of treatment on the opposite eye. For each eye, we pooled all subject-visits and fit a pooled multinomial logistic model after performing model selection based on AIC. The resulting treatment model was

mlogitPr[Atj=a|Aˉt−1,Lˉt]=α0t+α1At−1,j+α2At−1,j′+α3IOPt,j+α4IOPt,j′+α5IOPt−1,j+

α6At−1,j⋅IOPt−1,j+α7RACE+α8AGE+α9IOP0,j,

where a=1 or 2 refers to “PROST” or “BB + PROST” versus the reference category “BB”. The significant predictors include IOP in the current time in each eye, IOP in the previous time in the same eye, treatment in the previous time in each eye, interaction between IOP and treatment in the previous time in the same eye, as well as baseline covariates race, age and baseline IOP in the same eye. The time-varying intercept is modeled by a spline to allow more flexibility. The causal effect is modeled with a blip function with two parameters

EYk,jaˉm,0|Lˉm=lˉm,Aˉm=aˉm−EYk,jaˉm−1,0|Lˉm=lˉm,Aˉm=γm,k,j∗(aˉm,lˉm;ψ)

=ψ1I(am,j=1)+ψ2I(am,j=2),

where ψ1 represents the effect of “Prost” versus “BB”, and ψ2 represents the effect of “BB + Prost” versus “BB”. Negative values refer to lowering IOP more and therefore indicate that a treatment is beneficial relative to “BB”. This model makes the symmetry assumption and also assumes the effect of treatment at each time is the same on each of the later outcomes. Both non-doubly robust (non-DR) and doubly robust (DR) estimators were computed for ψ, and the results are shown in Table 6.

Table 6:

Results of glaucoma data analysis.

	Non-DR estimator			DR estimator
	EST	SE	p-Value	EST	SE	p-Value
ψ1	–0.27	0.34	0.21	–0.15	0.20	0.22
ψ2	0.26	0.23	0.12	0.19	0.12	0.05

^[6]

The DR estimators have smaller standard errors compared to the non-DR estimators. Results from the DR approach show that ψˆ1 = –0.15 with p-value =0.22, suggesting the effect of “PROST” is not significantly different from that of “BB”, and ψˆ2 = 0.19 with p-value =0.047, suggesting the combined treatment “BB + PROST” is significantly worse than “BB” in lowering IOP. The fact that ψˆ2 is positive is counterintuitive, since we would expect a combined treatment to work better than any of the single treatments. One possible explanation for this effect is residual confounding not adjusted for in the model.

7 Discussion

In this work, we have extended SNMMs to settings with clustered observations and time-varying confounding. We formulate models that are generalized to both cluster- and unit-level treatments, with consideration of the effect of interference. We also derive semiparametric estimators of parameters in those models. Our simulation study demonstrated that the estimators are consistent and achieve reasonable efficiency in contrast to those from the conventional GEE regression approach.

We have focused on examples with two units per cluster in all clusters. It is interesting to explore the performance of the estimators under larger cluster sizes, especially in the presence of interference. The symmetry assumption is more important in such cases for reducing the dimension of parameters. One may also consider cases where the cluster size varies between clusters in the study population.

We have applied the method to compare the effectiveness of two commonly used topical ocular hypertension medication in lowering IOP using data from the treatment arm of OHTS. In the glaucoma study, two eyes of a given subject form a cluster. At each visit, different medications were not randomized in the treatment arm. Therefore, it is a setting with clustered observations and time-varying confounding. The effect of prostaglandin relative to beta-blocker is not shown to be statistically significant. The fact that the effect of the combined treatment is shown to be worse than that of the single treatments is counterintuitive. One likely explanation is residual confounding in the analysis, since clinicians usually prescribe combined treatment if the single ones do not work. Some of the residual confounding may also be due to the lack of dose information, which limits the capacity to fairly compare the three treatment categories. The analysis can potentially be improved by adjusting for loss to follow-up and exclusion of patients on other medications through inverse probability weighting (IPW). In this particular dataset, the treatment did not differ between two eyes in most subjects. Therefore, there is not enough power to analyze the effect of treatment on the opposite eye. However, the method introduced can be used to model interference among units within the same cluster when treatment assignments within the cluster are more variable.

We have demonstrated and implemented our method with continuous outcomes and an identity link in SNMMs. Our method can potentially be generalized to discrete outcomes and other link functions [9]. The extension to the log link will be straightforward. A single time point example of SNMM with a log link is

logEY1a1,a2Y2a2,a1|L=l,A=a−logEY10,0Y20,0|L=l,A=a=γ1∗(a1,a2,l)γ2∗(a2,a1,l).

A blipped down outcome

U=Y1exp−γ1∗(A1,A2,L;ψ)Y2exp−γ2∗(A2,A1,L;ψ)

can be used for estimation. For binary outcomes and the logit link, the causal parameter in SNMMs cannot be estimated with G-estimation [2]. Vansteelandt and Goetghebeur [15] proposed the generalized structural mean model to overcome this limitation [15]. The two-stage model includes a structural model to model causal effect and a traditional association model to model the mean of observed outcome in the treatment arm. They also proposed an estimator that is robust to misspecification of the nuisance association model. Robins [16] proposed a similar approach that uses a different parametric modeling restriction on the distribution of the observed data [16]. The causal model in these approaches can be extended to clustered observations in a similar way described in our method. Extensions of the association models to clustered observations will be straightforward.

One limitation of our approach is that when the causal model is misspecified we cannot obtain a meaningful interpretation of the causal parameters. For example, consider the case in which spillover effects exist but are omitted from the SNMM. As we showed via simulation, such misspecification results in biased estimates. Unless the investigator is certain that such effects do not exist or cannot be estimated relatively precisely, the SNMMs should include spillover effects. Neugebauer and van der Laan [17] proposed a nonparametric MSM that has meaningful interpretation for parameters even when the causal model is misspecified [17]. However, an extension of this approach to SNMMs does not give a straightforward interpretation of the parameters. Investigators also want to carefully consider specification of the treatment model, particularly when not using the doubly robust estimator. To improve the likelihood of correctly specifying this model, nonlinear regression techniques such as splines may be employed to inform covariates’ functional form.

Funding statement: Funding: This work was funded by the National Institute of the Diabetes and Digestive and Kidney Diseases (NIDDK) (grant/award number: “R01-DK09004”).

Acknowledgement

The authors are grateful to Vision Research Coordinating Center at the University of Washington in St. Louis for providing the glaucoma data, and to Dr. John Kempen for providing subject matter knowledge in analyzing the glaucoma data.

Appendix

A. Asymptotic variance estimator

We adapted the approach for calculating asymptotic variance of ψˆ from Robins [18]. Let α denote nuisance parameters in the treatment model, Sα the score for α, Sψ the score for ψ, α0 and ψ0 the true parameters for α and ψ, S(α,ψ) = (SαT,SψT)T, Sα,i and Sψ,i the scores for subject i, Si = (Sα,iT,Sψ,iT)T. Since Sψ is the score from a semiparametric model, n−12Sψ|ψ = ψ0 is asymptotically normal with mean 0 and asymptotic variance I(ψ0) = ESψ,iSψ,iT|ψ = ψ0. If α in the treatment model is known, I(ψ0) can be consistently estimated by Iˆ(ψˆ) = 1n∑inSψ,iSψ,iT|ψ = ψˆ. If α needs to be estimated, we first calculate Iˆ(αˆ,ψˆ) = IˆααIˆαψIˆψαIˆψψ=1n∑inSiSiT|α = αˆ,ψ = ψˆ. By Taylor’s expansion and Sluskey’s theorem, I(ψ0) is consistently estimated by Iˆ(ψˆ) = Iˆψψ−IˆψαIˆαα−1Iˆαψ. Again by Taylor’s expansion, n12(ψˆ−ψ0) will be asymptotically normal with mean 0 and asymptotic variance nB−1I(ψ0)B−1 which will be consistently estimated by nBˆ−1Iˆ(ψˆ)Bˆ−1, where B−1 = E∂Sψ,i∂ψT|α = α0,ψ = ψ0,Bˆ−1 = n−1∂Sψ∂ψT|α = αˆ,ψ = ψˆ.

B. Likelihood of SNMMs for clustered observations

For a single time point example, if we assume ignorable treatment, likelihood of a SNMM can be written as the following:

gE(Y|l0,a0)=γ∗(l0,a0)+gE(Y0|l0)−gE(Y0)+gE(Y0)

=γ∗(l0,a0)+gE(Y0|l0)

ε=Y−E(Y|L0,A0)

f(O;ρ)=f(ε(ρ)|L0,A0;η1)⋅f(L0;η2)⋅f(A0|L0;η3),

where O=(Y,L0,A0) represents observed data, ω represents nuisance parameters in the model for E(Y0|l0), ρ=ψ×η×ω represents the parameter space.

The above example can be extended to time-varying treatments where treatments and covariates are measured repeatedly at times 0 to K−1. Outcome is measured only at the last time point K. First define

vm∗lˉm,aˉm−1=gEYaˉm−1,0|lˉm,aˉm−1−gEYaˉm−1,0|lˉm−1,lm=0,aˉm−1

Γm∗lˉm−1,aˉm−1=g∫g−1vm∗lˉm,aˉm−1dFlm|lˉm−1,aˉm−1

=−gEYaˉm−1,0|lˉm−1,lm=0,aˉm−1+gEYaˉm−1,0|lˉm−1,aˉm−1

The functions vm∗ and Γm∗ are related to the association between covariates Lm and counterfactual outcomes Yaˉm−1,0. Likelihood of a SNMM with identity or log link can be written as the following:

gE(Y|lˉK−1,aˉK−1)=∑m=0K−1γm∗(lˉm,aˉm)+∑m=1K−1vm∗lˉm,aˉm−1−Γm∗lˉm−1,aˉm−1+gE(Y0|l0)

ε=Y−E(Y|LˉK−1,AˉK−1)

f(O;ρ)=f(ε(ρ)|LˉK−1,AˉK−1;η1)⋅∏k=0K−1f(Lk|Lˉk−1,Aˉk−1;η2)⋅f(Ak|Lˉk,Aˉk−1;η3),

where O=(Y,LˉK−1,AˉK−1) represents observed data, ω represents nuisance parameters in the model for E(Y0|l0), μ represents nuisance parameters in the models for vm∗lˉm,aˉm−1, ρ=ψ×η×μ×ω represents the parameter space. Choosing a linear working model for vm∗lˉm,aˉm−1 will facilitate writing an analytical form for the full likelihood.

Lastly, the second example can be further generalized to repeatedly measured outcomes at times k=1,...,K. Similarly, we define

vm,k∗lˉm,aˉm−1=gEYkaˉm−1,0|lˉm,am−1−gEYkaˉm−1,0|lˉm−1,lm=0,aˉm−1

Γm,k∗lˉm−1,aˉm−1=−gEYkaˉm−1,0|lˉm−1,lm=0,aˉm−1+gEYkaˉm−1,0|lˉm−1,aˉm−1

Likelihood of a SNMM with identity or log link can be written as the following:

gE(Yk|lˉk−1,aˉk−1)=∑m=0k−1γm,k∗(lˉm,aˉm)+∑m=1k−1vm,k∗lˉm,aˉm−1−Γm,k∗lˉm−1,aˉm−1+gE(Yk0|l0)

εk=Yk−E(Yk|Lˉk−1,Aˉk−1)

f(O;ρ)=∏k=1Kf(εk(ρ)|Lˉk−1,Aˉk−1;η1)⋅∏k=0K−1f(Vk|Lˉk−1,Aˉk−1,Yk;η2)⋅f(Ak|Lˉk,Aˉk−1;η3),

where Lk=(Vk,Yk), O=(YˉK,LˉK−1,AˉK−1), ω represents nuisance parameters in the models for E(Yk0|l0), μ represents nuisance parameters in the models for vm,k∗lˉm,aˉm−1, ρ=ψ×η×μ×ω represents the parameter space.

References

1. RobinsJM. Correcting for non-compliance in randomized trials using structural nested mean models. Commun Stat Theor Met1994;23:2379–412.10.1080/03610929408831393Suche in Google Scholar

2. RobinsJM. Marginal structural models versus structural nested models as tools for causal inference. In: HalloranME and BerryD, editors. Statistical Models in Epidemiology, the Environment, and Clinical Trials.New York: Springer, 2000:95–133.Suche in Google Scholar

3. AngristJD, ImbensGW, RubinDB. Identification of causal effects using instrumental variables. J Am Stat Assoc1996;91:444–55.Suche in Google Scholar

4. VansteelandtS, JoffeM. Structural nested models and g-estimation: the partially realized promise. Stat Sci2014;29:707–31.10.1214/14-STS493Suche in Google Scholar

5. BrumbackBA, HeZ, PrasadM, FreemanMC, RheingansR. Using structural-nested models to estimate the effect of cluster-level adherence on individual-level outcomes with a three-armed cluster-randomized trial. Stat Med2014;33:1490–502.10.1002/sim.6049Suche in Google Scholar PubMed

6. RubinDB. Comment: which ifs have causal answers. J Am Stat Assoc1986;81:961–2.10.1080/01621459.1986.10478355Suche in Google Scholar

7. VanderWeeleTJ. Ignorability and stability assumptions in neighborhood effects research. Stat Med2008;27:1934–43.10.1002/sim.3139Suche in Google Scholar PubMed

8. RubinDB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol1974;66:688.10.1037/h0037350Suche in Google Scholar

9. RobinsJM, RotnitzkyA, ScharfsteinDO. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In: HalloranME and BerryD, editors. Statistical Models in Epidemiology, the Environment, and Clinical Trials. New York: Springer, 2000:1–94.Suche in Google Scholar

10. SobelME. What do randomized studies of housing mobility demonstrate? Causal inference in the face of interference. J Am Stat Assoc2006;101:1398–407.10.1198/016214506000000636Suche in Google Scholar

11. TsiatisA. Semiparametric theory and missing data. Chicago: Springer, 2007.Suche in Google Scholar

12. RobinsJM, HernánMA. Estimation of the causal effects of time-varying exposures. Longitud Data Anal2009:553–99.10.1201/9781420011579.ch23Suche in Google Scholar

13. LiangKY, ZegerSL. Longitudinal data analysis using generalized linear models. Biometrika1986;73:13–22.10.1093/biomet/73.1.13Suche in Google Scholar

14. KassMA, HeuerDK, HigginbothamEJ, JohnsonCA, KeltnerJL, MillerJP, et al.. The ocular hypertension treatment study: a randomized trial determines that topical ocular hypotensive medication delays or prevents the onset of primary open-angle glaucoma. Arch Ophthalmol2002;120:701–13.10.1001/archopht.120.6.701Suche in Google Scholar PubMed

15. VansteelandtS, GoetghebeurE. Causal inference with generalized structural mean models. J R Stat Soc Ser B Stat Methodol2003;65:817–35.10.1046/j.1369-7412.2003.00417.xSuche in Google Scholar

16. RobinsJ, RotnitzkyA. Estimation of treatment effects in randomised trials with non-compliance and a dichotomous outcome using structural mean models. Biometrika2004;91:763–83.10.1093/biomet/91.4.763Suche in Google Scholar

17. NeugebauerR, van der LaanM. Nonparametric causal effects based on marginal structural models. J Stat Plan Inference2007;137:419–34.10.1016/j.jspi.2005.12.008Suche in Google Scholar

18. RobinsJ. Estimation of the time-dependent accelerated failure time model in the presence of confounding factors. Biometrika1992;79:321–34.10.1093/biomet/79.2.321Suche in Google Scholar

Published Online: 2015-6-26

Published in Print: 2015-11-1

Artikel in diesem Heft

https://doi.org/10.1515/ijb-2014-0055

Schlagwörter für diesen Artikel

clustered observations; time-varying confounding; structural nested mean models