Optimal precision of coarse structural nested mean models to estimate the effect of initiating ART in early and acute HIV infection

Judith J. Lok

doi:10.1515/jci-2023-0078

Article Open Access

Optimal precision of coarse structural nested mean models to estimate the effect of initiating ART in early and acute HIV infection

Judith J. Lok

Published/Copyright: February 11, 2025

Published by

Become an author with De Gruyter Brill

Submit Manuscript Author Information

From the journal Journal of Causal Inference Volume 13 Issue 1

Abstract

Time-dependent coarse structural nested mean models (coarse SNMMs) were developed to estimate treatment effects from longitudinal observational data. Coarse SNMMs estimate the combined effect of multiple treatment dosages and are thus useful to estimate the effect of treatments that are initiated and then never stopped. Coarse SNMMs lead to a large class of estimators, with widely varying estimates and standard errors. To optimize precision, we derive an explicit solution for the optimal coarse SNMM estimator. We apply our methods by estimating how the effect on immune reconstitution of initiating 1 year of ART depends on the time between HIV infection and ART initiation, in the early stages of HIV infection. The CDC and the WHO are encouraging HIV testing, leading to earlier HIV diagnoses. Thus, more treatment decisions need to be made in early and acute infection. However, evidence is lacking about the clinical benefits of initiating ART in early and acute HIV infection, with guidelines developed mostly from analyzing patients with chronic infection. In the simulations and our motivating HIV application, naive coarse SNMM estimators render useless inference, whereas our new fitting methods render informative analyses. We thus hope that this article leads to broader applicability of SNMMs.

Keywords: doubly robust estimation; early HIV detection; HIV/AIDS; structural nested models

MSC 2010: 62L12; 62L10; 62M10; 62P10

1 Introduction

Several causal inference approaches exist to estimate treatment effects from longitudinal observational data. Time-dependent coarse structural nested mean models (coarse SNMMs, [1]), for which we develop an optimal estimation method, describe the effect of time-dependent treatments, conditional on patient characteristics at the time of treatment initiation. Coarse SNMMs model the mean difference between the outcome with treatment initiated at time m , versus never, on the outcome measured at a later time k , given a patient’s covariate history at time m . Previously, Robins [2] introduced coarse SNMMs for outcomes measured at the end of a study, for trials with noncompliance. Earlier SNMMs studied in [3–5] describe the effect of one treatment dosage conditional on patient characteristics just before this dosage. Estimating the effect of a longer duration of treatment from these quantities requires modeling the distribution of covariates given past treatment and covariate history, possibly leading to bias [4,6], or even incompatibility of the different assumptions [4].

In contrast, coarse SNMMs directly estimate the effect of multiple treatment dosages, and they are thus useful to estimate the effect of a treatment such as ART that once initiated is intended to not be stopped. This relates to target trial emulation [7,8] as follows. In the HIV application, coarse SNMMs are used to estimate the effect of initiating ART, regardless of whether patients choose to remain on ART. Thus, the effect estimated is an intention-to-treat effect, not a per-protocol effect. Alternatively, if reliable data on stopping ART and the reasons for stopping had been available, one could have estimated the effect of initiating ART and continuing on ART thereafter, related to a per-protocol effect. In this latter case, one could censor individuals when they stopped ART and apply inverse probability of censoring weighting (IPCW) for the resulting artificial censoring.

Marginal structural models (MSMs [9,10]) are another class of models to estimate the effect of time-dependent treatments from observational data. MSMs estimate how treatment effects depend on baseline, but not time-dependent, covariates. Robins [6] provides a detailed comparison of MSMs and SNMMs.

In contrast to most of the literature on efficient estimation of treatment effects from observational data (Imbens [11] provides an overview), this article focuses on optimal estimation of the effect of a time-dependent treatment. In addition, coarse SNMMs can estimate how the treatment effect depends on time-dependent pretreatment characteristics, which is not the focus of most of the literature on efficient estimation of treatment effects.

This article provides a description of the estimation methods and the assumptions needed to consistently estimate coarse SNMMs with an outcome measured over time. Detailed proofs of consistency and asymptotic normality can be found in Lok and DeGruttola [1] and Robins [4]. We focus mainly on doubly robust estimation with optimal precision, and on what the results imply for the treatment decisions of HIV-infected patients diagnosed early.

Robins [2] and Lok and DeGruttola [1] for time-dependent outcomes derived a large class of estimating equations for coarse SNMMs, all leading to consistent, asymptotically normal estimators for the treatment effect. It turns out that both estimates and standard errors may depend considerably on the choice of estimating equations within this large class, and a naive choice may render useless inference. Finding this in our HIV example motivated the current article. To optimize precision, this article develops an estimator that has, under an extra condition, optimal precision within a class that includes the estimators from Lok and DeGruttola [1]. This optimal estimator leads to the smallest possible asymptotic variance. The optimal estimator is also doubly robust: it is consistent and asymptotically normal not only if the model for treatment initiation is correctly specified but also if a certain outcome-regression model is correctly specified. The optimal estimator combines weighting and regression, and is therefore a mixed method (compared with mixed methods for point exposures described by Imbens [11]). Also without the extra conditions needed for optimality and in the presence of censoring, the optimal estimator is doubly robust; it just may not be optimal in such settings.

Optimal precision is simpler for coarse SNMMs than for more traditional SNMMs (see [4,12] for theory on traditional SNMMs), because coarse SNMMs avoid the need for accumulating effects over time. This article therefore includes an accessible illustration of the steps involved in calculating optimal estimators. The Web-Appendix has proofs for all theorems of Section 7 for time-dependent coarse SNMMs, resulting in a self-contained example.

We implement the optimal estimator and compare it to other estimators for coarse SNMMs, in a simulation study and in our motivating HIV application. ART is the standard of care for HIV-infected patients. The CDC and the WHO are encouraging HIV testing [13,14], leading to earlier HIV diagnoses. Thus, more treatment decisions need to be made in early and acute HIV infection. However, evidence is lacking about the clinical benefits of initiating ART in early and acute HIV infection, with most patients in the START trial [15] having chronic infection at baseline. Therefore, we estimate how the effect on immune reconstitution of initiating one year of ART depends on the time between HIV infection and ART initiation in the early stages of HIV infection. This HIV application includes correction for informative censoring, and bootstrap confidence intervals (CIs). Web-appendix A proves consistency of the bootstrap for all our estimators, under regularity conditions.

Simulation Section 9 and Application Section 10 show that estimators optimizing for precision perform substantially better in practice than arbitrarily choosing an estimator within the class of coarse SNMMs. This leads to a definitive answer to the question in the HIV application: whether the effect of 1 year of ART substantially increases when ART initiation is delayed in HIV-infected patients diagnosed in early and acute HIV infection.

The rest of this article is organized as follows. Section 2 introduces the motivating data. Section 3 introduces the setting and notation. Section 4 describes the treatment effect model. Section 5 introduces the assumptions necessary to estimate the treatment effect. Section 6 derives a large class of estimators. Section 7 derives the optimal estimator. Section 7.2 proposes doubly robust estimators and shows that they lead to a smaller asymptotic variance. Section 7.3 presents a theorem that guarantees optimality of estimators within their class. Section 7.4 gives an explicit expression for optimal estimators for the parameters of coarse SNMMs with an outcome measured over time. Section 8 describes implementation of the estimators. Section 9 compares the optimal estimator with other estimators in a simulation study. Section 10 compares the optimal estimator with other estimators in the motivating HIV application. Section 11 concludes this article with a Discussion.

2 Motivating data

ART is the standard of care for HIV-infected patients. Guidelines regarding ART initiation have been changing [14,16–18]. Historically, ART initiation was often delayed, depending on the state of the immune system as measured by the CD4 count. The delays were intended to avoid early viral resistance to ART, to thus conserve ART options when those were deemed to be most needed: when CD4 counts were low. Current treatment guidelines [14,18] recommend ART initiation at the time of HIV diagnosis. HIV testing and ART initiation at the time of HIV diagnosis are increasingly implemented, especially in the developed world but also often in resource-limited settings, in part because of the UNAIDS 90-90-90 initiative [19]. 90-90-90 aimed that by 2020, 90% of all people living with HIV would know their HIV status, 90% of all people with diagnosed HIV infection would receive sustained ART, and 90% of all people receiving ART would have viral suppression. While 90-90-90 did not completely reach its goal, the number of HIV-infected individuals who know about their HIV status increased substantially: 81% of HIV-infected individuals knew about their HIV status in 2020. Global access to ART also increased substantially, with 67% of patients diagnosed with HIV on ART in 2020, and the number of patients on ART in 2020 “more than tripled since 2010” [20]. PEPFAR, the U.S. President’s Emergency Plan for AIDS Relief [21], helped enable these increases.

Many patients are diagnosed with HIV when they are already chronically infected, and for those patients, the START trial [15] conclusively demonstrated the benefit of initiating ART immediately. However, with only a small number of patients in the START trial in acute or early infection, there is not a lot of evidence of clinical benefit for initiating ART immediately in patients diagnosed early. Finding such evidence is increasingly important, with the CDC and the WHO encouraging HIV testing [13,14] and the WHO also encouraging partner testing [14], leading to earlier HIV diagnoses. Thus, more treatment decisions need to be made during early and acute HIV infection. Our investigations shed light on the effect of efforts to diagnose HIV early, if early diagnosis is combined with immediate ART initiation as currently recommended.

The decisions to initiate ART are not only important for the HIV-infected patients themselves but also for their partners: ART may not only improve a patient’s own outcomes but has also been shown to reduce the risk of HIV transmission [22–24].

The observational AIEDRP (Acute Infection and Early Disease Research Program) Core01 data describe 1762 HIV-infected patients diagnosed during acute and early HIV infection [25] and thus provide a unique opportunity to estimate the effect of ART in early and acute HIV infection. Dates of HIV infection have been estimated using an algorithm that incorporates clinical and laboratory data [25,26].

By using the AIEDRP Core01 data, we will estimate how the effect on immune reconstitution of initiating 1 year of ART depends on the time between HIV infection and ART initiation. The effect on immune reconstitution of 1 year of ART initiated m months after infection is measured as the CD4 count at month m + 12 with ART initiated at month m versus the CD4 count at month m + 12 without ART. The previous work on the effect of ART in early and acute infection [1] lacks precision and robustness. We develop estimators for the effect of ART with increased robustness and precision, aiming to obtain a definitive answer to whether the effect of 1 year of ART decreases when ART initiation is delayed.

Lok and DeGruttola [1] showed that in the AIEDRP data, ART use depends on covariates, such as the current CD4 count, that are prognostic for the outcome CD4 count; and that this leads to substantial confounding by indication. They showed that there is strong evidence that in these observational data, as expected, patients with a worse prognosis were treated earlier. A naive analysis therefore leads to underestimation of the effect of ART, or even to a reversal of the sign of this effect.

3 Setting and notation

Initially, all patients are assumed to be followed at the same times 0 , 1 , … , K + 1 , with 0 indicating the baseline visit. The assumption that there is no censoring due to loss-to-follow-up is relaxed in Section 8. Y k is the outcome at time k . Y ¯ k = ( Y 0 , Y 1 , … , Y k ) is the outcome history until time k . A k is the treatment at time k . We investigate the effect of a binary treatment A k , which is either given ( A k = 1 ) or not ( A k = 0 ) at each time k . A ¯ k = ( A 0 , A 1 , … , A k ) is the treatment history until time k . We only consider the impact of initiating treatment, and do not consider issues of treatment interruption or compliance: our analysis follows the intention-to-treat principle common in randomized trials. Thus, A k = 0 until treatment is initiated, and A k = 1 thereafter. T is the time treatment was actually initiated, with T = K + 1 if treatment was never initiated. Similarly, L k are the covariates at time k , including the outcome Y k at time k , and L ¯ k = ( L 0 , L 1 , … , L k ) is the covariate history until time k . ℒ ¯ k is the space in which L ¯ k takes its values. We assume that at each visit time k , treatment decisions A k are made after L k is measured and known. We suppress patient-level notation (such as the subscript i often used to indicate individual i ).

Y k ( ∅ ) and Y ¯ k ( ∅ ) are the counterfactual, not always measured, outcome and outcome history at time k under the treatment regime “no treatment.” L k ( ∅ ) and L ¯ k ( ∅ ) are the covariates and covariate history at time k under “no treatment.” Y k ( m ) is the outcome at time k under the treatment regime “start treatment at time m .” We assume that observations and counterfactuals of the different patients are independent and identically distributed [27].

4 Time-dependent coarse SNMMs

Our model for treatment effect is similar to [2], but differs in that as in Lok and DeGruttola [1], it allows for a time-dependent outcome:

Definition 4.1

(Time-dependent coarse SNMMS). For k = m , … , K + 1 ,

γ k m ( l ¯ m ) = E [ Y k ( m ) − Y k ( ∅ ) ∣ L ¯ m = l ¯ m , T = m ] .

γ k m ( l ¯ m ) is the expected difference, for patients who initiated treatment at time m with covariates l ¯ m , of the outcome at time k had the patient initiated treatment at time m , and the outcome at time k had the patient never initiated treatment. It is the effect of treatment between times m and k .

We assume a parametric model for the treatment effect γ , leading to a semiparametric setting. γ models the expected difference between two counterfactual outcomes. How any of these outcomes depends on a patient’s covariates at time m is not specified. For example, it could be that, for k > m and with g k ( L ¯ m ( ∅ ) ) any function of L ¯ m ( ∅ ) ,

(1) Y k ( ∅ ) = g k ( L ¯ m ( ∅ ) ) + ε k ( ∅ ) , Y k ( m ) = g k ( L ¯ m ( ∅ ) ) + ( ψ 1 + ψ 2 m + ψ 3 m 2 ) ( k − m ) + ε k ( m )

(note that k − m is the treatment duration until time k ), with E [ ε k ( m ) ∣ L ¯ m , T = m ] = 0 and E [ ε k ( ∅ ) ∣ L ¯ m , T = m ] = 0 . In this example, g k ( L ¯ m ( ∅ ) ) is left completely unspecified, and

γ k , ψ m ( l ¯ m ) = ( ψ 1 + ψ 2 m + ψ 3 m 2 ) ( k − m ) 1 { k > m } .

Rank preservation holds if Y k ( m ) − Y k ( ∅ ) = γ k m ( L ¯ m ) , but this has been argued to not hold in many practical settings [28,29]. A coarse SNMM is a model for the effect of the treatment on the treated, as discussed by Imbens [11]. However, Lok [30] shows that under a stronger version of the assumption of no unmeasured confounding (albeit with the same interpretation as Assumption 5.1), L ¯ contains enough information to compare those treated and those untreated at time m given L ¯ m , in which case a coarse SNMM is also a model for the effect of the treatment on all patients with given pretreatment covariates L ¯ m . This lack of distinction between the effect of the treatment on the treated and the effect of the treatment is due to SNMMs conditioning on pretreatment covariates.

Assumption 4.2

(Parameterizing coarse SNMMs). γ k , ψ m ( l ¯ m ) is a correctly specified model for γ k m ( l ¯ m ) , where ψ is a parameter in R p and ψ * is the true parameter.

The following example motivated this work on estimation with optimal precision:

Example 4.3

(Effect of ART depending on the time between estimated date of HIV infection and treatment initiation in HIV-infected patients). We initially assume that

(2) γ k , ψ m ( l ¯ m ) = ( ψ 1 + ψ 2 m + ψ 3 m 2 ) ( k − m ) 1 { k > m } ,

where ( k − m ) is the treatment duration from month m to month k , is a correctly specified parametric model for γ . If the treatment effect might also depend nonlinearly on the treatment duration, one could add nonlinear terms like ψ 4 ( k − m ) 2 1 { k > m } , or additionally include nonlinear terms depending on the treatment initiation time, like ( ψ 4 + ψ 5 m ) ( k − m ) 2 1 { k > m } .

The treatment effect γ may also depend on pretreatment covariates, such as the log 10 HIV viral load ( l v l ), resistance mutations, and the CD4 count. To incorporate pretreatment covariates such as log 10 HIV viral load, one can extend the model by including terms such as ψ 4 l v l m ( k − m ) 1 { k > m } . We choose a function of ( k − m ) since the treatment duration may be predictive of its effect. If the treatment effect depends only on the viral load at treatment initiation for the first month of treatment, one might add terms such as ψ 4 l v l m 1 { k > m } .

Following [3,4], we use the propensity score [31], p ( m ) = P ( A m = 1 ∣ A ¯ m − 1 = 0 ¯ , L ¯ m ) , predicting treatment given the past, to estimate the treatment effect γ . For the nondoubly-robust estimators, we assume:

Assumption 4.4

(Parameterizing the propensity score). p θ ( m ) is a correctly specified model for p ( m ) = P ( A m = 1 ∣ A ¯ m − 1 = 0 ¯ , L ¯ m ) , with θ a parameter in R q and θ * the truth.

Specifying a model p θ ( m ) for p ( m ) is often more feasible than specifying an outcome model, since p θ ( m ) is a model for treatment initiation in practice, and clinicians will have ideas about this decision process, at least qualitatively (although these ideas will often vary between clinicians). We typically estimate θ * by maximum (partial) likelihood, the standard method provided by most logistic regression software.

5 No unmeasured confounding and consistency

As shown in [1–3,5], to distinguish between the treatment effect and confounding by indication, we assume no unmeasured confounding: that information is available on all factors that both (1) influence treatment decisions and (2) possibly predict a patient’s prognosis with respect to the outcome of interest. Y k ( ∅ ) , the outcome at time k without treatment, reflects a patient’s prognosis. If there is no unmeasured confounding, treatment decisions at time m ( A m ) are independent of this (not always measured) prognosis Y k ( ∅ ) ( k > m ), given past treatment and covariate history ( A ¯ m − 1 , L ¯ m ) :

Assumption 5.1

(No unmeasured confounding - formalization). A m ⊥ ⊥ Y k ( ∅ ) ∣ L ¯ m , A ¯ m − 1 for k > m , where ⊥ ⊥ means: is conditionally independent of [32].

The observed outcome Y k cannot replace Y k ( ∅ ) here, because Y k may well be affected by A m .

If a patient is not treated until time k , there is no difference in treatment between Y k and Y k ( ∅ ) . In this and similar cases, it is reasonable to assume that until time k , the observed outcomes and the outcomes without treatment would have been the same:

Assumption 5.2

(Consistency). If T ≥ k , Y k = Y k ( ∅ ) and L ¯ k = L ¯ k ( ∅ ) . Y ( T ) = Y .

6 Estimation: unbiased estimating equations

This section describes the estimation methods from [1,2]. Proofs of Theorems 6.3 and 6.4 can be found in Lok and DeGruttola [1].

Definition 6.1

On k > T , define H ( k ) = Y k − γ k T ( L ¯ T ) . On k ≤ T , define H ( k ) = Y k .

Example 6.2

(Effect of ART depending on the time between HIV infection and treatment initiation in HIV-infected patients). In the setting of Example 4.3, on k > T ,

H ( k ) = Y k − ( ψ 1 + ψ 2 T + ψ 3 T 2 ) ( k − T ) .

For the true ψ * , H ψ * ( k ) = Y k − γ k , ψ * T ( L ¯ T ) = H ( k ) . This so-called blipping off of the treatment effect generates a random variable H ( k ) that mimics a counterfactual outcome:

Theorem 6.3

(Mimicking counterfactual outcomes). Under consistency Assumption 5.2, for m ≤ K and k ≥ m ,

E [ H ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ , A m ] = E [ Y k ( ∅ ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ , A m ] .

Thus, under no unmeasured confounding Assumption 5.1, E [ H ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ , A m ] does not depend on A m . This plays a crucial role in estimation.

Under no unmeasured confounding Assumption 5.1, given the past treatment and covariate history, Y k ( ∅ ) does not help to predict treatment changes. In other words, in the model for treatment changes (Assumption 4.4), Y k ( ∅ ) does not contribute. This idea is similar to [3,5]. Lok and DeGruttola [1] proved that because of Theorem 6.3, the same holds for H ψ * ( k ) , and that similar to [33,34], this leads to the following theorem:

Theorem 6.4

(Unbiased estimating equations). Under no unmeasured confounding Assumption 5.1 and consistency Assumption 5.2, consider any q → m k : ℒ ¯ m → R p , m = 0 , … , K , k > m , which are measurable, bounded, and vector-valued. Then

E ∑ m = 0 K ∑ k > m q → m k ( L ¯ m ) H ( k ) 1 A ¯ m − 1 = 0 ¯ { A m − p ( m ) } = 0 .

If furthermore γ ψ correctly specifies γ (Assumption 4.2) and p θ ( m ) correctly specifies p ( m ) (Assumption 4.4), then

P n ∑ m = 0 K ∑ k > m q → m k ( L ¯ m ) H ψ ( k ) 1 A ¯ m − 1 = 0 ¯ { A m − p θ ( m ) } = 0 ,

stacked with the estimating equations for θ * , where P n is the empirical measure P n ( X ) = 1 ∕ n ∑ i = 1 n X i , are unbiased estimation equations for ( ψ , θ ) . q → m k here are allowed to depend on ( ψ , θ ) , as long as they are measurable and bounded for ( ψ * , θ * ) .

For identifiability of the estimator, one needs as many estimating equations as parameters, in this case by choosing the dimension of q → . Including k ≤ m does not help in these estimating equations, since for those k , on A ¯ m − 1 = 0 ¯ , H ψ ( k ) = Y k ( ∅ ) = Y k , which is part of L ¯ m and therefore generates a term with expectation 0 regardless of ψ .

If γ is linear in ψ , the estimation approach of Theorem 6.4 leads to a linear restriction on ψ once the parameter θ has been estimated, and thus to a closed form expression for ψ ˆ .

7 Estimating equations leading to optimal precision

7.1 Assumptions and restrictions on the estimating equations

From the vast literature on unbiased estimating equations, see for example Van der Vaart [35] Chapter 5, Theorem 6.4 implies that under regularity conditions, ψ ˆ is consistent and asymptotically normal for any choice of q → with the same dimension as ψ . Under those regularity conditions, the asymptotic variance of an estimator ψ ˆ that is a zero of P n G ψ , with E G ψ * = 0 , equals

(3) { E ( ∂ ∕ ∂ ψ ∣ ψ * G ψ ) } − 1 E ( G ψ * G ψ * ⊤ ) { E ( ∂ ∕ ∂ ψ ∣ ψ * G ψ ) } − 1 ⊤ ,

(4) because n 1 ∕ 2 ( ψ ˆ − ψ * ) = { E ( ∂ ∕ ∂ ψ ∣ ψ * G ψ ) } − 1 n 1 ∕ 2 P n ( G ψ * ) + o P ( 1 ) .

In the following, we restrict to estimators satisfying the regularity conditions needed for (4) and thus (3) to hold. Among such estimators, we derive the optimal choice of q → . The remainder of this article also assumes no unmeasured confounding Assumption 5.1, consistency Assumption 5.2, and correct specification of the coarse structural nested mean model, Assumption 4.2. Web-Appendix A provides proofs of all theorems from Section 7.

7.2 Doubly robust estimators lead to increased precision

Definition 7.1

Let

G ( ψ , θ , q ) = ∑ m = 0 K ∑ k > m q → m k ( L ¯ m ) H ψ ( k ) 1 A ¯ m − 1 = 0 ¯ { A m − p θ ( m ) } , G * ( ψ , θ , q ) = ∑ m = 0 K ∑ k > m q → m k ( L ¯ m ) { H ψ ( k ) − E [ H ψ ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] } 1 A ¯ m − 1 = 0 ¯ { A m − p θ ( m ) } .

First, consider estimating equations for ψ when θ * and thus the propensity score is known:

Theorem 7.2

(Replacement of estimating equations by more efficient ones). Under no unmeasured confounding Assumption 5.1, consistency Assumption 5.2 and the usual regularity conditions for the sandwich estimator for the variance (equation (3)), P n ( G * ( ψ , θ * , q ) ) = 0 are unbiased estimating equations which, for given q → , lead to a smaller asymptotic variance of ψ ˆ than the estimating equations P n ( G ( ψ , θ * , q ) ) = 0 .

The equations based on G * are not true estimating equations: they depend on the parameter of interest ψ through the conditional expectation of H ψ . We return to this issue later. Notice that Theorem 6.3 facilitates model specification for the conditional expectation of H .

Usually, θ * is unknown and has to be estimated. For the more efficient estimators based on G * of Theorem 7.2, estimating θ * does not change the asymptotic variance of ψ ˆ . This result is similar to Proposition 1 in Rotnitzky and Robins [36] for a missing data problem.

Theorem 7.3

Replacement of θ * by θ ˆ from a correctly specified pooled logistic regression model fitted by maximum partial likelihood estimation, leading to ψ ˆ that solves P n ( G * ( ψ , θ ˆ , q ) ) = 0 , leads to the same asymptotic variance for ψ ˆ as the estimator for ψ * that solves P n ( G * ( ψ , θ * , q ) ) = 0 .

For the estimators that solve P n ( G ( ψ , θ * , q ) ) = 0 , estimating θ * may change the asymptotic variance. It usually reduces the asymptotic variance, as was also seen in Robins [37] and Lok [33] for different structural nested models. However, the resulting estimator is never more efficient than its doubly robust counterpart:

Theorem 7.4

For q → fixed, replacement of θ * by θ ˆ from a correctly specified pooled logistic regression model fitted by maximum partial likelihood estimation does not make the estimator for ψ * which solves P n ( G ( ψ , θ ˆ , q ) ) = 0 more efficient than the estimator for ψ * which solves P n ( G * ( ψ , θ ˆ , q ) ) = 0 .

As for other SNMMs ([37], 3.10 page 23), estimators for coarse SNMMs resulting from G * are also doubly robust:

Theorem 7.5

(Double robustness). The estimator ψ ˆ solving P n G * ( ψ , θ ˆ , q ) = 0 is doubly robust: stacked with the estimating equations for θ , these estimating equations are unbiased for ψ if either p θ or E [ H ψ ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] is correctly specified. Thus, ψ ˆ is consistent and asymptotically normal if either p θ or E [ H ψ ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] is correctly specified for all m = 0 , … , K , k > m .

It follows that both for robustness and for efficiency, the estimators with G * are preferable.

7.3 A theorem that guarantees optimal precision

Theorem 7.6, a consequence of Theorem 5.3 from [38], provides a sufficient criterion for q → opt to be optimal within our two classes of estimating equations, described by G and G * .

Theorem 7.6

(Optimal precision criterion with θ * known). For both G and G * : if q → opt satisfies

(5) E ( ∂ ∕ ∂ ψ ∣ ψ * G ( ψ , θ * , q ) ) = E ( G ( ψ * , θ * , q ) G ( ψ * , θ * , q → opt ) ⊤ ) ,

then no other q → satisfying our regularity conditions within this class leads to an estimator for ψ * with a smaller asymptotic variance than q → opt . The estimator resulting from q → opt has asymptotic variance equal to the inverse of E ( G ( ψ * , θ * , q → opt ) G ( ψ * , θ * , q → opt ) ⊤ ) . There is a unique (in L 2 ( P ) -sense) solution to equation (5) within this class of estimating equations.

7.4 Explicit expression for estimating equations leading to optimal precision for coarse SNMMs with a time-varying outcome

This section finds the q → which is optimal under the following condition:

Assumption 7.7

(Homoscedasticity). For 0 ≤ m ≤ K and k , s > m ,

cov [ H ( k ) , H ( s ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ , A m ]

does not depend on A m .

Without homoscedasticity Assumption 7.7, the optimal estimator derived below is still doubly robust, it just may not be optimal. Assumption 7.7 is a homoscedasticity assumption: a conditional covariance does not depend on A m . Because of Theorem 6.3 and no unmeasured confounding Assumption 5.1, Assumption 7.7 is equivalent to

(6) E [ H ( k ) H ( s ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ , A m ] = E [ H ( k ) H ( s ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] .

This is not far-fetched: under no unmeasured confounding Assumption 5.1, because of Theorem 6.3, the conditional expectation given L ¯ m , A ¯ m − 1 = 0 ¯ , A m of the two factors H ( k ) and H ( s ) does not depend on A m . Assumption 7.7 can be checked empirically by using a preliminary estimator ψ ˜ for ψ * , regressing the product H ψ ˜ ( k ) H ψ ˜ ( s ) on L ¯ m , A ¯ m , and investigating whether parameter(s) describing the dependence on A m are equal to 0.

Rank preservation holds if H ( k ) does not just mimic Y k ( ∅ ) as described in Theorem 6.3, but if H ( k ) is equal to Y k ( ∅ ) . No unmeasured confounding Assumption 5.1 could be extended, with the same interpretation, to ( Y k ( ∅ ) , Y s ( ∅ ) ) ⊥ ⊥ A m ∣ L ¯ m , A ¯ m − 1 = 0 ¯ . Under this formulation of no unmeasured confounding and rank preservation, equation (6) and thus Assumption 7.7 are immediate. Unfortunately, rank preservation is a very strong assumption, which we prefer to avoid. For a discussion, see, e.g., [28,29].

Theorem 7.8 describes the estimator with optimal precision in an example:

Theorem 7.8

(Estimator with optimal precision). Suppose that

γ k , ψ m ( L ¯ m ) = ( ψ 1 + ψ 2 m + ψ 3 m 2 + ψ 4 k ) ( k − m ) + ( ψ 5 k 2 + ψ 6 ( k − m ) + ψ 7 l v l m ) ( k − m ) + ψ 8 l v l m

(compare with Example 4.3). Suppose that homoscedasticity Assumption 7.7 holds. For k > m , define

(7) Δ → m ( k ) = ( k − m ) − E [ T r ( m , k ) ∣ A ¯ m = 0 ¯ , L ¯ m ] m ( k − m ) − E [ T T r ( m , k ) ∣ A ¯ m = 0 ¯ , L ¯ m ] m 2 ( k − m ) − E [ T 2 T r ( m , k ) ∣ A ¯ m = 0 ¯ , L ¯ m ] k ( k − m − E [ T r ( m , k ) ∣ A ¯ m = 0 ¯ , L ¯ m ] ) k 2 ( k − m − E [ T r ( m , k ) ∣ A ¯ m = 0 ¯ , L ¯ m ] ) ( k − m ) 2 − E [ T r 2 ( m , k ) ∣ A ¯ m = 0 ¯ , L ¯ m ] l v l m ( k − m ) − E [ l v l T T r ( m , k ) ∣ A ¯ m = 0 ¯ , L ¯ m ] l v l m − E [ l v l T A k − 1 ∣ A ¯ m = 0 ¯ , L ¯ m ]

where T r ( m , k ) is the number of treated times between time m and time k, and

(8) cov → m [ H I 8 ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] = Γ min , min m I 8 Γ min , min + 1 m I 8 Γ min , max m I 8 Γ min + 1 , min m I 8 Γ max , min m I 8 Γ max , max m I 8

where I 8 the 8 × 8 identity matrix, Γ k , s m = cov [ H ( k ) , H ( s ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] , min = m + 1 , and max = K ,

q → m opt , min ( L ¯ m ) q → m opt , min + 1 ( L ¯ m ) q → m opt , max ( L ¯ m ) = ( cov → m [ H I 8 ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] ) − 1 Δ → m ( min ) Δ → m ( min + 1 ) Δ → m ( max ) ,

P n ( G * ( ψ , θ ˆ , q → opt ) ) = 0 leads to an estimator ψ ˆ with optimal precision: any other estimator in our class, solving P n ( G * ( ψ , θ ˆ , q → ) ) = 0 for some q → , leads to an asymptotic variance that is at least as large as the asymptotic variance of ψ ˆ . Without homoscedasticity Assumption 7.7, this estimator is not optimal but still doubly robust.

The class of estimators that solve estimating equations of the form P n ( G * ( ψ , θ * , q → ) ) = 0 leads to the smallest asymptotic variance of all estimators considered in this article (Theorems 7.2 and 7.4), and estimating θ * with pooled logistic regression does not change the asymptotic variance of estimators that solve P n ( G * ( ψ , θ ˆ , q → ) ) = 0 (Theorem 7.3). q → opt from Theorem 7.8 therefore leads to the smallest possible asymptotic variance.

Theorem 7.9

(Estimator with optimal precision). Formulating Theorem 7.8 for different treatment effect models γ can be done as follows. When choosing simpler models for γ ψ , delete the corresponding rows in equation (7) for Δ → m ( k ) and replace the 8 in I 8 by the number of remaining parameters. For more complicated or different models, notice that the first entry in each row of Δ → m ( k ) corresponds to the second entry in each row but with A m = 0 replaced by A m = 1 . In addition, the model for γ of Theorem 7.8 can be easily generalized to contain similar terms depending on other covariates; the optimal estimator then follows similar to Theorem 7.8.

The term Γ k , s m and the double robustness term E [ H ψ ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] are fixed functions of L ¯ m , but they may depend on ψ . We will use an initial estimator ψ ˜ , doubly robust but not optimal, in place of ψ * in Γ k , s m and E [ H ψ ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] . If the treatment initiation model p θ is correctly specified, the estimating equations are unbiased for any fixed value of ψ ˜ . In the HIV application, ψ ˜ is an estimator with q → m k only nonzero for k = m + 12 . Using the same reasoning as for Theorem 7.8, the optimal such estimator results from

q → m m + 12 ( L ¯ m ) = ( Var [ H ( m + 12 ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] ) − 1 Δ → m ( m + 12 ) .

Replacing the conditional variance of H ( m + 12 ) by a working identity covariance matrix gives ψ ˜ based on q ˜ m m + 12 = Δ → m ( m + 12 ) . That this leads to valid estimators ψ ˜ can be seen by stacking the estimating equations including this q ˜ with estimating equations for the parameters in Δ → m .

The inverse of cov → m [ H I 8 ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] in equation (8) equals the inverse of the conditional covariance matrix of H with each entry replaced by the entry times I 8 .

The estimator with optimal precision requires estimating additional nuisance parameters. In small samples, this may be an issue, but as for many efficient estimators (see e.g. [39–42]), it does not lead to a larger asymptotic variance if all models are correctly specified:

Theorem 7.10

Suppose ψ ˜ 2 is a preliminary estimator of ψ * which is the result of unbiased estimating equations P n G ˜ ( ψ 2 ) = 0 , θ ˆ is an estimator of θ * from a correctly specified pooled logistic regression model with estimating equations P n U ( θ ) = 0 , and q → ψ , ξ opt and E ξ [ H ψ ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] are parameterized by ξ , with q → ψ * ξ * opt the true q → opt and where ξ * is estimated using estimating equations P n J ( ξ , ψ ) = 0 with E J ( ξ * , ψ * ) = 0 . Then, under regularity conditions, solving ψ ˆ from the unbiased estimating equations

(9) P n ( G * ( ψ , θ , q → m , ψ 2 , ξ opt ) G ˜ ( ψ 2 ) J ( ξ , ψ 2 ) U ( θ ) ) = 0

results in the same asymptotic variance for ψ ˆ as using the true (but unknown) q → opt from Theorem 7.8 in the estimating equations P n ( G * ( ψ , θ ˆ , q → opt ) ) = 0 .

Simultaneously solving the estimating equations (9) leads to the same estimator ψ ˆ as plugging ( ψ ˜ 2 , θ ˆ , ξ ˆ ) into the estimating equations for ψ * and then solving for ψ ˆ . Theorem 7.10 implies that if all models are correctly specified, ψ ˆ has optimal precision within the classes studied here.

Instead of estimating Γ k , s m = cov [ H ( k ) , H ( s ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] , one may choose to use a so-called working covariance matrix and replace Γ m by for example the identity matrix. This can be compared with working correlation matrices in generalized estimating equations (GEEs) as in [43]. As for GEEs, the resulting estimator does not have optimal precision, but consistency, asymptotic normality, and double robustness are not affected.

Under regularity conditions, we can use the bootstrap to create CIs for (functions of) ψ for all estimators considered:

Theorem 7.11

(Consistency of the bootstrap). Under regularity conditions, the nonparametric bootstrap for all estimators in Section 7 is consistent under the conditions already adopted for consistency and asymptotic normality. That is, bootstrap CIs have asymptotically correct coverage.

8 Implementing these methods to estimate the effect of ART in early and acute HIV infection: estimation steps

8.1 Implementation: general remarks

Section 8 details the implementation of the estimators proposed in this article in our motivating HIV application. The model for γ here is Example 4.3 model (2), but the methods can be easily adapted to other treatment effect models. Treatment effect models that are linear in ψ are especially attractive, because they lead to estimating equations that are linear in ψ and therefore easy to solve. To limit the effect of model misspecification of γ k m where this is not strictly necessary, k was limited to k = max ( m + 1 , 12 ) , … , min ( m + 12 , 24 ) . SAS 9.1.3 (SAS Institute Inc., Cary, North Carolina, USA) was used for all analyses.

We adopt a pooled logistic regression model for the treatment prediction model p θ ( m ) . For the outcome Y k , one can choose either the CD4 count itself or the CD4 count increase between months m and k . From Definition 4.1 of the treatment effect γ , the same quantity is estimated whether Y k is the CD4 count or the CD4 count increase: subtracting CD 4 m from the outcome CD 4 k affects both Y k ( m ) and Y k ( ∅ ) the same way, and hence the CD 4 m terms in γ cancel. The CD4 count increase between months m and k likely reflects more than CD 4 k the effect of ART between months m and k , i.e., less noise is expected. Thus, we use CD 4 k – CD 4 m as the outcome Y k for all estimators that are not doubly robust. For doubly robust estimators, subtracting CD 4 m from the outcome CD 4 k does not affect the point estimates, since CD 4 m is included in L ¯ m ; thus, subtracting CD 4 m from CD 4 k affects both Y k and E [ H ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] in the same way, and hence, the CD 4 m terms in the estimating equations cancel.

8.2 The preliminary estimator ψ ˜

As the preliminary estimator ψ ˜ , necessary to implement the estimator with optimal precision, we use a doubly robust version of the estimator from [1]. This choice of q → m k is the same as in Theorem 7.9, but with q → m k = 0 for k ≠ m + 12 , and with the cov → m [ H I 3 ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] replaced by working identity matrices. Under a homoscedasticity condition, this choice of q → is optimal within the class of estimating equations with q → m k = 0 for k ≠ m + 12 . In the models for q → , E [ T r ( m , m + 12 ) ∣ L ¯ m , A ¯ m = 0 ¯ ] , etc., since a substantial number of patients did not initiate ART, we first estimate P ( T r ( m , m + 12 ) ≠ 0 ∣ L ¯ m , A ¯ m = 0 ¯ ) using logistic regression, and then, conditional on T r ( m , m + 12 ) ≠ 0 and A ¯ m = 0 ¯ , regress T r ( m , m + 12 ) on the covariates L ¯ m . In the presence of censoring, we restrict these regressions to patients still in follow-up at month m + 12 . Misspecification of q → of the preliminary estimator does not affect asymptotic optimality or double robustness of the optimal estimator that makes use of it. In finite samples it may affect the variance.

With the treatment effect model of Example 4.3 equation (2), H ( m + 12 ) = Y m + 12 − ( ψ 1 + ψ 2 T + ψ 3 T 2 ) T r ( m , m + 12 ) , and to estimate E [ H ( m + 12 ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] , we consider each term in this expression separately, leaving in ψ . For E [ T r ( m , m + 12 ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] , we use the same approach as for E [ T r ( m , m + 12 ) ∣ L ¯ m , A ¯ m = 0 ¯ ] . In the presence of censoring, we use IPCW [44] starting at month m . IPCW requires that censoring is missing at random (MAR, [45]); here, with C p = 0 indicating a patient was uncensored at month p : censoring C p at month p is independent of ( Y ¯ , L ¯ , A ¯ ) given ( Y ¯ p − 1 , L ¯ p − 1 , A ¯ p − 1 , C ¯ p − 1 = 0 ¯ ) . IPCW also requires correct specification of the model for the censoring probabilities. The resulting IPCW weights are

W m , k = ∏ p = m + 1 k P ( C p = 0 ∣ L ¯ p − 1 , A ¯ p − 1 , C ¯ p − 1 = 0 ¯ ) − 1 .

This procedure leads to estimating equations that are linear in ψ and thus easy to solve.

8.3 The optimal estimator

The mimicking outcome H ( k ) is first estimated by H ψ ˜ ( k ) . Theorem 6.3 facilitates model specification for E [ H ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] . Linear regression is used to estimate E [ H ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] . In the presence of censoring, we use IPCW, as for the preliminary estimator. This leads to an estimator E ˆ [ H ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] based on estimating equations. Optimal precision depends on correct specification of E [ H ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] , but if the treatment initiation model p θ is correctly specified, consistency and asymptotic normality do not (because of double robustness). In the model for Δ , E [ T r ( m , k ) ∣ L ¯ m , A ¯ m = 0 ¯ ] , the same approach as described in Section 8.2 is used: first P ( T r ( m , k ) ≠ 0 ∣ L ¯ m , A ¯ m = 0 ¯ ) is estimated using logistic regression; then, conditional on T r ( m , k ) ≠ 0 and A ¯ m = 0 ¯ , T r ( m , k ) is regressed on the covariates L ¯ m . In the presence of censoring, the regression is restricted to patients still in follow-up at month k . In the simulations, cov → m [ H I 3 ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] (Theorem 7.9) does not depend on L ¯ m . In the HIV application, a working model not depending on L ¯ m is used for cov → m [ H I 3 ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] , similar to a working covariance matrix. Γ k , s m is estimated by the average over all patients of

{ H ψ ˜ ( k ) − E ˆ [ H ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ] } { H ψ ˜ ( s ) − E ˆ [ H ( s ) ∣ L ¯ m , A ¯ m − 1 = 0 ] } .

Alternatively, also after plugging in ψ ˜ , one could use techniques similar to generalized estimating equations (GEEs) to estimate a working covariance matrix. In the presence of censoring, C ¯ max ( k , s ) = 0 ¯ is added to the conditioning event. Misspecification of q → (through Δ or cov → m [ H I 3 ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] ) leads to a suboptimal estimator, but (because of Theorem 7.10) does not affect double robustness: any specifications lead to q → only depending on L ¯ m and parameters solving estimating equations.

This procedure leads to estimating equations that are linear in ψ and thus easy to solve.

8.4 For comparison, a naive choice

For comparison, we implement two nondoubly-robust estimators, based on Theorem 6.4 and not using the optimal precision theory developed here. For these Theorem 6.4-based estimators, since in the HIV application, interest lies in the effect of 1 year of ART, q → m k = 0 for k ≠ m + 12 , and the q → m m + 12 are as follows. Let CD 4 m be the CD4 count at month m , injdrug an indicator of whether the patient ever injected drugs at or before the first visit, lvl m the log 10 viral load at month m , and firstvisit m an indicator for whether month m was the month of the first visit. In the simulations, we choose

(10) q → m m + 12 = ( CD4 m m injdrug ) ⊤ ,

(11) and q → m m + 12 = ( CD4 m injdrug CD4 6 ) ⊤ .

In the HIV application we choose

(12) q → m m + 12 = ( CD4 m m lvl m ) ⊤

(13) and q → m m + 12 = ( CD4 m lvl m firstvisit m ) ⊤ .

9 Simulations

We simulate data with monthly visits, and base simulation choices on the AIEDRP Core01 data on HIV-infected patients. We use an autoregressive model to simulate the course of the CD4 count, which may be more realistic in months 6–30 than before month 6, given the different behavior of CD4 counts in the first 6 months after HIV infection (Web-Appendix C). Therefore, we simulate data in months 6–30 and estimate the effect of treatment initiation in months 6–18. Web-Appendix C details the simulations.

We simulate two scenarios: 1: 1,000 datasets with 1,000 observations each, and 2: 500 datasets with 5,000 observations each. We fit model (2) with two parameters, ( ψ 1 , ψ 2 ) , and three parameters, ( ψ 1 , ψ 2 , ψ 3 ) . In these simulations, the true ψ 3 equals 0. Since in the HIV application interest focuses on the effect of 1 year of ART, we restrict the estimating equations to k = m + 1 , … , m + 12 so as to rely less on specification of the treatment effect model γ . Web-Appendix C.2 describes the models fitted for the nuisance parameters. Table 1 shows the root mean squared errors (rMSEs) of the different estimators described in Sections 7 and 8.

Table 1 shows results for the following estimators: 1a and 1b: Naive choices of estimators (Section 8.4), not doubly robust. 2: q approximately optimal-precision within the class with q k m = 0 for k ≠ m + 12 (Section 8.2), not doubly robust. 3: q approximately optimal-precision within the class with q k m = 0 for k ≠ m + 12 (Section 8.2), doubly robust. 4: Like the optimal-precision estimator, but with working identity covariance matrices (Sections 7 and 8.3), doubly robust. 5: Optimal-precision under correct specification of all models (Sections 7 and 8.3), doubly robust.

Table 1 shows great improvements in precision from applying our theory in comparison with naive choices of estimating equations (Table 1 estimators 1a and 1b, described in Section 8.4). Choosing q → without using optimality theory can lead to useless inference, which can be expected to also make a doubly robust version of the naive estimator useless since the initial estimator varies so much. Using some optimality theory leads to considerable improvements (estimator 2). Making this estimator doubly robust (estimator 3) results in improvements of the mean squared errors in the three-parameter model. Using more theory (estimators 4 and 5) results in improvements overall, and our estimator with optimal asymptotic variance (estimator 5) performs best.

Table 1

Simulations: Comparison of rMSE and bias for the various estimators

Model	Two parameters				Three parameters
	ψ 1		ψ 2		ψ 1		ψ 2		ψ 3
	rMSE	(bias)	rMSE	(bias)	rMSE	(bias)	rMSE	(bias)	rMSE	(bias)
1,000 patients in 1,000 datasets
1a. q as in (10), not DR	3.5	(0.07)	0.29	( ‒ 0.007 )	3570	(107)	784	( ‒ 24 )	38	(1)
1b. q as in (11), not DR	1019	( ‒ 28 )	94	(2)	736	( ‒ 14 )	163	(2)	8.0	( ‒ 0.06 )
2. restricted approx. optimal, not DR	2.1	( ‒ 0.05 )	0.17	(0.003)	10.3	(1.4)	1.9	( ‒ 0.3 )	0.084	(0.01)
3. restricted approx. optimal, DR	2.1	( ‒ 0.04 )	0.17	(0.003)	6.6	(0.1)	1.2	( ‒ 0.03 )	0.049	(0.001)
4. working identity covariances, DR	1.6	( ‒ 0.06 )	0.10	(0.003)	3.9	(0.04)	0.53	( ‒ 0.01 )	0.017	(0.0005)
5. approximately optimal, DR	1.3	( ‒ 0.04 )	0.079	(0.002)	3.1	( ‒ 0.05 )	0.41	(0.004)	0.013	( ‒ 0.00005 )
5,000 patients in 500 datasets
1a. q as in (10), not DR	1.6	(0.05)	0.13	( ‒ 0.006 )	285	(8)	58	( ‒ 2 )	2.7	(0.07)
1b. q as in (11), not DR	187	( ‒ 8 )	18	(0.8)	529	(10)	112	( ‒ 2 )	5.4	(0.1)
2. restricted approx. optimal, not DR	0.94	(0.003)	0.073	( ‒ 0.002 )	4.2	(0.2)	0.78	( ‒ 0.04 )	0.034	(0.002)
3. restricted approx. optimal, DR	0.94	(0.005)	0.073	( ‒ 0.002 )	2.9	(0.04)	0.51	( ‒ 0.009 )	0.022	(0.0003)
4. working identity covariances, DR	0.75	( ‒ 0.03 )	0.047	(0.0009)	1.7	(0.06)	0.23	( ‒ 0.01 )	0.0075	(0.0005)
5. approximately optimal, DR	0.59	( ‒ 0.02 )	0.035	(0.0007)	1.4	( ‒ 0.01 )	0.18	( ‒ 0.0004 )	0.0056	(0.00003)

rMSE: root mean squared error. DR: doubly robust.

1a. and 1b. Naive choices of estimators within class (Section 7.4), not doubly robust.

2. q approximately optimal within the class with q k m = 0 for k ≠ m + 12 (Section 7.2), not doubly robust.

3. q approximately optimal within the class with q k m = 0 for k ≠ m + 12 (Section 7.2), doubly robust.

4. Like the optimal estimator, but with working identity covariance matrices (Sections 6 and 7.3), doubly robust.

5. Optimal under correct specification of all models (Sections 6 and 7.3), doubly robust.

For all parameters, the true parameter values lie in between the 2.5 and 97.5% quantiles (over the simulated datasets) of the estimated parameters for estimators 2–5.

For n = 1,000, we also investigate choosing sparser models (Web-Appendix C.2) in the expressions for the prediction of treatment duration (for q → and for the doubly robust term). Choosing sparser models makes little difference for estimators 3–5. For estimator 2, sparser models for q lead to substantially larger mean squared errors (results not shown).

Figure 1 shows the results for the datasets with 1,000 observations. Figure 1 compares the performance of estimators of the effect of ART on the 1-year increase in the CD4 count due to ART initiated at various months since HIV infection. This γ m + 12 m is the estimand of interest in the motivating HIV application. For example, for month 11, Figure 1 shows the rMSE for the expected difference in the CD4 count at month 23 = 12 + 11 between (1) initiating ART at month 11 versus (2) never initiating ART. Figure 1 does not include the naive estimators (1a and 1b in Table 1), since those perform much worse and incorporating them would make a comparison of the other estimators impossible. The same pattern appears as in Table 1, with estimator 5 with the optimal asymptotic variance performing best in these samples of size n = 1,000.

Figure 1

How does the effect of 1 year of treatment depend on its initiation time? Comparison of root mean squared errors. rMSE of estimators 2–5 of the simulation study. For example, for month 11, the quantity in Figure 1 is the root MSE for the estimator of the expected difference in the CD4 count at month 23 = 11 + 12 between starting ART at month 11 versus never starting ART. (a) Two parameter models and (b) three parameter models.

10 The effect of ART in HIV-infected patients in acute and early HIV infection

We estimate how the effect on immune reconstitution of initiating 1 year of ART depends on the time between the estimated date of HIV infection and ART initiation, in the early stages of HIV infection. The effect on immune reconstitution of 1 year of ART initiated m months after infection, is measured as the CD4 count at month m + 12 with ART initiated at month m versus the CD4 count at month m + 12 without ART. Of particular interest is the effect of 1 year of ART initiated at month m , given the past covariate history l ¯ m ; that is, γ m + 12 m ( l ¯ m ) .

We estimate the effect of ART γ using the observational AIEDRP (Acute Infection and Early Disease Research Program) Core01 data, with 1,762 HIV-infected patients diagnosed during acute and early infection [25]. Informed consent has been obtained from all individuals included in the AIEDRP study, and the author had no access to identifiable data.

In this HIV application, K + 1 is 24 months. 0 is the estimated date of HIV infection, although visits may be missed during follow-up and particularly in the earliest months of infection. To account for missed visits, we include the visit pattern (in which months a visit took place) as a measured covariate. Y , A , and L are measured at times that vary across patients, so first we create a monthly dataset. For L , we use the average measurement in each month; if there is no visit in month m , L m is coded as missing, a possible covariate value. A m cannot be missing because we assume that treatment can only start at visits and at visits A m is always recorded. We interpolate missing outcomes Y k , after the first visit and until censoring, except for visits just prior to ART onset, for which we carry the last observation forward, to avoid using post-ART information to impute pre-ART outcomes.

We assume:

Assumption 10.1

(Parameterization of coarse SNMM). For k = ( m + 1 ) ∨ 12 , … , ( m + 12 ) ∧ ( K + 1 ) , suppose that

γ k , ψ m ( l ¯ m ) = ( ψ * 1 + ψ * 2 m + ψ * 3 m 2 ) ( k − m ) 1 { k > m } ,

with ( k − m ) the treatment duration from month m to month k .

Since interest in this HIV application lies in the effect of 1 year of ART, that is, γ m + 12 m ( l ¯ m ) , Assumption 10.1 suffices; specifying γ k m for k < 12 and k > m + 12 might lead to greater precision, but increases the risk of model misspecification. The restriction to these values of k implies that in no unmeasured confounding Assumption 5.1, k can be similarly restricted. For estimation, every sum over k is then also restricted to k = ( m + 1 ) ∨ 12 , … , ( m + 12 ) ∧ ( K + 1 ) .

Because ART is assumed to only change at visit months m , the nuisance models include 1 visit ( m ) , an indicator of whether a visit took place at month m . For loss to follow-up, we assume Missing At Random [45] and apply IPCW [44]. Web-appendix D provides details on the predictors included in the various nuisance parameter models in this HIV application.

Table 2 provides estimates and bootstrap 95 % CIs based on the AIEDRP data, for the same estimators as described in Section 9. Table 2 shows that importantly, all estimators based on the optimality theory of the current article lead to much narrower 95 % CIs than the estimators based on Theorem 6.4 with q → chosen in naive ways as described in Section 8.4. Both naive estimators lead to irrelevant estimators in the three-parameter model, due to extremely wide CIs. The same is true for the second naive estimator in the two-parameter model. Double robustness does not lead to narrower CIs in the AIEDRP data (compare estimators 2 and 3). Estimator 5, which would be optimal without censoring and under homoscedasticity Assumption 7.7, leads to much wider CIs than Estimator 4. For the AIEDRP data, Estimator 4, which is similar to the estimator with optimal precision but with working identity covariance matrices, leads to the narrowest, and quite informative, CIs.

Table 2

The AIEDRP data: Various estimators and their bootstrap 95% CIs

Model	ψ ˆ 1 (95% CI)	(width CI)	ψ ˆ 2 (95% CI)	(width CI)	ψ ˆ 3 (95% CI)	(width CI)
Two-parameter model
1a. q as in (12), not DR	22.4 (18.9,25.8)	(6.9)	0.16 ( ‒ 0.59 , 1.0 )	(1.58)	—
1b. q as in (13), not DR	43 ( ‒ 128 , 209 )	(338)	‒ 9 ( ‒ 89 , 68 )	(157)	—
2. restricted approx. optimal, not DR	22.3 (19.1,25.3)	(6.1)	0.18 ( ‒ 0.53 , 1.00 )	(1.53)	—
3. restricted approx. optimal, DR	24.1 (20.8,27.2)	(6.4)	‒ 0.72 ( ‒ 1.39 , ‒ 0.02 )	(1.37)	—
4. working identity covariances, DR	25.3 (22.2,28.5)	(6.3)	‒ 0.23 ( ‒ 0.71 , 0.28 )	(0.99)	—
5. approximately optimal, DR	24.8 (20.2,29.0)	(8.7)	‒ 0.44 ( ‒ 2.1 , 1.3 )	(3.44)	—
2b. as 2., sensitivity analysis	24.0 (20.6,27.6)	(7.0)	‒ 0.24 ( ‒ 1.09 , 0.62 )	(1.71)	—
3b. as 3, sensitivity analysis	26.0 (22.7,29.3)	(6.6)	‒ 0.88 ( ‒ 1.62 , ‒ 0.17 )	(1.45)	—
4b. as 4, sensitivity analysis	25.7 (22.4,29.0)	(6.5)	‒ 0.33 ( ‒ 0.83 , 0.18 )	(1.01)	—
5b. as 5, sensitivity analysis	25.4 (20.1,30.0)	(9.9)	‒ 0.60 ( ‒ 2.2 , 1.3 )	(3.48)	—
Three-parameter model
1a. q as in (12), not DR	39 ( ‒ 27 , 175 )	(203)	‒ 11 ( ‒ 99 , 33 )	(131)	1.2 ( ‒ 4.1 , 12 )	(16.2)
1b. q as in (13), not DR	38 ( ‒ 126 , 208 )	(334)	‒ 11 ( ‒ 132 , 107 )	(239)	1.6 ( ‒ 15 , 19 )	(34)
2. restricted approx. optimal, not DR	19.5 (14.8, 24.1)	(9.3)	2.0 ( ‒ 0.3 , 4.4 )	(4.7)	‒ 0.20 ( ‒ 0.43 , 0.02 )	(0.45)
3. restricted approx. optimal, DR	23.4 (18.5,28.0)	(9.5)	‒ 0.2 ( ‒ 2.6 , 2.3 )	(4.9)	‒ 0.06 ( ‒ 0.32 , 0.19 )	(0.51)
4. working identity covariances, DR	25.6 (21.7, 29.5)	(7.8)	‒ 0.35 ( ‒ 1.8 , 1.2 )	(3.0)	0.0095 ( ‒ 0.09 , 0.11 )	(0.20)
5. approximately optimal, DR	25.9 (19.0, 31.8)	(12.8)	‒ 0.87 ( ‒ 3.8 , 2.7 )	(6.5)	0.025 ( ‒ 0.21 , 0.23 )	(0.44)
2b. as 2., sensitivity analysis	24.8 (19.7,31.0)	(11.3)	‒ 0.7 ( ‒ 4.0 , 1.9 )	(5.8)	0.06 ( ‒ 0.23 , 0.51 )	(0.74)
3b. as 3., sensitivity analysis	25.9 (21.2,30.6)	(9.5)	‒ 0.8 ( ‒ 2.9 , 1.3 )	(4.3)	‒ 0.004 ( ‒ 0.22 , 0.21 )	(0.43)
4b. as 4, sensitivity analysis	27.3 (23.2,31.4)	(8.2)	‒ 1.1 ( ‒ 2.5 , 0.4 )	(2.9)	0.06 ( ‒ 0.04 , 0.15 )	(0.19)
5b. as 5, sensitivity analysis	30.4 (20.7,33.6)	(12.8)	‒ 3.2 ( ‒ 4.4 , 1.6 )	(5.9)	0.22 ( ‒ 0.13 , 0.30 )	(0.43)

DR: doubly robust.

95% CI: 95% CI based on bootstrap, Efron’s percentile method.

1a and 1b: Naive choices of estimators within class (Section 7.4), not doubly robust.

2: q approximately optimal within the class with q k m = 0 for k ≠ m + 12 (Section 7.2), not doubly robust.

3: q approximately optimal within the class with q k m = 0 for k ≠ m + 12 (Section 7.2), doubly robust.

4: Like the optimal estimator, but with working identity covariance matrices (Sections 6 and 7.3), doubly robust.

5: Optimal under correct specification of all models (Sections 6 and 7.3), doubly robust.

Figure 2 compares the performance of our estimators of the effect of ART on the 1-year CD4 count increase due to ART initiated at different months after the estimated date of HIV infection. For example, for month 11, the quantity in Figure 2 is the estimated expected difference in the CD4 count at month 23 = 11 + 12 , comparing initiating ART at month 11 versus never initiating ART. To facilitate the comparison of the other estimators, Figure 2 does not include the naive estimators of Section 8.4, which perform much worse. As in Table 2, Estimator 4., with a working identity covariance matrix, leads to the best precision.

Table 2 also describes the results of a sensitivity analysis. In this sensitivity analysis, treatment initiation and dropout are modeled using model selection techniques (details in Web-Appendix D). While model selection in principle invalidates the CIs, this sensitivity analysis indicates that the results are somewhat sensitive to model specification, but the clinical implications are the same as those of the main analysis.

Figure 2

How does the effect of 1 year of ART depend on its initiation time? Estimates of the effect of 1 year of ART based on the AIEDRP data. Estimators 2–5 applied to the AIEDRP data (with 95% CIs). For example, for month 11, the quantity in the figure is the estimate the expected difference in the CD4 count at month 23 = 11 + 12 between starting ART at month 11 versus never starting ART. (a and b) Two parameter model and (c) three parameter model.

The estimated effect on the CD4 count of 1 year of ART initiated in acute and early HIV infection is substantial and significant. The effect decreases somewhat with increasing time between HIV infection and ART initiation, but this trend is not significant. The estimated effect of 1 year of ART initiated at the time of HIV infection is 304 (95% CI (266, 341)), and if initiated 12 months after HIV infection, the effect is 270 (95% CI (220,323)). The difference between the effect of 1 year of ART initiated 12 months after versus at infection is − 34 (95% CI ( − 102 , 40)) (based on the two-parameter model). On the scale of the CD4 counts, this includes clinically relevant differences, albeit more indicating an increased rather than a reduced effect of initiating ART earlier. These results thus contradict claims that ART might not be substantially effective in acute and early HIV infection.

The analysis in this HIV example has limitations. The results are obtained by analyzing the observational AIEDRP Core01 data. As in traditional SNMMs, we assume that all confounders are measured. This assumption cannot be tested using the available data. The author collaborated closely with Susan Little and Davey Smith, infectious diseases MDs at UCSD, and Victor DeGruttola, ScD, at Harvard, an expert in the statistics of HIV, to decide which covariates to include in the analyses: the variables that are predictive of both ART initiation and the outcome. Moreover, we assume that the treatment effect model of Assumption 10.1 is correctly specified. Yang and Lok [46] tested Assumption 10.1 and concluded that the AIEDRP data do not provide evidence that Assumption 10.1 is violated; however, this may be due to a limited sample size. In addition, one or more of the nuisance parameter models may be misspecified. That being said, only the treatment initiation model and the outcome regression model affect consistency and asymptotic normality, and double robustness implies that misspecification of only one of these two models preserves consistency and asymptotic normality.

Summarizing this HIV example: with the increased precision of our estimators, we conclude that ART is substantially effective when initiated in early and acute HIV infection. This contradict claims that ART might not be substantially effective in acute and early HIV infection. Current efforts to diagnose HIV early can be expected to be effective also in acute and early HIV infection, if diagnosis is combined with immediate ART initiation as currently recommended.

11 Discussion

Both in the simulation study and in the motivating HIV application, most of the naive coarse SNMM estimators lead to useless inference. This can realistically happen in practice if the nuisance function q → is chosen without knowledge of optimality theory, and finding this in practice motivated our development of estimators with optimal precision. In the HIV application, the naive estimators are so imprecise that it is important to not even use a naive estimator in the first, preliminary, step for an estimator with better precision.

Using optimal estimators substantially improves the precision of coarse SNMMs. In the simulation study, the optimal doubly robust estimator results in useful inference and performs best. In the HIV application, our methods also substantially improve precision. In the HIV application, the best performance is by a doubly robust estimator related to the optimal estimator, but using working identity covariance matrices, similar to the identity working covariance matrices approach in GEEs (e.g., [42] Section 4.6).

The suboptimal behavior of the optimal estimator as implemented in the HIV application may have several causes. It could be due to a combination of limited sample size and censoring. It could also be due to model (mis)specification of the nuisance parameter models. In particular, it is likely that in the HIV application, cov → m [ H I 3 ∣ L ¯ m , A ¯ m − 1 = 0 ] (equation (8)) depends on the covariates L ¯ m . It is to be expected that the conditional (co)variances Γ k , s m = cov [ H ( k ) , H ( s ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] will increase as k , s > m increase, because when k , s > m increase L ¯ m will capture less of the variation in the H ’s conditional on L ¯ m . Similar to model (2), one could consider models for this conditional variance that include terms like α ( min ( k , s ) − m ) CD 4 m ∕ ( 1 − ∣ k − s ∣ ) . Regressing the preliminary estimator H ˆ ( k ) H ˆ ( s ) − E ˆ [ H ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] ⋅ E ˆ [ H ( s ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] on k and s (both categorical), their interactions, and CD4sqrtpred = ( min ( k , s ) − m ) ⋅ CD 4 m ∕ ( 1 − ∣ k − s ∣ ) led to the following fit for m = 11 (SAS PROC GENMOD with a linear link, independence working correlation, and sandwich estimator-based standard errors, output limited to the effect of CD4sqrtpred):

Parameter	Estimate	SE	95% CI	Z	Pr > ∣ Z ∣
CD4sqrtpred	311	80	(154,468)	3.89	0.0001.

These models for different m also suggest that CD 4 m predicts the conditional covariance.

It is also possible that by only assuming that cov → m [ H I 3 ∣ L ¯ m , A ¯ m − 1 = 0 ] does not depend on L ¯ m but not posing additional restrictions, cov → m [ H I 3 ∣ L ¯ m , A ¯ m − 1 = 0 ] is not adequately restricted to be estimated with sufficient precision given the sample size of n = 1762 , which includes censored observations and confounding by indication. Much of the literature on GEEs [47] restricts to very or fairly simple correlation structures at least in their simulations and practical examples. For example, for GEE1, [48], who mention the one-parameter exchangeable working model, the one-parameter first-order autoregressive model, and the one-parameter moving average model, and [47] Examples 3 and 4 consider one-parameter correlation matrices. Prentice and Zhao [49] also suggest simple correlation matrices when interest lies in means (as opposed to covariances). In their data application, Prentice and Zhao [49] did not find much benefit of exploring correlation matrices beyond working independence variance matrices for the model of the means. Wang and Carey [48] and for some settings, Liang and Zeger [47] report benefits from using more realistic one-parameter correlation matrices, although only in simulations. For GEE2, Liang et al. [50] use two-parameter and one-parameter correlation matrices in their eye and chronic obstructive pulmonary disease applications, respectively. Zhao and Prentice [51] found moderate differences in the estimates for the model of the means in their data application; given the estimated standard errors, these differences might be due to random variation. Considering this existing literature on GEEs, it is unclear whether regressing preliminary estimates of H on untreated time-dependent covariates without including considerable restrictions will have enough precision to improve the behavior of the optimal estimator for reasonable sample sizes.

One could hope that simplifying assumptions on the working covariance matrix might improve efficiency. Similar to model (2), one could consider models for this conditional variance like ( min ( k , s ) − m ) ( α 0 + α 1 CD 4 m ) ∕ ( 1 − ∣ k − s ∣ ) . This can be implemented by regressing

H ˆ ( k ) H ˆ ( s ) − E ˆ [ H ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] ⋅ E ˆ [ H ( s ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ]

on ( min ( k , s ) − m ) ∕ ( 1 − ∣ k − s ∣ ) and ( min ( k , s ) − m ) ⋅ CD 4 m ∕ ( 1 − ∣ k − s ∣ ) without an intercept, using the preliminary estimator for H and its regression E ˆ [ H ( k ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] from Section 8.3. The resulting point estimates could be used to populate cov ^ [ H ( k ) , H ( s ) ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] and thus Γ ˆ k , s m . In the HIV application, the first 200 bootstrap samples implementing this estimator, with the aforementioned regressions estimating ( α 0 , α 1 ) for each m separately, suggested a rather large variance: ψ ˆ 1 = 26.1 , 95%-CI ( − 38 , 70 ) , ψ ˆ 2 = − 1.18 , 95%-CI ( − 9.4 , 18.0 ), with CIs based on Efron’s percentile method (the bootstrap standard error-based CIs were even wider). A more flexible model for this covariance will likely be better programmed in R than in SAS, given that R works better with matrices and it would likely involve taking the inverse of patient-specific matrices cov → m [ H I 3 ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] . Modeling Γ k , s m to increase precision constitutes an interesting topic for future research. Misspecifying Γ k , s m affects precision, not double robustness.

Efficient estimation under heteroscedasticity is an interesting topic for future research.

In practice, if there is no reasonable subject-matter-informed way to specify a covariance matrix that depends on covariates, it may be more feasible to work with a working covariance matrix that does not depend on covariates. Given the theory on GEEs, our simulations, and the results in our HIV application, the working identity matrix could be used for the primary analysis, and the optimal estimator assuming that the matrix cov → m [ H I 8 ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] does not depend on the covariates for a sensitivity analysis. The optimal estimator assuming that cov → m [ H I 8 ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] does not depend on the covariates is not much harder to calculate than the estimator with working identity matrices, so it seems worth reporting both estimators with their CIs. With very large sample sizes, it might be feasible to empirically maximize efficiency.

An interesting topic for future research is investigating the properties of the proposed estimators when the treatment effect model is misspecified. Yang and Lok [46] developed goodness-of-fit tests for time-dependent coarse structural nested mean models and applied these to our HIV application by investigating violations of the model of Assumption 10.1; they found no significant deviations from this model. However, nonparametric estimation of the treatment effect γ would also be interesting, as are nonlinear treatment effect models. Another promising topic for future research is developing target minimum loss-based estimation of coarse SNMMs. Moreover, our estimators do not have optimal precision in the presence of censoring. They do remain doubly robust if censoring is missing at random (MAR [45]) and the censoring model is correctly specified. Optimal estimation of coarse SNMMs in the presence of censoring that is MAR constitutes an interesting topic for future research. It might lead to multiple robustness, as in for example [52] for average treatment effects estimated using instrumental variables or [53] for mediation analysis. The HIV application includes censoring, and our methods still lead to remarkable reductions of the variance.

Yang and Lok [54] considered sensitivity analyses to violations to the assumption of no unmeasured confounding for time-dependent coarse structural nested mean models, and applied these to our HIV application.

Because the nuisance parameter models for treatment initiation, censoring, H , and the treatment effect model of interest consider different aspects of the data generating mechanism, they are variation independent, and it does not seem like the coarse SNMM parameterization is likely to be uncongenial as defined in e.g. [55]. The model for the outcome Y used for the preliminary estimator is not variation independent of these models. In addition, there is always the likely risk of model misspecification for any of these models, and especially for the models for the conditional expectations of H and Y . It is more likely that the model for treatment initiation is correctly specified than the model for the conditional expectation of H , with the physician–researchers involved in the specification of the treatment initiation model also involved in the treatment decisions themselves. Because of double robustness, if the model for treatment initiation is correctly specified, the model for the conditional expectation of H (and hence Y ) does not not need to be correctly specified for consistency and asymptotic normality. Improved efficiency can be expected if all models are close to being correctly specified.

In the HIV application, the improved precision of coarse SNMMs has implications for clinical practice. Currently, more HIV-infected patients are diagnosed in acute and early infection, due to increased HIV testing [13,14], including increased partner testing [14]. With increased HIV testing leading to earlier HIV diagnoses, more treatment decisions need to be made in early and acute infection. We estimate that the effect on the CD4 count of 1 year of ART initiated at the time of HIV infection is 304 (95% CI (266, 341)), and if initiated 12 months after HIV infection the effect is 270 (95% CI (220, 323)). Thus, our results indicate that also in acute and early HIV infection, ART initiation at the time of diagnosis substantially benefits patients. This result is also important for partners of HIV-infected patients because ART reduces the risk the risk of HIV transmission [22–24].

For a sufficiently smooth doubly robust estimator, if both models are correctly specified and each of these models is estimated at rate n 1 ∕ 4 or faster, the sandwich estimator of the variance ignoring estimation of the nuisance parameters is consistent, because neither model is needed separately to identify ψ . This has been shown by various authors; Lok [56] provides a basic derivation of this result. This opens the door to fitting more flexible nuisance parameter models.

The main text focuses on coarse SNMMs with a time-varying outcome; Web-appendix B shows how the calculations simplify with an outcome measured at the end of the study.

Currently, none of the theory presented in this article is routinely used when SNMMs are applied. That means, also the estimator with a working identity matrix instead of cov → m [ H I 8 ∣ L ¯ m , A ¯ m − 1 = 0 ¯ ] constitutes a novel contribution. Both in the simulations and in the HIV application, this “estimator 4” leads to substantially increased precision compared to naive choices of the estimating equations.

Thus, we summarize that precision of estimators for coarse SNMMs depends substantially on the estimating equations chosen. The substantial improvement we find by choosing estimating equations leading to optimal precision creates a good starting point for a comparison of precision between MSMs and coarse SNMMs for settings where the treatment effect only depends on baseline covariates (in those settings, MSMs and SNMMs estimate the same quantities; it has been conjectured [6] that SNMMs have better precision). The substantial improvements in precision also suggest that the use of optimal estimators will encourage more widespread use of coarse SNMMs.

Acknowledgements

The author is grateful to the patients who volunteered for AIEDRP, to the AIEDRP study team, and to Susan Little, Davey Smith, and Christy Anderson for their help and advice in interpreting the AIEDRP data. The author thanks Ray Griner for extensive help with the SAS programming, and Shu Yang for programming the estimators in R. The author thanks James Robins and Victor DeGruttola for insightful and fruitful discussions.

Funding information: This work was supported by the Milton Fund, the Career Incubator Fund from the Harvard School of Public Health, and the National Institutes of Health [NIAID R01 AI100762 to Lok, AI074621 and AI43638 (AIEDRP)]. The content is solely the author’s responsibility and does not represent the official views of the NIH.
Author contributions: The author confirms the sole responsibility for the conception of the study, presented results and manuscript preparation.
Conflict of interest: The author states no conflicts of interest.
Data availability statement: The AIEDRP Core01 data and the SAS code are available upon request to the author.

References

[1] Lok JJ, DeGruttola V. Impact of time to start treatment following infection with application to initiating HAART in HIV-positive patients. Biometrics. 2012;68:745–54. 10.1111/j.1541-0420.2011.01738.xSearch in Google Scholar

[2] Robins JM. Correction for non-compliance in equivalence trials. Stat Med. 1998;17:269–302. 10.1002/(SICI)1097-0258(19980215)17:3<269::AID-SIM763>3.0.CO;2-JSearch in Google Scholar

[3] Robins JM, Blevins D, Ritter G, Wulfsohn M. G-estimation of the effect of prophylaxis therapy for pneumocystis carinii pneumonia on the survival of AIDS patients. Epidemiology. 1992;3(4):319–36. 10.1097/00001648-199207000-00007Search in Google Scholar

[4] Robins JM. Correcting for non-compliance in randomized trials using structural nested mean models. Commun Stat. 1994;23:2379–412. 10.1080/03610929408831393Search in Google Scholar

[5] Lok JJ, Gill RD, der Vaart AWV, Robins JM. Estimating the causal effect of a time-varying treatment on time-to-event using structural nested failure time models. Stat Neerlandica. 2004;58(3):271–95. 10.1111/j.1467-9574.2004.00123.xSearch in Google Scholar

[6] Robins JM. Marginal structural models versus structural nested models as tools for causal inference. In: Halloran ME, Berry D, editors. Statistical models in epidemiology: The environment and clinical trials. vol. 116. New York: Springer-Verlag; 2000. p. 95–133. 10.1007/978-1-4612-1284-3_2Search in Google Scholar

[7] Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183(8):758–64. 10.1093/aje/kwv254Search in Google Scholar

[8] Hernán MA, Wang W, Leaf DE. Target trial emulation: a framework for causal inference from observational data. Jama. 2022;328(24):2446–7. 10.1001/jama.2022.21383Search in Google Scholar

[9] Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11(5):550–60. 10.1097/00001648-200009000-00011Search in Google Scholar

[10] Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11(5):561–70. 10.1097/00001648-200009000-00012Search in Google Scholar

[11] Imbens GW. Nonparametric estimation of average treatment effects under exogeneity: a review. Rev Econom Stat. 2004;86(1):4–29. 10.1162/003465304323023651Search in Google Scholar

[12] Van Der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. New York: Springer Verlag; 2003. 10.1007/978-0-387-21700-0Search in Google Scholar

[13] DiNenno EA, Prejean J, Irwin K, Delaney KP, Bowles K, Martin T, et al. Recommendations for HIV screening of gay, bisexual, and other men who have sex with men – United States 2017. MMWR Morb Mortal Wkly Rep. 2017;66:830–2. 10.15585/mmwr.mm6631a3External. Search in Google Scholar

[14] World Health Organization. Consolidated guidelines on HIV, viral hepatitis and STI prevention, diagnosis, treatment and care for key populations. World Health Organization: Geneva; 2022. Licence: CC BY-NC-SA 3.0 IGO. Search in Google Scholar

[15] The INSIGHT START Study Group. Initiation of antiretroviral therapy in early asymptomatic HIV infection. N Engl J Med. 2015;373(9):795–807. 10.1056/NEJMoa1506816Search in Google Scholar

[16] Hammer SM, Saag MS, Schechter M, Montaner JS, Schooley RT, Jacobsen DM, et al. Treatment for adult HIV infection: 2006 recommendations of the International AIDS Society-USA panel. J Am Med Assoc. 2006;296(7):827–43. 10.1001/jama.296.7.827Search in Google Scholar

[17] WHO Treatment Guidelines. Antiretroviral therapy for HIV infection in adults and adolescents. 2006. With addendum, http://www.who.int/hiv/pub/guidelines/artadultguidelines.pdf. Search in Google Scholar

[18] Panel on Antiretroviral Guidelines for Adults and Adolescents. Guidelines for the use of antiretroviral agents in adults and adolescents with HIV. Department of Health and Human Services; 2023. https://clinicalinfo.hiv.gov/en/guidelines/adult-and-adolescent-arv. Accessed 6/10/2023. Search in Google Scholar

[19] UNAIDS. 90-90-90, an ambitious treatment target to help end the AIDS epidemic. 2014. http://www.unaids.org/sites/default/files/media_asset/90-90-90_en_0.pdf. Search in Google Scholar

[20] UNAIDS. 90-90-90: treatment for all. 2020. https://www.unaids.org/en/resources/presscentre/featurestories/2020/september/20200921_90-90-90. Search in Google Scholar

[21] U S Department of State. About us - PEPFAR; 2023. https://www.state.gov/pepfar/. Search in Google Scholar

[22] Cohen MS, Chen YQ, McCauley M, Gamble T, Hosseinipour MC, Kumarasamy N, et al. Prevention of HIV-1 infection with early antiretroviral therapy. N Engl J Med. 2011;365:493–505. 10.1056/NEJMoa1105243Search in Google Scholar

[23] Granich R, Gilks C, Dye C, Cock KD, Williams B. Universal voluntary HIV testing with immediate antiretroviral therapy as a strategy for elimination of HIV transmission: a mathematical model. Lancet. 2009;373(9657):48–57. 10.1016/S0140-6736(08)61697-9Search in Google Scholar

[24] DeGruttola V, Little S, Schooley R. Controlling the HIV epidemic, without a vaccine! AIDS. 2008;22:2554–5. 10.1097/QAD.0b013e32831940d3Search in Google Scholar

[25] Hecht FM, Wang L, Collier A, Little S, Markowitz M, Margolick J, et al. A multicenter observational study of the potential benefits of initiating combination antiretroviral therapy during acute HIV infection. J Infect Disease. 2006;194:725–33. 10.1086/506616Search in Google Scholar

[26] Smith DM, Strain MC, Frost SDW, Pillai SK, Wong JK, Wrin T, et al. Lack of neutralizing antibody response to HIV-1 predisposes to superinfection. Virology. 2006;355:1–5. 10.1016/j.virol.2006.08.009Search in Google Scholar

[27] Rubin DB. Bayesian inference for causal effects: the role of randomization. Ann Stat. 1978;6:34–58. 10.1214/aos/1176344064Search in Google Scholar

[28] Robins JM. Structural nested failure time models. In: Armitage P, Colton T, editors. Survival analysis. vol. 6 of Encyclopedia of Biostatistics. Chichester, UK: John Wiley and Sons; 1998. p. 4372–89. Section Eds: P.K. Andersen and N. Keiding. Search in Google Scholar

[29] Lok JJ. Mimicking counterfactual outcomes to estimate causal effects. Ann Stat. 2017;45(2):461–99. 10.1214/15-AOS1433Search in Google Scholar

[30] Lok JJ. How estimating nuisance parameters can reduce the variance (with consistent variance estimation). Stat Med. 2024;43(23):4456–80.10.1002/sim.10164Search in Google Scholar

[31] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55. 10.1093/biomet/70.1.41Search in Google Scholar

[32] Dawid AP. Conditional independence in statistical theory (with discussion). J R Stat Soc. 1979;B41:1–31. 10.1111/j.2517-6161.1979.tb01052.xSearch in Google Scholar

[33] Lok JJ. Statistical modelling of causal effects in continuous time. Ann Stat. 2008;36(3):1464–507. ArXiv: math.ST/0410271 at http://arXiv.org. 10.1214/009053607000000820Search in Google Scholar

[34] Lok JJ. Structural nested models and standard software: a mathematical foundation through partial likelihood. Scand J Stat. 2007;34(1):186–206. 10.1111/j.1467-9469.2006.00539.xSearch in Google Scholar

[35] Van der Vaart AW. Asymptotic statistics. Cambridge series in statistical and probabilistic mathematics. Cambridge: Cambridge University Press; 1998. Search in Google Scholar

[36] Rotnitzky A, Robins JM. Semiparametric regression estimation in the presence of dependent censoring. Biometrika. 1995;82(4):805–20. 10.1093/biomet/82.4.805Search in Google Scholar

[37] Robins JM. Optimal structural nested models for optimal sequential decisions. In: Lin DY, Heagerty P, editors. Proceedings of the Second Seattle Symposium on Biostatistics. New York: Springer; 2004. 10.1007/978-1-4419-9076-1_11Search in Google Scholar

[38] Newey WK, McFadden D. Large sample estimation and hypothesis testing. In: Engle RF, McFadden DL, editors. Handbook of econometrics. vol. 4. Amsterdam, The Netherlands: Elsevier; 1994. p. 2111–245. Edition 1, chapter 36. 10.1016/S1573-4412(05)80005-4Search in Google Scholar

[39] Newey WK. Efficient estimation of models with conditional moment restrictions. In: Maddala GS, Rao CR, Vinod HD, editors. Handbook of statistics. vol. 11. Amsterdam, The Netherlands: Elsevier Science Publishers B.V.; 1993. p. 419–54. 10.1016/S0169-7161(05)80051-3Search in Google Scholar

[40] Newey WK. Efficient instrumental variables estimation of nonlinear models. Econometrica. 1990;58(4):809–37. 10.2307/2938351Search in Google Scholar

[41] Hahn J. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica. 1998;66(2):315–31. 10.2307/2998560Search in Google Scholar

[42] Tsiatis AA. Semiparametric theory and missing data. New York: Springer; 2006. Search in Google Scholar

[43] Zeger SL, Liang K, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1988;44(4):1049–60. 10.2307/2531734Search in Google Scholar

[44] Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J Amer Stat Assoc. 1995;90(429):106–21. 10.1080/01621459.1995.10476493Search in Google Scholar

[45] Rubin DB. Inference and missing data. Biometrika. 1976;63:581–92. 10.1093/biomet/63.3.581Search in Google Scholar

[46] Yang S, Lok JJ. A goodness-of-fit test for structural nested mean models. Biometrika. 2016;103(3):734–41. 10.1093/biomet/asw031Search in Google Scholar

[47] Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22. 10.1093/biomet/73.1.13Search in Google Scholar

[48] Wang YG, Carey V. Working correlation structure misspecification, estimation and covariate design: implications for generalised estimating equations performance. Biometrika. 2003;90(1):29–41. 10.1093/biomet/90.1.29Search in Google Scholar

[49] Prentice RL, Zhao LP. Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics. 1991;47(3):825–39. 10.2307/2532642Search in Google Scholar

[50] Liang KY, Zeger SL, Qaqish B. Multivariate regression analyses for categorical data. J R Stat Soc Ser B (Methodological). 1992;54(1):3–24. 10.1111/j.2517-6161.1992.tb01862.xSearch in Google Scholar

[51] Zhao LP, Prentice RL. Correlated binary regression using a quadratic exponential model. Biometrika. 1990;77(3):642–8. 10.1093/biomet/77.3.642Search in Google Scholar

[52] Wang L, Tchetgen Tchetgen E. Bounded efficient and multiply robust estimation of average treatment effects using instrumental variables. J R Stat Soc Ser B Stat Methodol. 2018;80(3):531–50. 10.1111/rssb.12262Search in Google Scholar

[53] Tchetgen Tchetgen EJ, Shpitser I. Semiparametric theory for causal mediation analysis: efficiency bounds, multiple robustness, and sensitivity analysis. Ann Stat. 2012;40(3):1816. 10.1214/12-AOS990Search in Google Scholar

[54] Yang S, Lok JJ. Sensitivity analysis for unmeasured confounding in coarse structural nested mean models. Stat Sin. 2018;28(4):1703.10.5705/ss.202016.0133Search in Google Scholar

[55] Meng XL. Multiple-imputation inferences with uncongenial sources of input. Stat Sci. 1994;9(4):538–58. 10.1214/ss/1177010269Search in Google Scholar

[56] Lok JJ. Demystified: double robustness with nuisance parameters estimated at rate n-to-the-1/4. arxiv: http://arxiv.org/abs/2409.02320; 2024. Search in Google Scholar

Received: 2024-02-05

Revised: 2024-09-18

Accepted: 2024-10-03

Published Online: 2025-02-11

This work is licensed under the Creative Commons Attribution 4.0 International License.

Supplementary material

Articles in the same Issue

https://doi.org/10.1515/jci-2023-0078

Keywords for this article

doubly robust estimation; early HIV detection; HIV/AIDS; structural nested models

Creative Commons

BY 4.0

Optimal precision of coarse structural nested mean models to estimate the effect of initiating ART in early and acute HIV infection

Article

Abstract

1 Introduction

2 Motivating data

3 Setting and notation

4 Time-dependent coarse SNMMs

Definition 4.1

Assumption 4.2

Example 4.3

Assumption 4.4

5 No unmeasured confounding and consistency

Assumption 5.1

Assumption 5.2

6 Estimation: unbiased estimating equations

Definition 6.1

Example 6.2

Theorem 6.3

Theorem 6.4

7 Estimating equations leading to optimal precision

7.1 Assumptions and restrictions on the estimating equations

7.2 Doubly robust estimators lead to increased precision

Definition 7.1

Theorem 7.2

Theorem 7.3

Theorem 7.4

Theorem 7.5

7.3 A theorem that guarantees optimal precision

Theorem 7.6

7.4 Explicit expression for estimating equations leading to optimal precision for coarse SNMMs with a time-varying outcome

Assumption 7.7

Theorem 7.8

Theorem 7.9

Theorem 7.10

Theorem 7.11

8 Implementing these methods to estimate the effect of ART in early and acute HIV infection: estimation steps

8.1 Implementation: general remarks

8.2 The preliminary estimator ψ ˜

8.3 The optimal estimator

8.4 For comparison, a naive choice

9 Simulations

10 The effect of ART in HIV-infected patients in acute and early HIV infection

Assumption 10.1

11 Discussion

Acknowledgements

References

Supplementary Material

Articles in the same Issue

Articles in the same Issue

Articles in the same Issue