
Robust variance estimation and inference for causal effect estimation

  • Linh Tran, Maya Petersen, Joshua Schwab and Mark J. van der Laan
Published/Copyright: August 11, 2023

Abstract

We present two novel approaches to variance estimation of semi-parametric efficient point estimators of the treatment-specific mean: (i) a robust approach that directly targets the variance of the influence function (IF) as a counterfactual mean outcome and (ii) a modified non-parametric bootstrap-based approach. The performance of these approaches to variance estimation is compared to variance estimation based on the sample variance of the empirical IF in simulations across different levels of positivity violations and treatment effect sizes. In this article, we focus on estimation of the nuisance parameters using correctly specified parametric models for the treatment mechanism in order to highlight the challenges posed by violation of positivity assumptions (distinct from the challenges posed by non-parametric estimation of the nuisance parameters). Results demonstrate that (1) variance estimation based on the empirical IF may provide highly anti-conservative confidence interval coverage (as reported previously), (2) the proposed robust approach to variance estimation in this setting provides conservative coverage, and (3) the proposed modified bootstrap maintains close to nominal coverage and improves power. In the appendix, we (a) generalize the robust approach of estimating variance to marginal structural working models and (b) provide a proof of the consistency of the targeted minimum loss-based estimation bootstrap.

MSC 2010: 62D20

1 Introduction

A number of estimators are available for the treatment-specific mean outcome parameter (and the corresponding causal contrasts) based on longitudinal data structures, such as inverse probability weighting (IPW) [1,2], double robust augmented IPW (AIPW) [3–8], and targeted minimum loss-based estimation (TMLE) [9]. Variance estimation for each of these estimators is conventionally achieved by using their corresponding influence functions (IFs) [10] based on the empirical distribution or by resampling methods such as the non-parametric bootstrap [11]. However, a number of shortcomings exist with these variance estimation approaches. In particular, the non-parametric bootstrap lacks theoretical support for its validity and may be computationally prohibitive when machine learning methods are used for the estimation of nuisance parameters. Furthermore, both IF-based and bootstrap-based confidence intervals can become anti-conservative in the setting of positivity violations (i.e., when support for a treatment regime of interest is minimal for some levels of covariate history [12]). For example, van der Laan and Gruber [9] found IF-based variance estimates for the intervention-specific mean outcome that were anti-conservative when compared with the Monte Carlo variance of the TMLE, leading to poor confidence interval coverage. Petersen et al. [12] also found poor coverage for IF-based confidence intervals, owing to both practical positivity violations and relatively rare outcomes. Importantly, poor performance of standard variance estimators can occur in finite samples, even when the assumptions for asymptotic validity of these estimators hold [12].
Furthermore, these standard approaches to variance estimation are not sensitive to theoretical violations of the positivity assumption, under which the asymptotic variance of the estimator would be infinite, i.e., when the positivity assumption needed for identification fails due to lack of support in the underlying data-generating process. Improved approaches to variance estimation in the context of positivity violations are thus needed that are able to (1) serve as a “red flag” to alert the analyst if the data at hand provide insufficient information to estimate the desired causal parameter with any reasonable degree of accuracy, and (2) provide closer to nominal confidence interval coverage and type 1 error control in these challenging estimation settings.

Previous studies [12,13] proposed estimating the asymptotic variance of the estimator with a parametric bootstrap based on a fit of the density of the data-generating distribution, involving estimation of individual factors of the likelihood. This proposal corresponds to evaluating the variance of a given estimator using the data at hand as a given data-generating experiment. The consistency of this estimator relies on consistent estimation of the corresponding factors of the likelihood. Because this parametric bootstrap integrates over sparse events, the resulting variance can explode, and an extremely large number of samples is therefore needed to recover the true variance under this Monte Carlo scheme. As a consequence, this parametric bootstrap-based variance estimate was only proposed as a measure to raise a red flag for unreliable statistical inference due to poor data support. In addition, in the context of sparsity, obtaining a valid estimate of the variance requires (i) sampling a large number of bootstrap samples and (ii) refitting the likelihood in each iteration in order to capture the rare observations that, nonetheless, heavily contribute to the variance. Thus, the bootstrap method is extremely computer-intensive, making this Monte Carlo scheme an intractable method for complex estimators, particularly those that rely on machine learning for nuisance parameter estimation.

In this article, we present two approaches to variance estimation in the context of positivity violations to address these challenges. We use analytic expressions to compute the variance of the efficient influence function (EIF) for the statistical target parameter corresponding (under assumptions) to the counterfactual mean outcome under a longitudinal dynamic treatment regime, thereby providing the asymptotic variance of estimators solving the estimating equation corresponding to this function. These analytic expressions naturally integrate over the rare observations, and thereby avoid the finite sample bias that standard influence curve or non-parametric bootstrap-based variance estimators incur due to these rare observations. With this, we construct robust plug-in-type estimators of these asymptotic variances that are consistent if both the treatment mechanism and treatment-specific means of specified outcomes are consistently estimated (as our derived expression requires both). These estimators require estimation of the treatment mechanism and several treatment-specific means of specified outcomes (defined as a function of the observed data structure, indexed by the estimator of the treatment mechanism), which can be estimated with either an estimating equation-type IPW estimator or an efficient substitution-based method such as a TMLE [9]. The resulting variance estimator, unlike current alternatives based on taking the variance of the empirical IF or using a non-parametric bootstrap, will become very large whenever the estimated treatment mechanism reflects practical or theoretical violations of the positivity assumption.

While this newly presented approach performs well in estimating the asymptotic variance of estimators solving the estimating equation corresponding to the EIF, a lower finite sample variance should be expected for substitution-based estimators such as TMLE [9], due to the guaranteed parameter boundaries provided by the estimator. We therefore additionally present a second bootstrap-based approach for estimating the finite sample variance. This bootstrap improves on the estimator based on the empirical variance of the IF by making use of rare observations in the update step of the estimator. Importantly, it does not require re-estimation of the individual factors of the likelihood and therefore reduces the computational burden compared to both the standard parametric and non-parametric bootstrap methods. The resulting reduction in the computational load (compared to a fully non-parametric bootstrap approach, which refits the likelihood for each iteration) allows for a more tractable approach to estimating the variance. Furthermore, the modified bootstrap is asymptotically consistent under reasonable assumptions, namely, the same essential assumptions needed for the estimator of the target parameter itself to be asymptotically linear. In other words, the bootstrap we propose is valid whenever the original TMLE is asymptotically linear. A non-parametric full bootstrap can easily break down due to the machine learning algorithms behaving differently under sampling from $P_n$ than under $P_0$, where $P_n$ is our empirical distribution and $P_0$ is the true distribution. Thus, while the non-parametric bootstrap is not consistent for these efficient estimators using machine learning, our proposed bootstrap is consistent.

1.1 Organization of this article

In Section 2, we formally define the observable data, likelihood, and statistical model for its distribution. Our target parameter of the treatment-specific mean outcome is defined, along with its EIF. We briefly review the causal model and identification assumptions under which this statistical parameter of the observed data distribution corresponds with the desired causal parameter of the counterfactual distribution, along with the currently common approach of IF-based variance estimation for these estimators.

Section 3 presents an approach for robust estimation of the variance of the EIF under sparsity. The expression for the variance of the EIF is presented along with both IPW and TMLE-based approaches for estimating this parameter. To illustrate, an example is given for a point treatment setting under a static treatment regime. Advantages of this approach, implemented in the ltmle R package [12,14] but not previously described in the literature, are covered. Appendix A generalizes the approach to marginal structural working models and provides proofs.

Section 4 discusses a second approach of estimating the estimator variance using the bootstrap under a modified TMLE. Appendix B proves the consistency of this bootstrap estimator.

Section 5 illustrates the performance of the variance estimators presented in Sections 3 and 4 by applying them in simulations to both single time point and longitudinal treatment settings to estimate the variance of an (iterated conditional expectation) TMLE point estimator [9,15] under varying effect sizes and degrees of positivity violation. We focus on the setting in which the treatment mechanism is estimated according to a correctly specified parametric model in order to distinguish challenges to inference due to positivity from the distinct challenges to variance estimation and inference due to potential slow convergence of machine learning-based nuisance parameter estimators. We discuss extensions to the setting of machine learning-based nuisance parameter estimation in the discussion. Results show that, in settings of substantial positivity violation, the standard empirical IF-based approach results in anti-conservative confidence interval coverage. In contrast, the robust approach provides conservative coverage, and thus an effective diagnostic for settings in which standard approaches may result in misleading inference. Finally, the proposed bootstrap approach provides the closest to nominal coverage of the three estimators, and maintains higher power than the robust approach.

We conclude with a discussion in Section 6, which reviews the results, benefits of this new approach, potential limitations, and future directions.

2 The causal roadmap: the statistical estimation problem and causal identification

Consider a longitudinal study in which subjects are seen at each time point $t$ from $t = 0, 1, \ldots, K+1$. The observable data structure on a randomly sampled subject is

$$O = (L(0), A(0), L(1), A(1), \ldots, A(K), Y = L(K+1)) \overset{iid}{\sim} P_0, \tag{1}$$

where $L(0)$ includes all baseline covariates, $A(t)$ denotes an intervention (or treatment) node at time $t$, and $L(t)$ denotes all time-varying covariates at time point $t$, measured between the intervention nodes $A(t^-)$ and $A(t)$, where for notational convenience we define $t^- \equiv t - 1$. Our outcome of interest $Y = L(K+1)$ is an outcome measured after the final intervention $A(K)$. We observe $n$ independent and identically distributed (iid) copies $O_i : i = 1, \ldots, n$, of $O$.

The likelihood $\mathcal{L}(O)$ for the observable data is the product of conditional probabilities, such that the likelihood for subject $i$ is

$$\begin{aligned}
\mathcal{L}(O_i) &= p_0(L_i(0), A_i(0), L_i(1), A_i(1), \ldots, L_i(K+1)) \\
&= p_0(L_i(K+1) \mid \bar L_i(K), \bar A_i(K)) \, p_0(A_i(K) \mid \bar L_i(K), \bar A_i(K-1)) \\
&\quad \times p_0(L_i(K) \mid \bar L_i(K-1), \bar A_i(K-1)) \, p_0(A_i(K-1) \mid \bar L_i(K-1), \bar A_i(K-2)) \cdots p_0(L_i(0)) \\
&= \prod_{t=0}^{K+1} \underbrace{p_0(L_i(t) \mid \bar L_i(t^-), \bar A_i(t^-))}_{q_{0,t}(L(t) \mid Pa(L(t)))} \; \prod_{t=0}^{K} \underbrace{p_0(A_i(t) \mid \bar L_i(t), \bar A_i(t^-))}_{g_{0,t}(A(t) \mid Pa(A(t)))},
\end{aligned} \tag{2}$$

where $\bar X(t) \equiv (X(0), X(1), \ldots, X(t))$, $A(-1) = L(-1) = \emptyset$, and $p_0(o)$ denotes $p_0(O = o)$ under the true distribution $P_0$, where we assume $O$ is discrete for the sake of presentation.

The statistical model for the data involves assumptions, if any, only on the conditional distributions of $A(t)$, given $Pa(A(t)) = (\bar L(t), \bar A(t^-))$, $t = 0, \ldots, K$. Let

$$P_0^d(l) \equiv \prod_{t=0}^{K+1} P_{0,L(t)}(l(t) \mid \bar l(t^-), d(\bar l(t^-)))$$

denote the $G$-computation formula for the post-intervention distribution of an intervention that sets $\bar A(K) = d(\bar l(K))$ [16]. We use the notation $P_{L(t)}$ for a conditional distribution of $L(t)$, given $Pa(L(t)) = (\bar L(t^-), \bar A(t^-))$. Let $L^d = (L(0), \ldots, Y^d = L^d(K+1))$ be a random variable under the post-intervention distribution $P_0^d$. The statistical target estimand is defined here as $\Psi(P_0) = E_{P_0^d}[Y^d]$, i.e., the mean of the outcome at time $K+1$ under this distribution. We note that $\Psi: \mathcal{M} \to \mathbb{R}$ represents a target parameter mapping from the statistical model $\mathcal{M}$ to the real line. Defining $t^+ \equiv t + 1$, the EIF of $\Psi$ at $P$ is given by previous studies [6,15,17]

$$D(P)(O) = \sum_{t=0}^{K+1} D_t(P)(O),$$

where

$$\begin{aligned}
D_0(P)(L(0)) &= \bar Q_1^d - \bar Q_0^d, \\
D_t(P)(\bar A(t^-), \bar L(t)) &= H_t(g)\,(\bar Q_{t^+}^d - \bar Q_t^d) : t = 1, 2, \ldots, K+1,
\end{aligned}$$

where

$$H_t(g) = \frac{I(\bar A(t^-) = d(\bar l(t^-)))}{g_{0:t}(\bar A(t^-), \bar L(t^-))}, \qquad \bar Q_{K+2}^d = Y, \qquad \bar Q_t^d = E_P[Y^d \mid \bar L^d(t^-) = \bar L(t^-)] : t = 1, 2, \ldots, K+1, \qquad \bar Q_0^d = E_P[Y^d]. \tag{3}$$

Here, $g_{0:t}(\bar A(t^-), \bar L(t^-))$ denotes the cumulative probability of treatment up to time $t-1$, i.e.,

$$g_{0:t}(\bar A(t^-), \bar L(t^-)) \equiv \prod_{u=0}^{t-1} p(A(u) \mid \bar L(u), \bar A(u-1)).$$
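As a concrete illustration of this cumulative product (a sketch of our own; the array layout and function name are assumptions, not from the article), the quantity $g_{0:t}$ and the regime-consistency indicator appearing in $H_t(g)$ can be computed with a running product over time:

```python
import numpy as np

def cumulative_g(propensity, followed):
    """Cumulative treatment probabilities and regime-consistency indicators.

    propensity: (n, K+1) array with P(A(u) = d(l-bar(u)) | past) for u = 0..K
    followed:   (n, K+1) boolean array, whether A_i(u) = d(l-bar(u))
    Returns (g, consistent): column t-1 of g is g_{0:t}; column t-1 of
    consistent is the indicator I(A-bar(t-1) = d(l-bar(t-1))) in H_t(g).
    """
    g = np.cumprod(propensity, axis=1)                 # product over u = 0..t-1
    consistent = np.cumprod(followed.astype(float), axis=1)
    return g, consistent
```

Dividing `consistent` by `g` elementwise then yields the clever covariates $H_t(g)$ for each subject and time point.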

Furthermore, $\bar Q_t^d$ is defined by recursive regression, starting at $t = K+1$ and moving backward in time. For notational convenience, we let $H_0 = 1$ so that

$$D(P)(O) = \sum_{t=0}^{K+1} H_t(g)\,(\bar Q_{t^+}^d - \bar Q_t^d).$$

2.1 Causal model

Under additional assumptions about the data-generating process, the target statistical estimand $\Psi(P_0)$ is equal to the mean of the counterfactual outcome $Y^d$ under an intervention that sets the vector of treatment nodes to the value $d(\bar l(K))$ (i.e., the counterfactual outcome under a specified dynamic regime). Specifically, for interventions of interest $d \in \mathcal{D}$, we assume sequential randomization [16]

$$Y^d \perp A(t) \mid \bar L(t), \bar A(t^-) : t = 0, 1, \ldots, K$$

and positivity [18]

$$P(\bar A(t) = d(\bar L(t)) \mid \bar L(t), \bar A(t^-) = d(\bar L(t^-))) > 0 \text{ a.e.} : t = 0, 1, \ldots, K. \tag{4}$$

Regarding the assumption of positivity, we note that as $P(\bar A(t) = d(\bar L(t)) \mid \bar L(t), \bar A(t^-) = d(\bar L(t^-))) \to 0$, we have that $H_t(g) \to \infty$, resulting in $var[D(P)(O)] \to \infty$; following the literature, we refer to violation of the positivity assumption (4) as a theoretical positivity violation, and to near violations resulting in lack of adequate support in finite samples as practical positivity violations.

2.2 Review of IF-based variance

Recall that an estimator $\hat\Psi(P_n)$ is considered to be asymptotically linear if and only if

$$\hat\Psi(P_n) - \Psi(P_0) = \frac{1}{n}\sum_{i=1}^n D(P_0)(O_i) + o_p(n^{-1/2})$$

for some mean-zero, finite-variance IF $D(P_0)(O)$ [10]. If an estimator is asymptotically linear, then it is asymptotically normal with variance equal to the variance of the IF divided by $n$. The asymptotic variance of the estimator can therefore be consistently estimated with the variance of the empirical IF $D(P_n)(O)$, i.e., $\widehat{var}[\hat\Psi(P_n)] = var[D(P_n)(O)]/n$, which implies an asymptotically valid confidence interval (assuming the theoretical positivity assumption (4) holds). However, as we discuss in the following sections, this approach has substantial limitations, both as a diagnostic that the theoretical positivity assumption fails and for finite sample inference in the context of practical positivity violations.
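For concreteness, the standard IF-based Wald interval can be sketched as follows (an illustration of our own; the function name and the hard-coded 95% level are assumptions):

```python
import numpy as np

def if_based_ci(psi_hat, if_values):
    """95% Wald-type confidence interval based on the empirical IF:
    var-hat[Psi-hat(P_n)] = var[D(P_n)(O)] / n.

    psi_hat:   point estimate Psi-hat(P_n)
    if_values: length-n array of D(P_n)(O_i), the estimated IF evaluated
               at each observation (mean approximately zero)
    """
    n = len(if_values)
    se = np.std(if_values, ddof=1) / np.sqrt(n)   # sqrt(var[D(P_n)(O)] / n)
    z = 1.959963985  # Phi^{-1}(0.975), hard-coded to avoid extra dependencies
    return psi_hat - z * se, psi_hat + z * se
```

Under positivity violations, `if_values` fails to reflect rare but influential treatment histories, which is exactly why this interval can be anti-conservative.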

2.2.1 TMLE

One possible estimator that solves the estimating equation corresponding to the EIF for intervention-specific mean outcomes is TMLE [19]. Naive plug-in estimators can be too biased, with a dominating bias term equal to the empirical mean of the EIF, and solving the estimating equation can reduce this bias.

This plug-in estimator works by forming an initial fit of the $q_0$ factors of the likelihood and subsequently perturbing it such that the estimating equation corresponding to the EIF is solved. We assume the use of the iterated conditional outcome TMLE [9,15], but modified to improve robustness to data sparsity by incorporating the inverse probability of treatment as a weight rather than a covariate in the targeted update [20–22]. We restrict ourselves to this estimator for our estimation problem, such that our attention is focused on estimation of the estimator’s variance. Note that our proposed variance estimators also apply to estimating equation approaches, such as the double robust AIPW [3–8].
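The weighted targeted update can be sketched for a single time point as follows (an illustration under our own simplifications: an intercept-only logistic fluctuation fitted by Newton's method, with the inverse probability of treatment supplied as a weight rather than a covariate; this is a sketch of the idea, not the ltmle implementation):

```python
import numpy as np

def targeted_update(q_init, y, weights, tol=1e-10, max_iter=100):
    """One TMLE targeting step (point treatment, weighted version): fit an
    intercept-only logistic fluctuation
        logit Q*(L) = logit Q_init(L) + eps
    by weighted MLE, with weights 1/g_n(a|L) for regime followers and
    zero otherwise. q_init and y assumed scaled to (0, 1).
    """
    offset = np.log(q_init / (1 - q_init))
    eps = 0.0
    for _ in range(max_iter):
        p = 1 / (1 + np.exp(-(offset + eps)))
        score = np.sum(weights * (y - p))        # weighted score for eps
        info = np.sum(weights * p * (1 - p))     # weighted Fisher information
        step = score / info
        eps += step                              # Newton update
        if abs(step) < tol:
            break
    return 1 / (1 + np.exp(-(offset + eps)))     # targeted fit Q*
```

By construction, the returned fit solves the weighted score equation, which is the single-time-point analog of solving the EIF estimating equation.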

3 Semi-targeted estimation of the EIF variance

A common approach to variance estimation, and to corresponding inference based on this point estimator, is thus to take the sample variance of the empirical EIF divided by the sample size, using either the initial or the targeted empirical estimates of the $q_0$ factors of the likelihood.

The basis of the proposed “robust” approach to variance estimation is to express the variance of $D(P_0)(O)$ as an expectation, allowing us to estimate the variance as a mean and to target estimation of the variance directly using TMLE. The following describes how to obtain a TMLE of the variance $\sigma_t^2$ of each component of the EIF in the setting of a scalar parameter. We provide a proof for the more general working MSM setting in Appendix A for the interested reader. Both robust estimators are implemented as options in the ltmle R package.

3.1 Expression for variance of the EIF for $E[Y^d]$

Under regimes $d(\bar l(K))$, we have

$$\sigma_0^2 \equiv E_0[D(P_0)(O)^2] = \sum_{t=0}^{K+1} E_0[H_t^2(g_0)(\bar Q_{0,t^+}^d - \bar Q_{0,t}^d)^2],$$

where $g_0$ represents the true cumulative probability of treatment up to time $t-1$, i.e., $g_{0:t}$ under $P_0$.

Using the expression for $H_t(g)$ from equation (3), and first taking the conditional expectation w.r.t. $\bar A(t^-)$ given $X = (\bar L^d : d)$, it follows that this can be written as:

$$\sigma_0^2 = \sum_{t=0}^{K+1} E_{P_0^d}\left[\frac{(\bar Q_{0,t^+}^d - \bar Q_{0,t}^d)^2(\bar L^d(t))}{g_0(d(\bar l(t^-)), \bar L^d(t^-))}\right], \tag{5}$$

where we define $g_0(d(\bar l(-1)), \bar L^d(-1)) \equiv 1$ so that the term at $t = 0$ equals $E_{L(0)}[\bar Q_{0,1}^d(L(0)) - E_0 Y^d]^2$. This is simply a sum of expectations over $t \in \{0, 1, \ldots, K+1\}$. For notational convenience, we re-write equation (5) as

$$\sigma_0^2 = \sum_{t=0}^{K+1} \sigma_t^{2,d} = \sum_{t=0}^{K+1} E_{P_0^d}[S_t^d(\bar Q_0, g_0)(\bar L^d(t))], \tag{6}$$

for the specified function

$$S_t^d(\bar Q_0, g_0)(\bar L^d(t)) \equiv \frac{(\bar Q_{0,t^+}^d - \bar Q_{0,t}^d)^2(\bar L^d(t))}{g_0(d(\bar l(t^-)), \bar L^d(t^-))} : t = 0, 1, \ldots, K+1.$$

Note that, given $(\bar Q_0, g_0)$, we have that $E_{P_0^d}[S_t^d(\bar Q_0, g_0)]$ is the mean of a counterfactual $S_t^d(\bar Q_0, g_0)(\bar L^d(t))$, i.e., the mean of a real-valued function (indexed by $d(\bar l)$ itself) of $\bar L^d(t)$, which needs to be estimated based on the longitudinal data structure $L(0), A(0), \ldots, A(t-1), L(t)$. Given $\bar Q_0, g_0$, we observe the outcome $S_t^d(\bar Q_0, g_0)(\bar L_i(t))$, $i = 1, 2, \ldots, n$, so that we can represent the observed data structure as $L(0), A(0), \ldots, A(t-1), S_t^d(\bar Q_0, g_0)(\bar L(t))$, and we wish to estimate the statistical target parameter

$$E_{P_0^d}[S_t^d(\bar Q_0, g_0)] = \sum_{\bar l(t)} S_t^d(\bar Q_0, g_0)(\bar l(t))\, P_0^d(\bar L^d(t) = \bar l(t)) : t = 0, 1, \ldots, K+1, \tag{7}$$

where again we assume $l(t)$ is discrete for the sake of presentation.

3.1.1 Estimation of variance of the EIF

With the expression for the variance of the EIF in hand (equation (6)), we can now form estimators that target this parameter. $\bar Q_0$ and $g_0$ are not known in practice, though estimates $\bar Q_n$ and $g_n$ will be readily available if estimating $E[Y^d]$ using a double robust estimator such as TMLE, thus providing us with the observed outcome $S_t^d(\bar Q_n, g_n)(\bar L(t))$. Treating this variable as our new time point-specific outcome, our goal is to estimate its mean over the post-intervention distribution of $\bar L^d(t)$. For notational convenience, let $Z^d(t) \equiv S_t^d(\bar Q_0, g_0)(\bar L(t))$ represent the observable outcome and $(L(0), A(0), \ldots, A(t-1), Z^d(t))$ represent the observed data structure.

One possible approach to estimating each of the components (equation (7)) is to use a simple IPW estimator [1]

$$\hat\sigma_{t,n,\mathrm{IPW}}^{2,d} = \frac{1}{n}\sum_{i=1}^n \frac{I(\bar A_i(t^-) = d(\bar l(t^-)))}{g_{0:t,n}(d(\bar l(t^-)), \bar L_i(t^-))}\, Z_{n,i}^d(t),$$

where $Z_{n,i}^d(t) = S_t^d(\bar Q_n, g_n)(\bar L_i(t))$. However, such an estimator would still be subject to underestimation of the variance, because it ignores the contribution of observations whose observed treatment $\bar A_i$ did not follow the regime, even though their probability of following $d(\bar l)$ is very small. In other words, subjects $i$ with small probabilities of following $d(\bar l)$ would be unlikely to be observed with $\bar A_i = d(\bar l)$, resulting in an indicator value of 0 in the numerator and, consequently, a contribution of 0 to the IPW estimator. Therefore, we stress that it is important to use a plug-in estimator such as TMLE [9] to estimate this parameter. A plug-in estimator will integrate over all $\bar l(t)$ in the support of $P_{t,n}^d$ and thus contribute many large values of $S_{t,n}^d(\bar Q_n, g_n)$ when there are practical or theoretical positivity assumption violations. In addition, the TMLE is a double robust estimator (for the point estimate), so it will yield a consistent estimator of this variance if $g_n$ is consistent for the true $g_0$ or $\bar Q_n$ is consistent for $\bar Q_0$.
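The zero contribution of non-followers is easy to see in a single-time-point sketch (illustrative code of our own with hypothetical names, not from the article):

```python
import numpy as np

def ipw_variance_component(z, a_obs, a_star, g_n):
    """IPW estimate of one variance component E_{P_0^d}[Z^d] for a single
    time point. Subjects with A_i != a_star contribute exactly zero, which
    is how sparsity leads this estimator to underestimate the variance.

    z:      outcomes S(Q_n, g_n)(L_i) evaluated at each subject
    a_obs:  observed treatments; a_star: treatment fixed by the regime
    g_n:    estimated propensities g_n(a_star | L_i)
    """
    follows = (a_obs == a_star).astype(float)   # indicator in the numerator
    return np.mean(follows / g_n * z)
```

A subject with a tiny $g_n$ who happens not to receive `a_star` drops out entirely, even though such subjects drive the true variance; a plug-in (TMLE) estimator instead integrates over these rare combinations.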

Given $\bar Q_0$ and $g_0$, we will now provide a succinct summary of the TMLE of $\sigma_{0,t}^{2,d} = E_{P_0^d}[Z^d(t)]$ that is based on the iterative sequential regression approach. Note that this iterative sequential regression approach is similar to the one presented by van der Laan and Gruber [9] for the intervention-specific mean outcome parameter. Denote the counterfactual of $Z^d(t)$ under treatment $d$ with $Z^{d,d}(t)$. Recall that we observe $S_t^d(\bar Q_0, g_0)(\bar L(t))$, whereas we wish to estimate $S_t^d(\bar Q_0, g_0)(\bar L^d(t))$ (n.b., here we focus only on the case in which the regime indexing the outcome and the intervention regime coincide). Let $P_0^d$ be the $G$-computation formula [16] corresponding to the intervention $\bar A(t^-) = d(\bar l(t^-))$. We wish to estimate $\sigma_{0,t}^{2,d} = E_{P_0^d}[Z^{d,d}(t)]$, which can be represented as a series of iterated conditional expectations

$$\sigma_{0,t}^{2,d} = E[E[\cdots E[E[Z^d(t) \mid \bar L^d(t-1)] \mid \bar L^d(t-2)] \cdots \mid \bar L^d(0)]].$$

The EIF for this target parameter $\sigma_t^{2,d}$ is given by:

$$D_{\sigma_t^{2,d}}(P)(O) = \sum_{m=0}^{t} H_m^{d,t}(g)\,(\bar Q_{m^+}^{d,\sigma_t^2} - \bar Q_m^{d,\sigma_t^2}),$$

where we define

$$\begin{aligned}
\bar Q_{t+1}^{d,\sigma_t^2} &= Z^d(t), \\
\bar Q_m^{d,\sigma_t^2} &= E_P[Z^d(t) \mid \bar L^d(m^-) = \bar L(m^-)] : m = 1, 2, \ldots, t, \\
H_m^{d,t}(g) &= \frac{I(\bar A(m^-) = d(\bar l(m^-)))}{g_{0:m}(d(\bar l(m^-)) \mid \bar L(m^-))} : m = 1, 2, \ldots, t, \\
H_0^{d,t} &= 1.
\end{aligned}$$

Therefore, the EIF for $\sigma^2 = \sum_t \sigma_t^{2,d}$ is simply $D_{\sigma^2} = \sum_t D_{\sigma_t^{2,d}}$.

With the EIF established, the TMLE of σ t 2 , d (in similar fashion to van der Laan and Gruber [9]) is now defined as follows.

  • Estimates $g_{0:m,n} : m = 1, 2, \ldots, t$ are readily available if estimating $E[Y^d]$ using an estimator that solves the estimating equation corresponding to the EIF, such as TMLE.

  • Set $\bar Q_{t,n}^{d,\sigma_t^2} = Z_i^d(t)$. Determine the range $(a, b)$ of $Z_i^d(t)$, $i = 1, \ldots, n$, and target this initial fit using a parametric submodel respecting this range $(a, b)$ by adding the clever covariate $H_t^{d,t}$ (on, say, the logistic scale), using the initial fit $\bar Q_{t,n}^{d,\sigma_t^2}$ as offset. The resulting updated fit is denoted by $\bar Q_{t,n}^{d,\sigma_t^2,*}$.

  • Given $\bar Q_{t,n}^{d,\sigma_t^2,*}$, we proceed recursively for $m = t-1, t-2, \ldots, 1$:

    1. Regress the targeted fit $\bar Q_{m^+,n}^{d,\sigma_t^2,*}$ onto $(\bar A(m^-) = d(\bar l(m^-)), \bar L(m^-))$, using logistic regression to respect the range $(a, b)$. Denote the fit by $\bar Q_{m,n}^{d,\sigma_t^2}$.

    2. Target this initial fit, respecting the range $(a, b)$, with clever covariate $I(\bar A(m^-) = d(\bar l(m^-)))$ and observational weight $1/g_{0:m}(d(\bar l(m^-)), \bar L(m^-))$ (on the logistic scale), and denote this targeted fit of $\bar Q_m^{d,\sigma_t^2}$ by $\bar Q_{m,n}^{d,\sigma_t^2,*}$. Note that we already have $g_{0:m}(d(\bar l(m^-)), \bar L(m^-))$ readily available from Step 1. Furthermore, we have the ability to use different submodels (e.g., a linear submodel that incorporates the propensity scores into the clever covariate, rather than as weights). We defer discussion of this until Section 5.2.1.

  • At $m = 1$, we have the estimate $\bar Q_{1,n}^{d,\sigma_t^2,*}$, which is now a function of only $L(0)$. Finally, we take the average of $\bar Q_{1,n}^{d,\sigma_t^2,*}$ w.r.t. the empirical distribution of $L_i(0)$, i.e., $\bar Q_{0,n}^{d,\sigma_t^2,*} = \frac{1}{n}\sum_{i=1}^n \bar Q_{1,n}^{d,\sigma_t^2,*}(L_i(0))$. The resulting $\hat\sigma_{t,n,\mathrm{TMLE}}^{2,d} = \bar Q_{0,n}^{d,\sigma_t^2,*}$ is the desired TMLE of $\sigma_t^{2,d}$.
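Stripped of the two targeting updates, the backward recursion above reduces to iterated regressions. The following sketch is our own simplification (plain least squares stands in for the range-respecting logistic regressions, and all names are hypothetical); it shows only the structure of the recursion:

```python
import numpy as np

def iterated_regression_mean(L, A, d, z):
    """Backward iterated-conditional-expectation sketch of E_{P_0^d}[Z^d(t)]
    WITHOUT the targeting steps: recursively regress the current fit on the
    covariate history among regime followers, then average over L(0).

    L: list of t arrays, L[m] of shape (n, p_m) holding L(m)
    A: (n, t) observed treatments; d: length-t array of regime values
    z: (n,) outcomes Z^d(t)
    """
    t = A.shape[1]
    fit = z.astype(float).copy()
    for m in range(t, 0, -1):                     # m = t, t-1, ..., 1
        hist = np.column_stack([np.ones(len(z))] + L[:m])  # (1, L(0..m-1))
        follows = np.all(A[:, :m] == d[:m], axis=1)        # A-bar(m-1) = d
        beta, *_ = np.linalg.lstsq(hist[follows], fit[follows], rcond=None)
        fit = hist @ beta                         # Q-bar_m, predicted for all i
    return float(fit.mean())                      # average over empirical L(0)
```

The full algorithm interleaves a targeting update (with clever covariate and weights $1/g_{0:m}$) after each regression, which is what makes the resulting plug-in estimator solve the EIF estimating equation.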

We refer any readers unfamiliar with TMLE to Gruber and van der Laan [23] for further details.

3.1.2 Application to single time point treatment setting

For the sake of illustration, let us consider the method presented earlier for estimation of the variance of the EIF in the case that $O = (L(0), A(0), Y = L(1))$ and the target parameter is $E[Y^a]$ for a static point treatment $a$.

In this case, the variance of the EIF is represented as:

$$\begin{aligned}
\sigma_0^2 = E_0[D(P_0)(O)^2] &= E_0\left[\left(\frac{I(A=a)}{g_0(a \mid L(0))}(Y - \bar Q_0^a(L(0))) + \bar Q_0^a(L(0)) - E[Y^a]\right)^2\right] \\
&= E_0\left[\left(\frac{I(A=a)}{g_0(a \mid L(0))}(Y - \bar Q_0^a(L(0)))\right)^2\right] + E_0[\bar Q_0^a(L(0)) - E[Y^a]]^2 \\
&= E_{P_0^a}\left[\frac{(Y^a - \bar Q_0^a(L(0)))^2}{g_0(a \mid L(0))}\right] + E_0[\bar Q_0^a(L(0)) - E[Y^a]]^2. 
\end{aligned} \tag{8}$$

If using an estimator that solves the estimating equation corresponding to the EIF for the estimation of $E[Y^a]$, such as TMLE, we are provided with estimators $g_n$ and $\bar Q_n$ of $g_0(A \mid L(0))$ and $\bar Q_0^a(L(0)) = E[Y^a \mid L(0)] = E_0[Y \mid A = a, L(0)]$, respectively. The second term in the final expression of equation (8) is easily estimated with the empirical distribution. Given $g_0$ and $\bar Q_0$, the first term can be represented as the mean of a counterfactual $S^a(L(0), Y^a) \equiv (Y^a - \bar Q_0^a(L(0)))^2 / g_0(a \mid L(0))$, which needs to be estimated based on $(L(0), A, S^a(L(0), Y))$, where $S^a(L(0), Y) = (Y - \bar Q_0(a, L(0)))^2 / g_0(a \mid L(0))$ represents the observed outcome. For example, we can use a TMLE estimator $E_n[S^a(L(0), Y^a)]$ of $E_0[S^a(L(0), Y^a)] = E_{L(0),0}[E_0[S^a \mid A = a, L(0)]]$. The TMLE estimate $E_n[S^a \mid A = a, L(0)]$ of $E_0[S^a \mid A = a, L(0)]$ is defined by determining the range $(a, b)$ of $S^a(L_i(0), Y_i)$, obtaining an initial regression fit of $E_0[S^a \mid L(0), A]$ that respects this range, representing it as a logistic regression fit bounded by $(a, b)$, and updating the latter by fitting a univariate logistic regression with clever covariate $I(A = a)$ and observational weight $1/g_0(a \mid L(0))$, using the initial fit as an offset. Regarding the initial fit $E_n[S^a \mid A = a, L(0)]$, recall from the aforementioned that $S^a$ is a known function of the data, which results in the initial fit being exactly $(Y - \bar Q_0(a, L(0)))^2 / g_0(a \mid L(0))$. We can therefore use $(Y - \bar Q_n(a, L(0)))^2 / g_n(a \mid L(0))$, thus not requiring an additional regression. Following the update step, the TMLE of $E_0[S^a(L(0), Y^a)]$ is now given by $\frac{1}{n}\sum_{i=1}^n E_n[S^a \mid L_i(0), A = a] = \frac{1}{n}\sum_{i=1}^n (Y_i - \bar Q_n(a, L_i(0)))^2 / g_n(a \mid L_i(0))$, so that

$$\hat\sigma_n^{2,*} = \frac{1}{n}\sum_{i=1}^n \frac{(Y_i - \bar Q_n(a, L_i(0)))^2}{g_n(a \mid L_i(0))} + \frac{1}{n}\sum_{i=1}^n (\bar Q_n(a, L_i(0)) - \hat\psi_n)^2,$$

where $\hat\psi_n$ is the targeted estimate of $E[Y^a]$.
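A minimal sketch of this final expression (our own, omitting the targeting update of the first term for brevity; all names are hypothetical):

```python
import numpy as np

def robust_variance_point_treatment(y, q_n, g_n, psi_n):
    """Plug-in estimate of sigma^2 for E[Y^a] (single time point), following
    the two-term expression above. Untargeted version: the TMLE update of
    the first term is omitted for brevity.

    y:     observed outcomes Y_i
    q_n:   Q-bar_n(a, L_i(0)), estimated outcome regression evaluated at a
    g_n:   g_n(a | L_i(0)), estimated propensities for the treatment a
    psi_n: targeted point estimate of E[Y^a]
    """
    term1 = np.mean((y - q_n) ** 2 / g_n)        # inverse-weighted residual term
    term2 = np.mean((q_n - psi_n) ** 2)          # variance of Q-bar around psi
    return term1 + term2
```

Note that every subject contributes to `term1`, with small `g_n` inflating the estimate; this is exactly the behavior that makes the robust estimator blow up, as desired, under positivity violations.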

3.2 Advantages of this plug-in estimator of the asymptotic variance of the EIF

Since $\sigma_0^2/n$ equals the asymptotic variance of an asymptotically efficient estimator, it provides a good measure of the amount of information in the data for the target parameter of interest. It is therefore sensible to view $\sigma_0^2/n$ as a measure of sparsity for the target parameter of interest. If $g_n$ is a good estimator of $g_0$, then our proposed plug-in estimator $\hat\sigma_n^2$ is much less subject to underestimation due to sparsity than currently available estimators, such as the sample variance of the estimated IF or the bootstrap-based estimate of the variance of an efficient estimator. This is because the plug-in estimate $\hat\sigma_n^2$ of the variance of the EIF integrates over rare combinations of treatment and covariates that are unlikely to occur in the actual sample.

In particular, if there are theoretical violations of the positivity assumption, then the true variance $\sigma_0^2$ is infinite, and if $g_n$ approximates $g_0$ well, then the estimate $\hat\sigma_n^2$ will also generate very large values, demonstrating the lack of identifiability and thereby raising a red flag for finite sample sparsity bias in the estimators (beyond the large confidence intervals generated by $\hat\sigma_n^2$).

We note that a disadvantage of this estimator is that it can be overly sensitive to positivity violations, becoming too large relative to the real finite sample variance of the TMLE.

4 Variance estimation for substitution-based estimators

The plug-in estimator of the asymptotic variance of the EIF presented earlier is superior to the more common approach of taking the empirical EIF variance over the sample (i.e., $var[D(P_n)(O)]$), in that combinations of treatment and covariates that are unlikely to occur in the actual sample contribute much more strongly. In finite samples, however, substitution-based estimators such as TMLE (which are guaranteed to solve the EIF estimating equation within a bounded range) are often observed to have smaller variance than their asymptotic variance. This is due to the mere fact that they are guaranteed to respect the global constraints of the statistical model and target parameter mapping. That is, as opposed to estimators defined as the solution to estimating equations, which tend to produce estimates outside parameter boundaries as the EIF variance increases, substitution estimators in finite samples will often have a variance smaller than the EIF variance divided by the sample size $n$. Thus, using the newly presented robust EIF variance method can result in overestimation of the estimator variance for these types of estimators. This motivates us to develop an estimator of variance that is less conservative, i.e., more closely aligned with the true variance of substitution-based estimators (such as TMLE), particularly in settings of positivity violations.

One alternative approach to estimating the variance of substitution-based estimators is the non-parametric bootstrap, in which the n observations are sampled with replacement and an estimate of the parameter is formed in each of B iterations. However, as stated earlier, the non-parametric bootstrap is generally invalid here and lacks theoretical support. Additionally, it is a very computer-intensive method that usually requires estimating the full likelihood (i.e., P 0 d ) of the longitudinal data structure within each sampled iteration and is therefore normally infeasible in practice unless conducted within an a priori selected smaller parametric statistical model, such as a logistic regression.
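For reference, the standard non-parametric bootstrap described above amounts to the following (illustrative Python; `estimator` stands in for the full re-fit of the likelihood on each resample):

```python
import numpy as np

def np_bootstrap_var(data, estimator, B=200, seed=0):
    """Standard non-parametric bootstrap: resample rows with replacement and
    re-run the *entire* estimation procedure on each of B resamples."""
    rng = np.random.default_rng(seed)
    n = len(data)
    est = np.array([estimator(data[rng.integers(0, n, n)]) for _ in range(B)])
    return est.var(ddof=1)
```

The cost is B complete re-fits of g and Q-bar; with machine-learning nuisance estimators, this is usually what makes the approach computationally infeasible.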

In this section, we present an alternative bootstrap-based approach that, unlike the standard non-parametric bootstrap, is both computationally feasible and theoretically valid. That is, this bootstrap allows us to estimate the variance of the estimator while avoiding re-estimation of g 0 and Q ¯ 0 . To facilitate this, we propose a modification of the usual TMLE such that the targeting step is separated from the initial estimation of Q ¯ 0 . Recall that the typical TMLE, as implemented, alternates between the targeting step and the initial estimator for the next regression (preventing us from separating the initial fits from the targeting steps). We propose a minor modification of the TMLE that separates these steps: first estimating all of the initial regressions and subsequently targeting the fits in a separate step. This modified TMLE can then be bootstrapped via only the targeting step. We provide a proof of its consistency in Appendix B. Note that, because the modified TMLE has the same asymptotic behavior as the original TMLE, the bootstrap is theoretically supported and continues to yield valid inference. To ensure that this does not result in anti-conservative behavior, we use a bootstrap with a TMLE update that is non-robust (i.e., by defining the clever covariate H t ( g ) with the denominator g n ), thereby producing large values of the clever covariate for observations that, given their covariate history, are highly unlikely to follow the treatment regime of interest, yet do so in the sample.

4.1 Modified TMLE for E [ Y d ]

To reduce the computational burden that bootstrapping requires, we first present the modified TMLE approach for generating a point estimate of the parameter E [ Y d ] . This parameter can be estimated by the following steps:

  1. Estimate g 0 : t ( A ¯ , L ¯ ) for t = 1 , 2 , … , K + 1 and denote the fits g 0 : t , n .

  2. Determine the range ( a , b ) for E [ Y d ] . Recursively for t = K + 1 , K , … , 1 , estimate the conditional expectation Q ¯ t d = E [ Q ¯ t + 1 d ∣ L ¯ ( t ) , A ¯ ( t ) = d ( l ¯ ( t ) ) ] , respecting this range. Denote the fits Q ¯ t , n d . We stress that this step differs crucially from the typical TMLE, in that all of the initial regression fits are formed up front, before any targeting.

  3. For time t = K + 1 , target the initial fit Q ¯ K + 1 , n d using a parametric submodel respecting the range ( a , b ) , adding the covariate I ( A ¯ ( K ) = d ( l ¯ ( K ) ) ) and observational weights 1 / g 0 : K , n (on the logistic scale), using the initial fit as offset and Y as the dependent variable. Denote this updated fit as Q ¯ K + 1 , n d , ∗ .

  4. Given Q ¯ K + 1 , n d , ∗ , we can recursively for t = K , K − 1 , … , 1 target the initial fits Q ¯ t , n d using parametric submodels respecting the range ( a , b ) , adding the covariates I ( A ¯ ( t ) = d ( l ¯ ( t ) ) ) and observational weights 1 / g 0 : t , n (on the logistic scale), using the initial fits as offsets and Q ¯ t + 1 , n d , ∗ as the dependent variable. Denote the updated fits as Q ¯ t , n d , ∗ .

  5. At t = 1 , we have the estimate Q ¯ 1 , n d , ∗ , which is now a function of only L ( 0 ) . Taking the average of Q ¯ 1 , n d , ∗ with respect to the empirical distribution of L i ( 0 ) gives the desired TMLE of E [ Y d ] .
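For the special case of a single time point ( K = 0 ), a binary Y , and the weighted intercept submodel used later in our simulations, steps 3–5 above can be sketched as follows. This is an illustrative Python sketch, not the authors' R implementation; `Q_obs` and `Q_eval` denote the initial fit evaluated at the observed history and under the regime, respectively:

```python
import numpy as np

expit = lambda x: 1.0 / (1.0 + np.exp(-x))
logit = lambda p: np.log(p / (1.0 - p))

def modified_tmle_point(Y, follows, g, Q_obs, Q_eval):
    """Steps 3-5 for one time point: fluctuate the initial fit on the logit
    scale via an intercept submodel with weights I(A = d)/g_n, then average
    the targeted predictions under the regime over the sample."""
    w = follows / g                              # observational weights
    off = logit(np.clip(Q_obs, 1e-9, 1 - 1e-9))  # initial fit as offset
    eps = 0.0
    for _ in range(50):                          # 1-d Newton for the MLE of eps
        q = expit(off + eps)
        step = np.sum(w * (Y - q)) / np.sum(w * q * (1.0 - q))
        eps += step
        if abs(step) < 1e-12:
            break
    # step 5: empirical mean of the targeted fit under the regime
    return expit(logit(np.clip(Q_eval, 1e-9, 1 - 1e-9)) + eps).mean()
```

In the longitudinal case, the same targeting step is applied recursively for t = K + 1 , … , 1 with the targeted fit at t + 1 as the dependent variable; the key point is that none of the initial fits need to be recomputed.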

This estimator also solves the EIF estimating equation and is therefore asymptotically linear and efficient. We note that the analysis of this TMLE is identical to that of the typical TMLE presented by van der Laan and Gruber [9], with the only difference being the initial estimator fits: here, the initial estimators are the original (untargeted) fits, whereas the previous TMLE uses the targeted fits of the outcome regressions as initial estimators in the subsequent steps.

We emphasize that this estimator is proposed for the sake of the bootstrap method for variance estimation. It remains recursive, in that each targeted fit Q ¯ t , n d , ∗ depends on the fit at t + 1 . As opposed to the typical TMLE, however, the targeting in this TMLE is self-contained: each targeting step can be performed independently of the initial estimation steps. This allows the analyst to form all of the initial fits on P n prior to performing any of the targeted updates.

4.2 Bootstrapping the modified TMLE

The new TMLE presented above can be bootstrapped in a fully non-parametric manner, in which observations are drawn with replacement, the full likelihood (i.e., P 0 d ) is re-fitted, and an estimate of the parameter is formed in each iteration, leading to an estimate of the estimator variance. Our recommendation, however, is to bootstrap only the targeting step. More specifically, once the fits g 0 : t , n and Q ¯ t , n d are formed for t = 1 , 2 , … , K + 1 , steps 3–5 are carried out in the bootstrap such that for b = 1 , 2 , … , B , we have

Q n , b ∗ = Q n ( ε b ∗ )

for a user-selected submodel P ( ε ) . The estimator variance is then estimated by taking the variance over the bootstrapped estimates, i.e., var ˆ ( Ψ ( Q n ∗ ) ) = var b [ Ψ ( Q n , b ∗ ) ] .
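A minimal sketch of this bootstrap for a single time point follows (illustrative Python, assuming a binary Y and the non-robust clever-covariate submodel H = I ( A = d ) / g n described above; all nuisance fits are held fixed across replicates):

```python
import numpy as np

expit = lambda x: 1.0 / (1.0 + np.exp(-x))

def fit_eps(off, Y, H):
    """1-d Newton for the MLE of eps in logit Q(eps) = logit Q + eps * H."""
    eps = 0.0
    for _ in range(50):
        q = expit(off + eps * H)
        step = np.sum(H * (Y - q)) / np.sum(H**2 * q * (1.0 - q))
        eps += step
        if abs(step) < 1e-12:
            break
    return eps

def targeted_bootstrap_var(Y, follows, g, Q_obs, Q_eval, B=200, seed=0):
    """Resample rows with replacement, redo only the targeting step on each
    resample, and take the variance of the resulting plug-in estimates."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    off_obs = np.log(Q_obs / (1.0 - Q_obs))
    off_eval = np.log(Q_eval / (1.0 - Q_eval))
    H_obs = follows / g        # clever covariate at the observed history
    H_eval = 1.0 / g           # clever covariate with A set to the regime
    psi = np.empty(B)
    for b in range(B):
        i = rng.integers(0, n, n)
        eps = fit_eps(off_obs[i], Y[i], H_obs[i])
        psi[b] = expit(off_eval[i] + eps * H_eval[i]).mean()
    return psi.var(ddof=1)
```

Because g n and the initial Q-bar fits are computed once, each replicate costs only a one-dimensional fluctuation fit and an average, so B can be large even when the nuisance parameters are fit with machine learning.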

We emphasize that this TMLE is proposed so that we do not need to re-estimate Q ¯ n and g n . If g n → g 0 and Q ¯ n → Q ¯ 0 , then this TMLE is asymptotically linear with IF D ( q 0 , g 0 ) . This is conservative relative to the variance of the actual TMLE, in which g n is fitted on the data, when g n is consistent.

5 Simulations

Simulation studies presented in this section investigate the performance of the two proposed variance estimators: (1) the “robust” approach based on a TMLE of the variance itself (3) and (2) the computationally efficient bootstrap based on the modified TMLE (4). We further compare the performance of these estimators to the common approach to variance estimation based on the sample variance of the empirical EIF (using either the initial Q ¯ t d or targeted Q ¯ t , n d , ∗ estimates of the q 0 factors of the likelihood). We consider two data-generating processes and corresponding target causal parameters: a point treatment setting and a longitudinal observational study setting with three time points (i.e., K + 1 = 3 ) and time-dependent confounding. To evaluate the performance of the variance estimators, we first compare the mean of the variance estimates to the Monte Carlo variance of the point estimator; we also report the Monte Carlo variance of each variance estimator. Additionally, we present the 95% confidence interval coverage and the Type I and Type II errors resulting from each variance estimation approach. All analyses were conducted in R version 3.1.1 [24]. Code corresponding to the simulations has been uploaded to https://github.com/tranlm/tmleVariance.

5.1 Data-generating distribution P 0 and causal parameters

5.1.1 Point treatment setting

Consider a point treatment setting, such as patient enrollment into a care program, in which the treatment A ( 0 ) is assigned at only a single time point. We are interested in determining whether the treatment of interest has an effect on the (binary) outcome on an additive scale. Our target parameter is therefore the difference of the mean outcomes under treatment and control, i.e., ψ 0 , 1 ≡ E [ Y 1 − Y 0 ] . Under this setting, the simulated data were generated (such that we could induce the desired levels of positivity violations) as follows:

W 1 ∼ N ( 0 , 1 ) , bounded at [ − 2 , 2 ] ,
W 2 ∼ Ber ( logit − 1 ( 1 ) ) ,
L 1 ( 0 ) ∼ N ( 0.1 + 0.4 W 1 , 0.5^2 ) ,
L 2 ( 0 ) ∼ N ( 0.55 + 0.5 W 1 + 0.75 W 2 , 0.5^2 ) ,
g ¯ 0 , 0 ( Pa ( A ( 0 ) ) ) = logit − 1 ( β p − ( β p + 2.5 ) W 1 + 1.75 W 2 + ( β p + 3.2 ) L 1 ( 0 ) − 1.8 L 2 ( 0 ) + 0.8 L 1 ( 0 ) L 2 ( 0 ) ) ,
Q ¯ 0 , 1 ( Pa ( Y ) ) = logit − 1 ( 0.5 + 1.2 W 1 − 2.4 W 2 − 1.8 L 1 ( 0 ) − 1.6 L 2 ( 0 ) + L 1 ( 0 ) L 2 ( 0 ) − β ψ 0 A ( 0 ) ) ,

with a positivity-associated parameter β p ranging from − 2 (minor positivity violations) to 0 (strong practical positivity violations). In the estimation of g 0 : t , we impose a truncation level of 0.001. For the treatment effect-associated parameter β ψ 0 , we consider values ranging from 0 (no treatment effect) to 1 (strong treatment effect). Here, L 1 ( 0 ) and L 2 ( 0 ) do not vary over time and are therefore considered baseline covariates along with ( W 1 , W 2 ) ; all affect both the treatment and the outcome.

Under these settings, the true parameter values ψ 0 were obtained by generating 10^8 observations under the counterfactual distribution for each β ψ 0 considered. Simulation results were obtained from 1,000 simulations of size n = 500 . Within each simulation, the bootstrap estimates of variance were formed from B = 1,000 iterations.

5.1.2 Longitudinal treatment setting

For the longitudinal setting, we considered a treatment A ( t ) and outcome L 3 ( t ) , each of which is allowed to vary over time as a counting process (e.g., the treatment variable could be enrollment into a health program, while the outcome variable could indicate failure, such as death, by that time point). That is, if A ( t ) = 1 , then A ̲ ( t ) = 1 , where A ̲ ( t ) = ( A ( t ) , A ( t + 1 ) , … , A ( K ) ) . Similarly, if L 3 ( t ) = 1 , then L 3 ̲ ( t ) = 1 .

We are (again) interested in whether the treatment of interest has an effect on the outcome at the final time point t = 3 on an additive scale. Thus, our target parameter is the difference of the mean outcomes under treatment and control at this final time point, i.e., ψ 0 , 3 = E [ Y 1 − Y 0 ] , where Y = L 3 ( 3 ) . Under this setting, data for the first time point were generated in the same manner as in the point treatment setting in Section 5.1.1. For the remaining two time points, the data were generated (again such that we could induce the desired levels of positivity violations) conditional on survival (i.e., L 3 ( t ) = 0 ) as follows:

L 1 ( t ) ∼ N ( 0.1 + 0.4 W 1 + 0.6 L 1 ( t − 1 ) − 0.7 L 2 ( t − 1 ) + 0.45 β ψ 0 A ( t − 1 ) , 0.5^2 ) ,
L 2 ( t ) ∼ N ( 0.55 + 0.5 W 1 + 0.75 W 2 + 0.1 L 1 ( t ) + 0.3 L 2 ( t − 1 ) + 0.75 β ψ 0 A ( t − 1 ) , 0.5^2 ) ,
g ¯ 0 , t ( Pa ( A ( t ) ) ) = logit − 1 ( β p − ( β p + 2.5 ) W 1 + 1.75 W 2 + ( β p + 3.2 ) L 1 ( t ) − 1.8 L 2 ( t ) + 0.8 L 1 ( t ) L 2 ( t ) ) ,
Q ¯ 0 , t ( Pa ( L 3 ( t ) ) ) = logit − 1 ( 0.5 + 1.2 W 1 − 2.4 W 2 − 1.8 L 1 ( t ) − 1.6 L 2 ( t ) + L 1 ( t ) L 2 ( t ) − β ψ 0 A ( t ) ) .

Similar to the point treatment setting, the treatment effect-associated parameter β ψ 0 ranged from 0 to 1. We note, however, that the positivity issues faced in this scenario are even more severe due to the larger number of combinations of treatment over time, which results in a larger number of truncations. Figure 1 shows the proportion of observations with truncated g 0 : 2 as a function of β p under a null effect, i.e., β ψ 0 = 0 .

Figure 1: Proportion of observations with g 0 : 2 truncated at each β p .

5.2 Estimators

5.2.1 TMLE specification

Any submodel and loss function whose loss-function-specific score

( ∂ / ∂ ε ) L ( P ( ε ) ) | ε = 0

spans the EIF D ( P 0 ) can be chosen in the TMLE, both for the estimation of the mean outcome E [ Y a ] and for the variance of the EIF σ 2 . As the corresponding estimators solve the estimating equation corresponding to the EIF, they will all be asymptotically equivalent and thus asymptotically efficient. That is, no difference will be seen between TMLEs defined using alternative submodels as the sample size grows to infinity. In the TMLE presented by van der Laan and Gruber [9], these submodels are used in the targeting step for each Q ¯ t with a loss L ( Q ¯ t ) that is indexed by Q ¯ t + 1 . Specifically, for the targeting step, we need a loss and a submodel with a clever covariate such that the resulting score solves the desired component D t ( P 0 ) of the EIF.

In finite samples, however, TMLEs defined using different submodels can perform differently. For example, under increasing levels of positivity violations, linear submodels that use H t ( g ) as a covariate can have higher variance, because observations with low probabilities of treatment act as high-leverage, highly influential points in the estimation of the submodel parameter ε .

Recall that the catalyst for this work was the anti-conservative estimates of estimator variance resulting from the use of the empirical EIF variance. We therefore wish to establish a robust estimator of the variance of estimators that solve the EIF, particularly under violations or near violations of positivity. In other words, we desire a variance estimator that asymptotically converges to the true variance of the estimator, while acting on the conservative side in finite samples. Keeping this in mind, we used two submodel and loss function combinations in our simulations, in line with the recommendations above. For point estimation of the target parameter and for the robust estimator of the EIF variance (3), we used submodels that treat H t ( g ) and H m d , t ( g ) as observational weights, such that

logit Q ¯ ( ε ) = logit Q ¯ + ε I ( A ¯ ( t ) = a ¯ ( t ) ) ,

acknowledging our slight abuse of notation. Alternatively, in our bootstrap approach to estimating the TMLE variance, we define a clever covariate using H t ( g ) such that

logit Q ¯ ( ε ) = logit Q ¯ + ε H t ( g ) .

Both submodels use, as loss function, the negative log-likelihood loss.
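The score-spanning condition above can be checked numerically for the clever-covariate submodel with the negative log-likelihood loss: the derivative of the loss at ε = 0 should equal − Σ i H i ( Y i − Q ¯ i ) , i.e., (minus) the residual component of the EIF. A small self-contained check (illustrative Python):

```python
import numpy as np

expit = lambda x: 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n = 1000
Q = rng.uniform(0.05, 0.95, n)          # initial fit Q-bar
H = rng.uniform(0.5, 20.0, n)           # clever covariate I(...)/g
Y = rng.binomial(1, Q).astype(float)

def nll(eps):
    """Negative log-likelihood along the submodel logit Q(eps) = logit Q + eps*H."""
    Qe = expit(np.log(Q / (1 - Q)) + eps * H)
    return -np.sum(Y * np.log(Qe) + (1 - Y) * np.log(1 - Qe))

h = 1e-6
numeric = (nll(h) - nll(-h)) / (2 * h)   # d/d eps of the loss at eps = 0
analytic = -np.sum(H * (Y - Q))          # minus the EIF residual term
```

The weighted intercept submodel passes the analogous check, with score Σ i w i ( Y i − Q ¯ i ) for weights w i = H i .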

5.2.2 Nuisance parameter estimators

To estimate the nuisance parameters Q ¯ 0 and g 0 in the point treatment setting, we fit parametric models with coefficients estimated by maximum likelihood. For example, we fit the treatment mechanism using logistic regression with the following (correct) specification:

logit − 1 ( β 0 − β 1 W 1 + β 2 W 2 + β 3 L 1 ( 0 ) − β 4 L 2 ( 0 ) + β 5 L 1 ( 0 ) L 2 ( 0 ) ) .

For the longitudinal setting, we (mostly) followed the same approach as in the point treatment with two exceptions:

  1. For estimating the treatment mechanism, we pool the data across the time points (after conditioning on not yet having received treatment). This provides greater data support for estimating the probability of treatment across all three time points, resulting in higher precision for the estimates of the coefficients.

  2. We continue to use the same specification (across all time points) for estimating Q ¯ 0 (e.g., for time 2, we use β 0 + β 1 W 1 + β 2 W 2 + β 3 L 1 ( 2 ) + β 4 L 2 ( 2 ) + β 5 L 1 ( 2 ) L 2 ( 2 ) + β 6 A ( 2 ) ). While this is not the true model for Q ¯ 0 , we retain consistency of our estimate of ψ 0 , 3 due to our consistent estimation of g 0 .
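The pooling in item 1 can be sketched as follows (illustrative Python, not the authors' R code): person-time rows for subjects still untreated at each t are stacked into one design matrix, and a single logistic regression is fit by IRLS.

```python
import numpy as np

def fit_logistic_irls(X, y, iters=25):
    """Maximum-likelihood logistic regression via plain IRLS / Newton."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        # Newton step: beta += (X' W X)^{-1} X' (y - p)
        beta += np.linalg.solve((X * W[:, None]).T @ X, X.T @ (y - p))
    return beta

def pool_person_time(covariates_by_t, A_by_t):
    """Stack (covariates, A(t)) rows across time points t, keeping only
    person-time rows where the subject has not yet been treated."""
    rows, labels = [], []
    untreated = np.ones(len(A_by_t[0]), dtype=bool)
    for X_t, A_t in zip(covariates_by_t, A_by_t):
        rows.append(X_t[untreated])
        labels.append(A_t[untreated])
        untreated = untreated & (A_t == 0)   # drop subjects once treated
    return np.vstack(rows), np.concatenate(labels)
```

A single coefficient vector fit to the pooled rows then plays the role of the treatment mechanism at every time point, which is what yields the gain in precision noted above.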

Of note, in both the single time point and longitudinal settings, we use a correctly specified parametric model for the treatment mechanism. For the single time point setting, we use a correctly specified parametric model for the outcome regression, whereas in the longitudinal setting, we use a mis-specified parametric model. This allows us to investigate how our variance estimators perform as we deviate from correct specification of the outcome regressions. This is further motivated by the fact that the double robust estimators that we study remain asymptotically linear in this setting, while the variance estimator becomes conservative (because it targets an IF with higher variance).

5.3 Simulation results

5.3.1 Point treatment results

Figure 2 shows the Monte Carlo variance under no treatment effect ( β ψ 0 = 0 ) for the TMLE point estimator, along with the mean of the variance estimates from each estimation approach. At the lower end of β p , where positivity violations are minor, the observed estimator variance is low. For example, at β p = − 2 , the Monte Carlo variance was 8.068 × 10^−4 . As β p increased, introducing higher levels of positivity violations, the estimator variance increased as expected.

Figure 2: Mean of variance estimates under no treatment effect ( β ψ 0 = 0 ) at each positivity ( β p ) value under the point treatment setting, overlaid with the estimator’s Monte Carlo variance.

Regarding the mean of the variance estimates (and thus their bias), all four approaches were similar at low values of β p . For example, at β p = − 2 , the mean of the estimates was 7.957 × 10^−4 , 7.563 × 10^−4 , and 8.545 × 10^−4 for the empirical EIF ( Q ¯ t d ), robust, and bootstrap-based approaches, respectively, compared to the estimator’s Monte Carlo variance of 7.977 × 10^−4 . As β p increased, the empirical EIF approaches underestimated the variance on average (although to a lesser extent when the targeted, as compared to initial, estimates of the q 0 factors were used). In contrast, the bootstrap and robust EIF approaches resulted in slightly conservative estimates.

Figure 3 shows the Monte Carlo variance of each variance estimator. Lower values in this figure indicate a variance estimator with higher precision. We can see that the empirical EIF approaches to estimating variance have the highest variance at severe levels of positivity violation. Additionally, the use of Q ¯ t d results in noticeably higher variance than Q ¯ t , n d , ∗ . Conversely, the robust approach maintains its low level of variability across all positivity settings. The bootstrap approach tended to result in variance in between, though it converged closer to the empirical EIF approaches at high levels of positivity violations.

Figure 3: Monte Carlo variance of variance estimators under no treatment effect ( β ψ 0 = 0 ) at each positivity ( β p ) value under the point treatment setting.

We evaluated 95 % confidence interval coverage for the TMLE of E [ Y d ] under the four approaches to variance estimation. Figure 4 shows a heat map, overlaid with a single contour line (at 0.95), of the resulting coverage estimates (i.e., the observed proportion of times the true parameter values were captured by the confidence intervals) over the different combinations of β ψ 0 and β p . Additionally, we estimated the power to reject the null hypothesis (at a level of 0.05) corresponding to each variance estimation approach under the range of treatment effect sizes and degrees of positivity violation considered above. Figure 5 shows a heat map, overlaid with a single contour line (at 0.05), of the resulting power estimates. Results at β ψ 0 = 0 can be interpreted as Type I error rates, as they inform us how often the null hypothesis of no treatment effect is incorrectly rejected.

Figure 4: Simulated 95% coverage for each variance estimation approach for the TMLE point estimator under various treatment ( β ψ 0 ) and positivity ( β p ) values under the point treatment setting. Contour line corresponds to 95%. (a) Empirical EIF Q, (b) empirical EIF Q∗, (c) robust EIF, and (d) bootstrap.

Figure 5: Simulated power for each variance estimation approach for the TMLE point estimator under various treatment ( β ψ 0 ) and positivity ( β p ) values under the point treatment setting. Contour line corresponds to 5%. (a) Empirical EIF Q, (b) empirical EIF Q∗, (c) robust EIF, and (d) bootstrap.

The empirical EIF approaches consistently demonstrated slight under-coverage of the true parameter values (even at low levels of positivity violations), which increased in severity with the severity of positivity violations. For example, coverage ranged from 0.945 at β p = − 2 to 0.872 at β p = 0 . In contrast, the robust EIF approach consistently resulted in coverage of around 0.95–0.96 at low values of β p , which increased with β p , consistent with the expected overestimation of the variance by this approach under increasing positivity violations. For example, at β p = − 0.7 , coverage remained at 0.98 at all values of β ψ 0 . At β p ≥ − 0.1 , the observed coverage was almost always greater than or equal to 0.99 at all values of β ψ 0 . As the figure contains only a single contour corresponding to 95%, neither the empirical EIF panels (consistently under 0.95) nor the robust EIF panel (consistently over 0.95) contains a contour line. The bootstrap-based coverage shown in Figure 4(d) varied the least, with coverage consistently between 0.95 and 0.97 irrespective of the treatment effect ( β ψ 0 ) and positivity severity ( β p ) considered.

Regarding the observed power (Figure 5), the empirical-EIF-based variance approach resulted in the highest power among the variance estimation approaches when an effect was present. For example, at β ψ 0 = 1 and β p = − 1 , the observed power was 0.71, 0.51, and 0.51 for the empirical EIF (Q), robust EIF, and bootstrap approaches, respectively. However, this gain came at the cost of higher Type I error, which became uncontrolled as positivity violations increased (i.e., as β p increased). For example, at β p = − 2 , an observed 5.2 % of the simulations incorrectly rejected the null hypothesis. This proportion increased to as high as 12 % at β p = 0 . Alternatively, the robust EIF approach resulted in nominal to low Type I error rates (i.e., between 0 and 5.1%). The bootstrap approach resulted in the best overall performance of the estimators considered, with higher power than the robust EIF approach when an effect was present, while simultaneously retaining appropriate control of the Type I error at all levels of β p when no effect was present. For example, at β p = 0 , only 4.6 % of the simulations incorrectly rejected the null hypothesis.

To further investigate performance, we evaluated (i) the sampling distribution of the point estimator and (ii) the distribution of the variance estimators under the data-generating process with the highest level of positivity violations ( β p = 0 ) and no treatment effect ( β ψ 0 = 0 ). The results (Figure 6(a)) show that the distribution of the point estimates is approximately normal, despite somewhat heavier tails, suggesting that lack of normality of the estimator itself is not the primary driver of the observed loss of coverage. Instead, the results suggest that the under-coverage is due primarily to the highly right-skewed distribution of the empirical EIF variance estimators. The distribution of the empirical EIF variance estimates is noticeably skewed (Figure 6(b)), with most of the probability mass well under the Monte Carlo variance. Conversely, the bootstrap variance estimator is less skewed, with a distribution more closely centered around the observed Monte Carlo variance. The robust EIF variance estimator is also noticeably less skewed, though it tends to overestimate the variance.

Figure 6: Under no treatment effect ( β ψ 0 = 0 ) at the most severe positivity setting ( β p = 0 ). (a) Sampling distribution of TMLE point estimates and (b) sampling distribution of variance estimates with overlaid Monte Carlo variance (black vertical line).

5.3.2 Longitudinal treatment results

Results for the longitudinal setting are similar to those for the point treatment setting. Figure 7 shows the mean of the variance estimates under each approach, overlaid with the Monte Carlo variance of the intervention-specific mean outcome point estimator. The same trend over the different levels of positivity was seen as in Figure 3, with the variance increasing with the magnitude of positivity violations. The empirical EIF approach again performed well when positivity violations were mild, but more noticeably underestimated the variance of the intervention-specific mean outcome point estimator as violations became severe. Consistent with the point treatment setting, the mean of the robust EIF variance estimator overestimated the variance. The bootstrap approach produced variance estimates that were slightly conservative on average, though still very similar to the Monte Carlo variance.

Figure 7: Mean of variance estimates for each estimator under no treatment effect ( β ψ 0 = 0 ) at each positivity ( β p ) value under the longitudinal treatment setting, overlaid with the estimator's Monte Carlo variance.

Figure 8 shows the coverage corresponding to each variance estimation approach for the TMLE point estimator of the intervention-specific mean outcome. Coverage for the empirical EIF approaches was consistently anti-conservative, even at low levels of positivity violations. As in the point treatment setting, this was true despite the apparent low bias of this variance estimator, likely again owing to its heavily skewed distribution. For example, at a null effect (i.e., β ψ 0 = 0 ), the observed coverage for the empirical EIF Q approach was 0.91 at β p = 2 and 0.87 at β p = 0 . For the robust EIF approach, coverage remained conservative, reaching as high as 1.00 (i.e., all simulated confidence intervals captured the true parameter value) at higher levels of positivity issues. For the bootstrap approach, close to nominal coverage was observed, even at higher levels of positivity violations. For example, under a null effect, a coverage of 0.97 was observed at β p = 2 and 0.95 at β p = 0 .

Figure 8: Simulated 95% confidence interval coverage for each variance estimation approach for the TMLE point estimator under various treatment ( β ψ 0 ) and positivity ( β p ) values under the longitudinal treatment setting. Contour line corresponds to 95%. (a) Empirical EIF Q, (b) empirical EIF Q*, (c) robust EIF, and (d) bootstrap.

Results for the Type I error and power were also similar to the point treatment setting. When there was an effect, the empirical EIF approach yielded the highest power; at β ψ 0 = 1 and β p = 2 , we observed a power of 0.99. However, the Type I error was again uncontrolled, reaching 0.14 at β p = 0 . In contrast, the robust EIF approach produced overly conservative Type I error rates, particularly in the context of greater positivity violations, and consequently had the lowest power when an effect was present. For example, for a treatment effect size of β ψ 0 = 1 , we observed a power ranging from 0.94 at β p = 2 to 0.47 at β p = 0 . The bootstrap approach resulted in generally well-controlled Type I error rates, with observed values between 0.02 and 0.08, and its power exceeded that of the robust EIF approach across all values of β ψ 0 and β p . For a treatment effect size of β ψ 0 = 1 , the bootstrap's power ranged from 0.99 at β p = 2 to 0.93 at β p = 0 , up to a twofold increase over the robust EIF approach (Figure 9).

Figure 9: Simulated power for each variance estimation approach for the TMLE estimator under various treatment ( β ψ 0 ) and positivity ( β p ) values under the longitudinal treatment setting. Contour line corresponds to 5%. (a) Empirical EIF Q, (b) empirical EIF Q*, (c) robust EIF, and (d) bootstrap.

6 Discussion

The goal of this study was to establish a consistent and robust approach for estimating the variance of asymptotically efficient estimators such as TMLE, estimating-equation, and one-step estimators that, in contrast to the common approach based on the empirical variance of the estimated EIF, does not provide anti-conservative inference when confronted with positivity violations. We have presented two such approaches: (1) a robust approach that directly targets the asymptotic variance of the EIF and (2) a bootstrap approach based on fitting the initial density of the data once, followed by a non-parametric bootstrap of the targeting step alone. In simulations, the variance of the point estimator increased with the degree of positivity violation, as expected. A common trend was observed: the empirical EIF approach to variance estimation provided anti-conservative confidence interval coverage (as previously reported), whereas the robust EIF approach described here became increasingly conservative as the degree of positivity violation increased. The bootstrap-based approach provided closer to nominal 95% confidence interval coverage and Type I error control (with attendant power gains relative to the robust estimator) in the face of extreme positivity violations, in both the point treatment and the longitudinal settings.
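To make the modified bootstrap concrete, the following sketch implements the idea for the simplest case of a single binary treatment and binary outcome: the initial outcome regression and propensity fits are computed once and held fixed, and only the one-dimensional TMLE fluctuation is re-estimated on each resample. This is an illustrative sketch under simplifying assumptions, not the authors' implementation; all names (`tmle_targeting_bootstrap`, `Qbar1`, `g`) and the Newton solver are our own.

```python
import numpy as np

def tmle_targeting_bootstrap(Y, A, Qbar1, g, n_boot=500, rng=None):
    """Modified bootstrap sketch (point treatment, binary Y).

    Qbar1[i]: initial estimate of E[Y | A=1, W_i]; g[i]: initial estimate of
    P(A=1 | W_i). Both are fit ONCE on the original sample and held fixed;
    only the targeting (fluctuation) step is re-run on each resample.
    """
    expit = lambda x: 1.0 / (1.0 + np.exp(-x))
    logit = lambda p: np.log(p / (1.0 - p))
    rng = np.random.default_rng(rng)
    n = len(Y)

    def tmle_psi(idx):
        y, q1 = Y[idx], Qbar1[idx]
        h = A[idx] / g[idx]              # clever covariate; 0 for untreated
        off, eps = logit(q1), 0.0
        for _ in range(50):              # Newton steps for the 1-dim logistic MLE
            p = expit(off + eps * h)
            score = np.sum(h * (y - p))
            info = np.sum(h ** 2 * p * (1.0 - p))
            if info <= 0.0:
                break
            step = score / info
            eps += step
            if abs(step) < 1e-10:
                break
        # targeted update applied to all subjects, with H(1, W) = 1/g(W)
        return expit(off + eps / g[idx]).mean()

    psis = np.array([tmle_psi(rng.integers(0, n, n)) for _ in range(n_boot)])
    return psis.var(ddof=1)
```

In practice, `Qbar1` and `g` would be the (possibly machine-learning-based) initial fits from the original sample; because they are never refit, each bootstrap iteration costs only a one-dimensional logistic fit and a plug-in mean.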

While the variance of the EIF can raise a red flag for lack of identifiability, for substitution estimators such as TMLE, we suggest that it is overly conservative for constructing confidence intervals and tests in finite samples in the face of substantial positivity violations. Previous work [12] suggested the parametric bootstrap as a diagnostic tool for sparsity-bias in the point treatment setting. That approach can become cumbersome, as the analyst must refit the whole likelihood at each bootstrap iteration, and computationally intensive, particularly in longitudinal treatment settings. Our robust EIF approach avoids estimating the whole likelihood by directly targeting the required means under the post-intervention distribution defined by the longitudinal g-computation formula. Even if an actual TMLE of P 0 d is used, our analytic estimate of the variance remains far less computer-intensive than the parametric bootstrap method, particularly since many replicate samples of the data set would be needed in order to pick up the observations that correspond to a rare treatment and thus contribute large values to the variance expression. Our presented bootstrap approach, while more computationally intensive than the robust EIF approach, is also superior to the earlier proposed approach, in that the entire likelihood need not be refit at each iteration. This substantially reduces the computational resources required to obtain estimates of the target parameter, particularly if computationally intensive non-parametric machine learning algorithms are used to estimate these densities. We therefore believe that the proposed analytic method will be (at least practically) superior to the earlier proposed parametric bootstrap method.

In order to highlight challenges to variance estimation arising from positivity violations, and the extent to which these are addressed by the proposed alternative variance estimators, this article has focused on the setting in which the treatment mechanism is estimated using a correctly specified parametric model, while the outcome regressions are estimated using either a correctly or a mis-specified parametric model. In practice, outside of randomized trial settings, non-parametric machine learning approaches are typically required to ensure asymptotic linearity of the estimator (see, e.g., Klaassen [25], van der Laan and Robins [26]). In this setting, valid inference also requires an adequate convergence rate of the second-order remainder. In particular, the relatively slow convergence of machine learning estimators poses a serious challenge to finite sample inference due to the non-negligible contribution of the second-order remainder. One possible response to this challenge is to incorporate data-adaptive nuisance parameter estimation into our proposed non-parametric bootstrap, an approach shown to be promising in prior work [27], although not yet evaluated in the context of positivity violations. Future research is needed to evaluate the performance of this approach under the joint challenges of positivity violations and slower machine-learning-based convergence rates.

Further refinements can be applied in an attempt to obtain variance estimates with an even smaller finite sample bias. One such approach is a convex combination of the variance estimators considered earlier. For example, we noted in supplementary analyses that taking

$$
\hat{\alpha}_n \hat{\sigma}^2_{\mathrm{eEIF},n} + (1 - \hat{\alpha}_n) \hat{\sigma}^2_{\mathrm{rEIF},n}
$$

had good performance, where $\hat{\sigma}^2_{\mathrm{eEIF},n}$ is the variance estimate using the empirical EIF approach, $\hat{\sigma}^2_{\mathrm{rEIF},n}$ is the variance estimate using the robust EIF approach, and $\hat{\alpha}_n = (\hat{\sigma}^2_{\mathrm{rEIF},n} - \hat{\sigma}^2_{\mathrm{eEIF},n})/(\hat{\sigma}^2_{\mathrm{rEIF},n} + \hat{\sigma}^2_{\mathrm{eEIF},n})$. We note, however, that such an approach is somewhat ad hoc and may yield varying results under other simulations or distributions. We therefore chose not to present those results here.
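A minimal sketch of this convex combination, assuming the data-adaptive weight takes the form $\hat{\alpha}_n = (\hat{\sigma}^2_{\mathrm{rEIF},n} - \hat{\sigma}^2_{\mathrm{eEIF},n})/(\hat{\sigma}^2_{\mathrm{rEIF},n} + \hat{\sigma}^2_{\mathrm{eEIF},n})$ (the function name is our own):

```python
def combined_variance(var_eeif: float, var_reif: float) -> float:
    """Convex combination of the empirical-EIF and robust-EIF variance
    estimates. The weight below is an assumption for illustration: when the
    robust estimate greatly exceeds the empirical one, alpha approaches 1
    and more weight shifts to the empirical-EIF estimate."""
    alpha = (var_reif - var_eeif) / (var_reif + var_eeif)
    return alpha * var_eeif + (1.0 - alpha) * var_reif
```

For instance, with an empirical-EIF estimate of 1.0 and a robust-EIF estimate of 3.0, the weight is 0.5 and the combined estimate is 2.0, splitting the difference between the anti-conservative and conservative estimators.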

A potential limitation of the robust approach to estimating the variance involves the conditions required for asymptotic linearity. Note, however, that we always require our estimator of the parameter to be asymptotically linear; this is therefore a limitation of the parameter estimator rather than of the variance estimator. Given an asymptotically linear estimator, we want a good estimator of its variance. If both the outcome and treatment models are correctly specified, then the asymptotic variance equals the variance of the EIF, and we can apply our robust method of estimating variance knowing that we are targeting the correct variance. However, the variance of the EIF is not consistent for the variance of the effect estimator if either the outcome or the treatment model is mis-specified.

Furthermore, it is also required that Q ¯ 0 d , σ t 2 be estimated both consistently and at a fast enough rate. We limited the computational complexity in our simulations by using simpler parametric models to estimate Q ¯ 0 d , σ t 2 , though a more non-parametric approach such as Super Learning could have been applied. This approach can become computationally expensive if there are many time points. In this regard, the bootstrap approach is superior as it does not require the additional estimation of Q ¯ 0 d , σ t 2 .

It would be of interest to further evaluate not only the practical performance of these variance estimation approaches in future studies, but also their application to other parameters. Appendix A derives the general approach for working marginal structural models; further research into its practical performance in this setting is needed. These variance estimation approaches can also be used for more complex parameters, such as mean outcomes under dynamic regimes, stochastic interventions, or marginal structural working models. Future research could also develop a collaborative TMLE [28]- or cross-validation [29]-based approach to robustly estimating the EIF variance.

  1. Funding information: This work was supported by the Doris Duke Clinical Scientist Development Award (NIH-NIAID U01AI069911), The National Institute of Allergy and Infectious Diseases of the National Institutes of Health (U01AI069911), The President’s Emergency Plan for AIDS Relief (PEPFAR) (AID-623-A-12-0001), and NIH (R01AI074345).

  2. Conflict of interest: Prof. Maya Petersen and Prof. Mark J. van der Laan are Editors of the Journal of Causal Inference but were not involved in the review process of this article.

Appendix A Marginal structural working model variance

A.1 TMLE of σ K + 1 2 for marginal structural working models

For the general working logistic marginal structural model (MSM) $\Theta = \{m_\beta : \beta\}$ from Petersen et al. [12], we have that the component of the EIF corresponding to the last time point $K+1$ equals

$$
D_{K+1}(P) = \sum_{d \in \mathcal{D}} h_1(d, K+1) \frac{I(\bar{A}(K) = d(\bar{L}(K)))}{g_{0:K}(\bar{A}(K), \bar{L}(K))} \big( Y - \bar{Q}_{K+1}(\bar{A}(K), \bar{L}(K)) \big) = C_{K+1}(P)(\bar{A}, \bar{L}) (Y - \bar{Q}_{K+1}),
$$

where, for some user-defined weight function h ( d , K + 1 ) ,

$$
C_{K+1}(P)(\bar{A}, \bar{L}) = \sum_{d \in \mathcal{D}} h_1(d, K+1) \frac{I(\bar{A}(K) = d(\bar{L}(K)))}{g_{0:K}(\bar{A}, \bar{L})}, \qquad h_1(d, K+1) = h(d, K+1) \frac{\frac{d}{d\beta} m_\beta(d, K+1)}{m_\beta (1 - m_\beta)}.
$$

Note that we want to obtain a representation of the variance of this component D K + 1 so that we can use a semi-substitution estimator of this part of the variance of the EIF, hopefully, thereby obtaining a variance estimator that is more accurate under violations of practical positivity, and a variance estimator that can be used as a red flag for lack of identifiability. This variance can thus be written as:

$$
\begin{aligned}
\sigma_{K+1}^2 &= E[C^2 (Y - \bar{Q}_{K+1})^2] = E[C^2 \bar{Q}_{K+1}(1 - \bar{Q}_{K+1})] \\
&= E\left[ \Big( \sum_{d \in \mathcal{D}} h_1(d, K+1) I(\bar{A} = d(\bar{L})) \Big)^2 \frac{\bar{Q}_{K+1}(1 - \bar{Q}_{K+1})}{g_{0:K}^2}(O) \right] \\
&= E\left[ \sum_{d_1, d_2 \in \mathcal{D}} h_1(d_1, K+1) h_1(d_2, K+1) I(\bar{A} = d_1(\bar{L})) I(\bar{A} = d_2(\bar{L})) \frac{\bar{Q}(1 - \bar{Q})}{g_{0:K}^2}(O) \right] \\
&= \sum_{d_1, d_2} h_1(d_1, K+1) h_1(d_2, K+1) E\left[ I(\bar{A} = d_1(\bar{L})) I(\bar{A} = d_2(\bar{L})) \frac{\bar{Q}(1 - \bar{Q})}{g_{0:K}^2}(O) \right].
\end{aligned}
$$

The latter expectation equals

$$
\sum_{\bar{L}} I(d_1(\bar{L}) = d_2(\bar{L})) \prod_{t=0}^{K+1} q(L_t \mid \bar{A}(t) = d_1(\bar{L}(t)), \bar{L}(t)) \frac{\bar{Q}(1 - \bar{Q})}{g_{0:K}(d_1(\bar{L}), \bar{L})} = E_{P_0^{d_1}}\left[ \frac{I(d_1(\bar{L}^{d_1}) = d_2(\bar{L}^{d_1})) \, \bar{Q}(1 - \bar{Q})}{g_{0:K}(d_1(\bar{L}^{d_1}), \bar{L}^{d_1})} \right].
$$

This yields the following expression:

$$
\begin{aligned}
\sigma_{K+1}^2 &= \sum_{d_1, d_2 \in \mathcal{D}} h_1(d_1, K+1) h_1(d_2, K+1) E\left[ \frac{I(d_1(\bar{L}^{d_1}) = d_2(\bar{L}^{d_1})) \, \bar{Q}(1 - \bar{Q})}{g_{0:K}(d_1(\bar{L}^{d_1}), \bar{L}^{d_1})} \right] \\
&= \sum_{d_1 \in \mathcal{D}} h_1(d_1, K+1) E\left[ \sum_{d_2 \in \mathcal{D}} h_1(d_2, K+1) \frac{I(d_1(\bar{L}^{d_1}) = d_2(\bar{L}^{d_1})) \, \bar{Q}(1 - \bar{Q})}{g_{0:K}(d_1(\bar{L}^{d_1}), \bar{L}^{d_1})} \right] \\
&= \sum_{d_1 \in \mathcal{D}} h_1(d_1, K+1) \, E Z^{d_1}(d_1, K+1),
\end{aligned}
$$

where

$$
Z(d_1, K+1) = \sum_{d_2 \in \mathcal{D}} h_1(d_2, K+1) \frac{I(d_1(\bar{L}(K)) = d_2(\bar{L}(K))) \, \bar{Q}(1 - \bar{Q})}{g_{0:K}(d_1(\bar{L}(K)), \bar{L}(K))},
$$

so that the counterfactual of Z ( d 1 , K + 1 ) under intervention d 1 is given by:

$$
Z^{d_1}(d_1, K+1) = \sum_{d_2 \in \mathcal{D}} h_1(d_2, K+1) \frac{I(d_1(\bar{L}^{d_1}(K)) = d_2(\bar{L}^{d_1}(K))) \, \bar{Q}(1 - \bar{Q})}{g_{0:K}(d_1(\bar{L}^{d_1}(K)), \bar{L}^{d_1}(K))}.
$$

A.1.1 Static regimens

In the special case that the class of dynamic regimens $\mathcal{D}$ consists only of static regimens $\bar{a}(K)$, so that there is one and exactly one regimen $d$ with $\bar{A}(K) = d(\bar{L}(K))$, we have

$$
Z(K+1) = h_1(\bar{A}, K+1) \frac{\bar{Q}(1 - \bar{Q})}{g_{0:K}(\bar{A}, \bar{L})},
$$

so that

$$
Z^d(K+1) = h_1(d, K+1) \frac{\bar{Q}(1 - \bar{Q})}{g_{0:K}(d(\bar{L}^d), \bar{L}^d)}.
$$

In that case, we have

$$
\sigma_{K+1}^2 = \sum_{d \in \mathcal{D}} h_1(d, K+1)^2 \, E Z_1^d(K+1),
$$

where $Z_1(K+1) = \bar{Q}(1 - \bar{Q})/g_{0:K}(\bar{A}, \bar{L})$ and $Z_1^d(K+1) = \bar{Q}(1 - \bar{Q})/g_{0:K}(d(\bar{L}^d), \bar{L}^d)$.

It is important to note that in expressing our variance this way, we integrate out the indicator of treatment over A ¯ , i.e., I ( A ¯ = d ( L ¯ ) ) . By getting rid of this indicator, we no longer rely as heavily on observations from subjects following treatment in estimating the variance of D K + 1 . This particularly helps us when there is a lack of positivity, since subjects with low probabilities of desired treatment simply are not observed.
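For intuition, in the simplest case of a single binary point treatment (one static regimen, $K = 0$, taking $h_1 \equiv 1$), the representation above reduces to a plug-in mean of $Z(W) = \bar{Q}(1 - \bar{Q})/g$ over the empirical covariate distribution. The following is an illustrative sketch under those simplifying assumptions; function and argument names are our own.

```python
import numpy as np

def robust_if_variance_component(Qbar_hat, g_hat, W):
    """Sketch of the robust variance idea for a single binary treatment.

    Qbar_hat(W): estimated E[Y | A = 1, W]; g_hat(W): estimated P(A = 1 | W).
    Instead of averaging the squared IF over observed (A, W) -- which depends
    on the indicator I(A = 1)/g(W) and hence on the few treated subjects in
    sparse strata -- the treatment indicator is integrated out and Z(W) is
    averaged over ALL subjects' covariates (an empirical g-computation).
    """
    Q = Qbar_hat(W)
    g = g_hat(W)
    Z = Q * (1.0 - Q) / g   # variance-of-IF outcome; no I(A = 1) appears
    return Z.mean()
```

Because every subject contributes a value of `Z`, the estimate does not hinge on observing treated subjects in strata where `g_hat` is small, which is precisely where the empirical-IF variance estimator becomes unstable.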

We have now shown that

$$
\sigma_{K+1}^2 = \sum_{d \in \mathcal{D}} h_1(d, K+1) \, E Z^d(d, K+1),
$$

where

$$
Z(d_1, K+1) = \sum_{d_2} h_1(d_2, K+1) \frac{I(d_1(\bar{L}) = d_2(\bar{L})) \, \bar{Q}(1 - \bar{Q})}{g_{0:K}(d_1(\bar{L}), \bar{L})}.
$$

We can now define $Z(K+1)(\bar{A}, \bar{L}) = \sum_{d \in \mathcal{D}} h_1(d, K+1) I(\bar{A} = d(\bar{L})) \bar{Q}(1 - \bar{Q})/g_{0:K}(\bar{A}, \bar{L})$ (as a function of $(\bar{A}, \bar{L})$) as a new outcome for our longitudinal data structure, such that $Z^d(d, K+1) = Z(K+1)(d(\bar{L}^d), \bar{L}^d)$. Our variance $\sigma_{K+1}^2$ is then represented as $\sum_{d \in \mathcal{D}} h_1(d, K+1) E Z^d(d, K+1)$. Thus, if we redefine the longitudinal data as $(\bar{A}, \bar{L})$ with the final outcome of interest $Z(K+1) = Z(K+1)(\bar{A}, \bar{L})$, and use the intercept-only working MSM parameter $E Z^d(K+1) = \beta_0$, with $\beta_0 = \arg\min_\beta \sum_{d \in \mathcal{D}} h_1(d, K+1) (E Z^d(K+1) - \beta)^2$, then we have that

$$
\beta_0 = \left. \sum_{d \in \mathcal{D}} h_1(d, K+1) \, E Z^d(K+1) \right/ \sum_{d \in \mathcal{D}} h_1(d, K+1).
$$

This demonstrates that we can obtain $\sigma_{K+1}^2$ by simply multiplying $\beta_0$ by $\sum_{d} h_1(d, K+1)$, i.e.,

$$
\sigma_{K+1}^2 = \beta_0 \sum_{d} h_1(d, K+1).
$$

We can therefore also estimate this variance component $\sigma_{K+1}^2$ by computing the TMLE of the intercept $\beta_0$ in the working MSM for our newly defined outcome $Z(K+1)$, using weights $h_1(d, K+1)$, and then multiplying it by $\sum_{d \in \mathcal{D}} h_1(d, K+1)$.

A.2 TMLE of σ t 2 for marginal structural working models

We now present how to obtain a TMLE of the variance of the $t$-th component of the EIF, $\sigma_t^2$. For the general working MSM from Petersen et al. [12], we have that the component corresponding to the $t$-th time point equals

$$
D_t(P) = \sum_{d \in \mathcal{D}} h_1(d, t) \frac{I(\bar{A}(t) = d(\bar{L}(t)))}{g_{0:t}(\bar{A}(t), \bar{L}(t))} \big( \bar{Q}_{t+}^d - \bar{Q}_t^d \big)(\bar{A}(t), \bar{L}(t)) = \sum_{d \in \mathcal{D}} C_t(P, d) (\bar{Q}_{t+}^d - \bar{Q}_t^d).
$$

Similarly, we want to obtain a representation of the variance of this component so that we can use a semi-substitution estimator of this part of the variance of the EIF, hopefully, thereby obtaining a variance estimator that is more accurate under violations of practical positivity, and a variance estimator that can be used as a red flag for lack of identifiability. This variance σ t 2 can thus be written as:

$$
\sigma_t^2 = \sum_{d_1, d_2} h_1(d_1, t) h_1(d_2, t) E\left[ \frac{I(\bar{A}(t) = d_1(\bar{L}(t))) I(\bar{A}(t) = d_2(\bar{L}(t))) \, \Sigma_t(d_1, d_2)}{g_{0:t}^2(\bar{A}(t), \bar{L}(t))} \right],
$$

where

$$
\Sigma_t(d_1, d_2)(\bar{A}(t), \bar{L}(t)) = E\left[ (\bar{Q}_{t+}^{d_1} - \bar{Q}_t^{d_1})(\bar{Q}_{t+}^{d_2} - \bar{Q}_t^{d_2}) \mid \bar{A}(t), \bar{L}(t) \right]
$$

is the conditional covariance of $\bar{Q}_{t+}^{d_1}$ and $\bar{Q}_{t+}^{d_2}$, given $(\bar{A}(t), \bar{L}(t))$. Note that this can be obtained by regressing the cross-product on $(\bar{A}(t), \bar{L}(t))$. The latter sum can be further worked out, giving us

$$
\sigma_t^2 = \sum_{d_1 \in \mathcal{D}} h_1(d_1, t) \, E Z^{d_1}(d_1, t),
$$

where

$$
Z(d_1, t) = \sum_{d_2 \in \mathcal{D}} h_1(d_2, t) \frac{I(d_1(\bar{L}(t)) = d_2(\bar{L}(t))) \, \Sigma_t(d_1, d_2)}{g_{0:t}(d_1(\bar{L}(t)), \bar{L}(t))},
$$

so that the counterfactual of $Z(d_1, t)$ under intervention $d_1$ is given by:

$$
Z^{d_1}(d_1, t) = \sum_{d_2 \in \mathcal{D}} h_1(d_2, t) \frac{I(d_1(\bar{L}^{d_1}(t)) = d_2(\bar{L}^{d_1}(t))) \, \Sigma_t(d_1, d_2)}{g_{0:t}(d_1(\bar{L}^{d_1}(t)), \bar{L}^{d_1}(t))}.
$$

With this expression, we can now use a TMLE to estimate $E Z^{d_1}(d_1, t)$ for each $d_1 \in \mathcal{D}$ by using the longitudinal data structure with final outcome $Z(d_1, t)$, for each $d_1$ separately. To create the observed outcome $Z(d_1, t)$, we need a fit of the treatment mechanism $\{g_{A(m)} : m = 0, 1, \ldots, t\}$, evaluated at $\bar{A}(t) = d_1(\bar{L}(t))$, and, for each rule compatible with $d_1$ (for that subject), an estimate of $\Sigma_t(d_1, d_2)$. Thus, given a priori estimates of the full treatment mechanism and of all $(\Sigma_t(d_1, d_2) : d_1, d_2 \in \mathcal{D})$, we can construct this observed outcome $Z(d_1, t)$ and run the TMLE.

A.3 Estimation of the variance of the EIF

The aforementioned approach defines, for each time point $t$ and each rule $d$, an observed longitudinal outcome $Z(d, t)$, which is a function of $(\bar{A}(t), \bar{L}(t))$. The TMLE of $E Z^d(d, t)$ can then be computed based on the longitudinal data structure $(L(0), A(0), \ldots, L(t), A(t), Z(d, t))$ for each $d$ and each $t \in \{0, 1, \ldots, K+1\}$. As a result, we have that $\sigma_t^2 = \sum_{d \in \mathcal{D}} h_1(d, t) E Z^d(d, t)$ and

$$
\sigma^2 = \sum_{t=0}^{K+1} \sigma_t^2 = \sum_{d \in \mathcal{D}} \sum_{t=0}^{K+1} h_1(d, t) E Z^d(d, t) = \sum_{d \in \mathcal{D}} E \sum_{t=0}^{K+1} h_1(d, t) Z^d(d, t).
$$

Let us now define the counterfactual outcome

$$
\bar{Z}^d(d) \equiv \sum_{t=0}^{K+1} h_1(d, t) Z^d(d, t),
$$

and the corresponding observed outcome

$$
\bar{Z}(d) \equiv \sum_{t=0}^{K+1} h_1(d, t) Z(d, t).
$$

We could apply the TMLE to estimate $E \bar{Z}^d(d)$ based on the longitudinal data structure $(L(0), A(0), \ldots, L(K), A(K), \bar{Z}(d, K+1))$, for each $d \in \mathcal{D}$, and use that

$$
\sigma^2 = \sum_{d \in \mathcal{D}} E \bar{Z}^d(d).
$$

In applying TMLE here, we should be using that

$$
E[\bar{Z}^d \mid \bar{A}(m), \bar{L}(m)] = \sum_{t \le m} h_1(d, t) Z(d, t) + E\left[ \sum_{t > m} h_1(d, t) Z(d, t) \,\Big|\, \bar{A}(m), \bar{L}(m) \right].
$$

To start with, let

$$
\bar{Q}_d^Z(K+1) = E[\bar{Z}(d) \mid \bar{A}(K), \bar{L}(K)] = \sum_{t \le K} h_1(d, t) Z(d, t) + E[h_1(d, K+1) Z(d, K+1) \mid \bar{A}(K), \bar{L}(K)].
$$

Denote the last conditional expectation by $\bar{Q}_d^{Z(K+1), d}$, so that

$$
\bar{Q}_d^Z(K+1) = \sum_{t \le K} h_1(d, t) Z(d, t) + \bar{Q}_d^{Z(K+1), d}.
$$

Then,

$$
\bar{Q}_d^Z(K) = E[\bar{Q}_d^Z(K+1) \mid \bar{A}(K-1), \bar{L}(K-1)] = \sum_{t \le K-1} h_1(d, t) Z(d, t) + E[h_1(d, K) Z(d, K) + \bar{Q}_d^{Z(K+1), d} \mid \bar{A}(K-1), \bar{L}(K-1)].
$$

Again, denote the latter conditional expectation by $\bar{Q}_d^{Z(K), d}$, so that

$$
\bar{Q}_d^Z(K) = \sum_{t \le K-1} h_1(d, t) Z(d, t) + \bar{Q}_d^{Z(K), d}.
$$

Then,

$$
\bar{Q}_d^Z(K-1) = E[\bar{Q}_d^Z(K) \mid \bar{A}(K-2), \bar{L}(K-2)] = \sum_{t \le K-2} h_1(d, t) Z(d, t) + E[h_1(d, K-1) Z(d, K-1) + \bar{Q}_d^{Z(K), d} \mid \bar{A}(K-2), \bar{L}(K-2)].
$$

Again, denote the latter conditional expectation by $\bar{Q}_d^{Z(K-1), d}$, so that

$$
\bar{Q}_d^Z(K-1) = \sum_{t \le K-2} h_1(d, t) Z(d, t) + \bar{Q}_d^{Z(K-1), d}.
$$

This is then iterated:

$$
\bar{Q}_d^Z(m) = \sum_{t \le m-1} h_1(d, t) Z(d, t) + \bar{Q}_d^{Z(m), d},
$$

where $\bar{Q}_d^{Z(m), d} = E[h_1(d, m) Z(d, m) + \bar{Q}_d^{Z(m+1), d} \mid \bar{A}(m-1), \bar{L}(m-1)]$.

Before we go to the next conditional expectation, we need to target with a parametric submodel, such as

$$
\operatorname{Logit} \bar{Q}_d^m(\varepsilon) = \operatorname{Logit} \bar{Q}_d^m + \varepsilon \frac{I(\bar{A}(m-1) = d(\bar{L}(m-1)))}{g_{0:m-1}}.
$$

In this way, we only have to run one TMLE for each rule $d$, which still exploits that the outcome is a sum of components that are already known for histories that include them.

B TMLE bootstrap consistency

Theorem 1

Let $P_n^0$ be the initial estimator and $P_n^0(\varepsilon_n)$ the parametric TMLE update, so that $P_n D(P_n^0(\varepsilon_n)) = 0$. Suppose the following:

  B0. $D(P_n^0)$ falls in a $P_0$-Donsker class with probability tending to 1,

  B1. $R_0(P_n^0(\varepsilon_n), P_0) = o_P(n^{-1/2})$,

  B2. $P_0 (D(P_n^0(\varepsilon_n)) - D(P_0))^2 \to 0$ in probability.

Then,

$$
n^{1/2} (\Psi(P_n^0(\varepsilon_n)) - \Psi(P_0)) = n^{1/2} P_n D(P_0) + o_P(1) \to_d N(0, \sigma_0^2 = P_0 D^2(P_0)),
$$

where the empirical measure of the bootstrap sample $O_i^{\#} \sim_{\mathrm{iid}} P_n$ is denoted by $P_n^{\#}$, and $\varepsilon_n^{\#}$ is the maximum likelihood estimate of $\varepsilon$ for $P_n^0(\varepsilon)$ based on $P_n^{\#}$.

Moreover, let $P_n^0(\varepsilon_n^{\#})$ be the TMLE update based on the bootstrap sample $P_n^{\#}$, so that $P_n^{\#} D(P_n^0(\varepsilon_n^{\#})) = 0$.

We have $n^{1/2} (\Psi(P_n^0(\varepsilon_n^{\#})) - \Psi(P_n^0(\varepsilon_n))) = n^{1/2} (P_n^{\#} - P_n) D(P_0) + o_P(1)$, which (conditional on $P_n$) converges to $N(0, \sigma_0^2)$. This proves the consistency of the non-parametric bootstrap for the TMLE that treats the initial estimator $P_n^0$ as fixed.

Note that we require no more than B0, B1, and B2, which are the assumptions under which the TMLE is asymptotically linear and efficient.

Proof

Let $P_n^0$ be the initial estimator, let $P_n^{\ast} = P_n^0(\varepsilon_n)$ be the TMLE, and let $\Psi(P_n^0(\varepsilon_n))$ be the corresponding plug-in estimate. Let us first analyze the TMLE.

By the usual identity for TMLE, we have

$$
\Psi(P_n^0(\varepsilon_n)) - \Psi(P_0) = (P_n - P_0) D(P_n^0(\varepsilon_n)) + R_0(P_n^0(\varepsilon_n), P_0).
$$

Since for TMLE we assume that the initial estimator converges fast enough, we have $R_0(P_n^0(\varepsilon_n), P_0) = o_P(n^{-1/2})$ (B1). Moreover,

$$
(P_n - P_0) D(P_n^0(\varepsilon_n)) = (P_n - P_0) (D(P_n^0(\varepsilon_n)) - D(P_0)) + (P_n - P_0) D(P_0).
$$

Our Donsker class assumption (B0), together with (B2), implies that the first term $(P_n - P_0)(D(P_n^0(\varepsilon_n)) - D(P_0))$ is an empirical process term of order $o_P(n^{-1/2})$. Thus, we now have

$$
\Psi(P_n^0(\varepsilon_n)) - \Psi(P_0) = (P_n - P_0) D(P_0) + o_P(n^{-1/2}),
$$

proving asymptotic efficiency. The remainder of the proof essentially repeats this argument for the TMLE that treats $P_n^0$ as fixed, but now bases the TMLE on a sample from $P_n$, whose empirical distribution we denote by $P_n^{\#}$, with target $\Psi(P_n^0(\varepsilon_n))$ (i.e., the truth under sampling from $P_n$). The TMLE on the bootstrap sample is then $\Psi(P_n^0(\varepsilon_n^{\#}))$, and

$$\Psi(P_n^0(\varepsilon_n^\#)) - \Psi(P_0) = (P_n^\# - P_0) D(P_n^0(\varepsilon_n^\#)) + R_0(P_n^0(\varepsilon_n^\#), P_0),$$
$$\Psi(P_n^0(\varepsilon_n)) - \Psi(P_0) = (P_n - P_0) D(P_n^0(\varepsilon_n)) + R_0(P_n^0(\varepsilon_n), P_0).$$

Subtracting the two equations gives

$$\Psi(P_n^0(\varepsilon_n^\#)) - \Psi(P_n^0(\varepsilon_n)) = (P_n^\# - P_0) D(P_n^0(\varepsilon_n^\#)) - (P_n - P_0) D(P_n^0(\varepsilon_n)) + o_P(n^{-1/2}),$$

since both remainder terms are $o_P(n^{-1/2})$. Note that the left-hand side is our centered bootstrap estimator. However, since $(P_n^\# - P_0) D = (P_n^\# - P_n) D + (P_n - P_0) D$, the difference of the two empirical process terms equals

$$(P_n^\# - P_n) D(P_n^0(\varepsilon_n^\#)) + (P_n - P_0)\big(D(P_n^0(\varepsilon_n^\#)) - D(P_n^0(\varepsilon_n))\big).$$

The second term is an empirical process term of order $o_P(n^{-1/2})$, since $\varepsilon_n^\# - \varepsilon_n$ converges to zero and by the Donsker class conditions (B0) and (B1). Thus, we have

$$\Psi(P_n^0(\varepsilon_n^\#)) - \Psi(P_n^0(\varepsilon_n)) = (P_n^\# - P_n) D(P_n^0(\varepsilon_n^\#)) + o_P(n^{-1/2}).$$

Finally, the latter empirical process term is

$$(P_n^\# - P_n) D(P_0) + (P_n^\# - P_n)\big(D(P_n^0(\varepsilon_n^\#)) - D(P_0)\big),$$

and the latter term is again $o_P(n^{-1/2})$ by the consistency of $P_n^0(\varepsilon_n^\#)$ for $P_0$ and by the Donsker condition, which holds automatically since $P_n^0$ is fixed and only a parametric fluctuation model remains. We have therefore shown

$$\Psi(P_n^0(\varepsilon_n^\#)) - \Psi(P_n^0(\varepsilon_n)) = (P_n^\# - P_n) D(P_0) + o_P(n^{-1/2}),$$

since $P_n^0$ is fixed, so that the class of functions containing $D(P_n^0(\varepsilon_n^\#))$, conditional on $P_n$, is just a parametric class. Conditional on $P_n$ and after multiplying by $n^{1/2}$, this converges to the same $N(0, \sigma_0^2)$, proving consistency of the bootstrap.□
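As a purely illustrative sketch (not the authors' code), the fixed-initial-estimator bootstrap analyzed above can be mimicked numerically for the single-time-point treatment-specific mean $E[\bar{Q}_0(1,W)]$: the outcome and treatment model fits (the analogue of $P_n^0$) are held fixed, and only the fluctuation parameter $\varepsilon$ is re-estimated on each bootstrap sample. All variable names and the data-generating process here are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def logistic_fit(X, y, iters=50):
    """Newton-Raphson fit of a logistic regression; returns coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ beta)
        w = p * (1 - p)
        beta = beta + np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
    return beta

def fit_epsilon(y, H, offset, iters=50):
    """One-dimensional fluctuation: logit E[Y] = offset + eps * H."""
    eps = 0.0
    for _ in range(iters):
        p = expit(offset + eps * H)
        eps = eps + np.sum(H * (y - p)) / np.sum(H**2 * p * (1 - p))
    return eps

# Simulate a simple point-treatment data set (W, A, Y).
n = 2000
W = rng.normal(size=n)
A = rng.binomial(1, expit(0.5 * W))
Y = rng.binomial(1, expit(W + A))

# Initial estimators (correctly specified parametric models); these play
# the role of P_n^0 and are held fixed throughout the bootstrap.
beta_q = logistic_fit(np.column_stack([np.ones(n), W, A]), Y)   # Y ~ W + A
beta_g = logistic_fit(np.column_stack([np.ones(n), W]), A)      # A ~ W
g1 = expit(np.column_stack([np.ones(n), W]) @ beta_g)           # P(A=1|W)
Q_AW = expit(np.column_stack([np.ones(n), W, A]) @ beta_q)      # Qbar(A,W)
Q_1W = expit(np.column_stack([np.ones(n), W, np.ones(n)]) @ beta_q)

# TMLE on the observed data: fluctuate with clever covariate H = A/g1.
H = A / g1
eps = fit_epsilon(Y, H, logit(Q_AW))
psi_hat = np.mean(expit(logit(Q_1W) + eps / g1))

# Bootstrap that treats the initial estimator as fixed: resample rows,
# refit only eps, and plug in.
B = 200
psis = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    eps_b = fit_epsilon(Y[idx], H[idx], logit(Q_AW[idx]))
    psis[b] = np.mean(expit(logit(Q_1W[idx]) + eps_b / g1[idx]))
se_boot = psis.std(ddof=1)
print(psi_hat, se_boot)
```

Because only the one-dimensional $\varepsilon$ is refit per replicate, each bootstrap iteration is cheap even when the initial fits themselves are expensive, which is the practical appeal of this scheme.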

References

[1] Horvitz D, Thompson D. A generalization of sampling without replacement from a finite universe. J Amer Stat Assoc. 1952;47(260):663–85. doi:10.1080/01621459.1952.10483446.

[2] Robins JM. Marginal structural models. In: 1997 Proceedings of the American Statistical Association, Section on Bayesian Statistical Science; 1998. p. 1–10.

[3] Robins JM, Rotnitzky A. Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell NP, Dietz K, Farewell VT, editors. AIDS epidemiology. Boston: Birkhäuser; 1992. p. 297–331. doi:10.1007/978-1-4757-1229-2_14.

[4] Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. J Amer Stat Assoc. 1994;89(427):846–67. doi:10.1080/01621459.1994.10476818.

[5] Robins JM, Rotnitzky A, van der Laan MJ. Discussion of "On profile likelihood" by Murphy and van der Vaart. J Amer Stat Assoc. 2000;95(450):477–82. doi:10.1080/01621459.2000.10474224.

[6] Robins JM. Robust estimation in sequentially ignorable missing data and causal inference models. In: Proceedings of the American Statistical Association, Section on Bayesian Statistical Science; 2000. p. 6–10.

[7] Robins JM, Rotnitzky A. Comment on the Bickel and Kwon article, "Inference for semiparametric models: some questions and an answer". Statistica Sinica. 2001;11(4):920–36.

[8] Rotnitzky A, Robins J. Inverse probability weighted estimation in survival analysis. In: Balakrishnan N, Colton T, Everitt B, Piegorsch W, Ruggeri F, Teugels JL, editors. Wiley StatsRef: Statistics Reference Online; 2014. doi:10.1002/9781118445112.stat06031.

[9] van der Laan MJ, Gruber S. Targeted minimum loss based estimation of an intervention specific mean outcome. U.C. Berkeley Division of Biostatistics Working Paper Series; Working Paper 290.

[10] Hampel FR. The influence curve and its role in robust estimation. J Amer Stat Assoc. 1974;69(346):383–93. doi:10.1080/01621459.1974.10482962.

[11] Efron B. Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods. Stanford, CA: Stanford University; 1981. https://statistics.stanford.edu/sites/default/files/EFSNSF156.pdf.

[12] Petersen ML, Porter KE, Gruber S, Wang Y, van der Laan MJ. Diagnosing and responding to violations in the positivity assumption. Stat Meth Med Res. 2012;21(1):31–54. doi:10.1177/0962280210386207.

[13] Petersen M, Schwab J, Gruber S, Blaser N, Schomaker M, van der Laan M. Targeted maximum likelihood estimation for dynamic and static longitudinal marginal structural working models. J Causal Inference. 2014;2(2):147–85. doi:10.1515/jci-2013-0007.

[14] Schwab J, Lendle S, Petersen M, van der Laan MJ. ltmle: Longitudinal targeted maximum likelihood estimation. R package version 0.9.3-1; 2014. https://CRAN.R-project.org/package=ltmle.

[15] Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61(4):962–73. doi:10.1111/j.1541-0420.2005.00377.x.

[16] Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect. Math Modell. 1986;7:1393–512. doi:10.1016/0270-0255(86)90088-6.

[17] Robins JM. Semi-parametric efficiency in multivariate regression models with missing data. J Amer Stat Assoc. 1995;90:122–29. doi:10.1080/01621459.1995.10476494.

[18] Robins JM. Marginal structural models versus structural nested models as tools for causal inference. In: Halloran ME, Berry D, editors. Statistical models in epidemiology, the environment, and clinical trials. The IMA Volumes in Mathematics and its Applications, vol. 116. New York: Springer; 1999. doi:10.1007/978-1-4612-1284-3_2.

[19] van der Laan MJ, Rubin D. Targeted maximum likelihood learning. UC Berkeley Division of Biostatistics Working Paper Series. 2006;(213):1–87. doi:10.2202/1557-4679.1043.

[20] Kang JD, Schafer JL. Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Stat Sci. 2007;22(4):523–39. doi:10.1214/07-STS227.

[21] Robins J, Sued M, Lei-Gomez Q, Rotnitzky A. Comment: Performance of double-robust estimators when "inverse probability" weights are highly variable. Stat Sci. 2007;22(4):544–59. doi:10.1214/07-STS227D.

[22] Rotnitzky A, Lei Q, Sued M, Robins JM. Improved double-robust estimation in missing data and causal inference models. Biometrika. 2012;99(2):1–18. doi:10.1093/biomet/ass013.

[23] Gruber S, van der Laan MJ. Targeted maximum likelihood estimation: a gentle introduction. UC Berkeley Division of Biostatistics Working Paper Series; 2009. Paper 252.

[24] R Core Team. R: A language and environment for statistical computing. Vienna, Austria; 2014. https://www.R-project.org/.

[25] Klaassen CAJ. Consistent estimation of the influence function of locally asymptotically linear estimators. Ann Stat. 1987;15(4):1548–62. doi:10.1214/aos/1176350609.

[26] van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. New York: Springer; 2003. doi:10.1007/978-0-387-21700-0.

[27] Cai W, van der Laan MJ. Nonparametric bootstrap inference for the targeted highly adaptive LASSO estimator. arXiv:1905.10299; 2019. doi:10.1515/ijb-2017-0070.

[28] van der Laan MJ, Gruber S. Collaborative double robust targeted maximum likelihood estimation. Int J Biostat. 2010;6(1):Article 17.

[29] Zheng W, van der Laan M. Asymptotic theory for cross-validated targeted maximum likelihood estimation. UC Berkeley Division of Biostatistics Working Paper Series; 2010. Paper 273. doi:10.2202/1557-4679.1181.

Received: 2021-12-07
Revised: 2023-02-14
Accepted: 2023-02-14
Published Online: 2023-08-11

© 2023 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
