Abstract
We consider causal inference in observational studies with choice-based sampling, in which subject enrollment is stratified on treatment choice. Choice-based sampling has been considered mainly in the econometrics literature, but it can be useful for biomedical studies as well, especially when one of the treatments being compared is uncommon. We propose new methods for estimating the population average treatment effect under choice-based sampling, including doubly robust methods motivated by semiparametric theory. A doubly robust, locally efficient estimator may be obtained by replacing nuisance functions in the efficient influence function with estimates based on parametric models. The use of machine learning methods to estimate nuisance functions leads to estimators that are consistent and asymptotically efficient under broader conditions. The methods are compared in simulation experiments and illustrated in the context of a large observational study in obstetrics. We also make suggestions on how to choose the target proportion of treated subjects and the sample size in designing a choice-based observational study.
Acknowledgements
We thank the Editor (Dr. Melanie Prague) and three anonymous reviewers for constructive comments on earlier versions of the paper. Chunling Liu’s research was supported by Hong Kong Polytechnic University (grant # G-YBCU) and the General Research Fund of Hong Kong (grant # 15327216).
Appendix
A Semiparametric Theory
In this appendix, we derive the efficient influence functions for estimating
where ϖ is unknown and the conditional densities are unknown and unspecified. We will note later that the true value of ϖ, if known, does not change the efficient influence functions.
Consider a regular parametric submodel indexed by θ with
with
where
As a functional of the nonparametric model, the parameter
We will show that the canonical gradient of
It is easy to verify that
for every regular parametric submodel. By construction,
is equal to
This can be demonstrated by verifying the following:
where the last line follows from the fact that
Using a similar argument, it can be shown that the efficient influence function for
is given by
Consequently, the efficient influence function for
where
The preceding discussion treats ϖ as unknown. We now consider the case that ϖ is known and argue that
Clearly,
and similarly for the other two parameters.
To gain some insight into
where
It follows that
Noting that
Consequently,
Through conditioning on W (in addition to A), it can be shown that
B Asymptotic Theory for Sections 2.2–2.5
Standard regularity conditions from M-estimation theory (e.g., [38], Chapter 5) are assumed. These include identifiability and smoothness (in the parameters) of the parametric models, the existence of integrable envelopes that permit use of the dominated convergence theorem, and certain Donsker properties that help deal with random functions. Techniques for verifying the Donsker property can be found in van der Vaart and Wellner [39]. Let
Asymptotics for δ̂_or and δ̃_or
We assume here that the OR model m(a,w;β) is correct. Under regularity conditions, we have
The dominated convergence theorem can be used to show that the mapping of (β, ϖ) to
By the delta method,
where
Because
Similarly, it can be shown that
and
We note that
and therefore
It follows that
Asymptotics for δ̂_wt and δ̃_wt
We assume here that the sample PS model
and argue as in the preceding proof that
where
Similarly, for
If the population PS p(w) is fully known and substituted into the weighted estimators, then the term
Asymptotics for δ̂_dr.p and δ̃_dr.p
We write
We assume that
To demonstrate the DR property of
The first term on the right-hand side has mean δ, as argued in Section 2.2. The definition of m as a conditional mean function implies that the other two terms on the right-hand side have mean 0, regardless of
The first line on the right-hand side has mean δ, as argued in Section 2.3. The definition of r as a density ratio implies that the other two terms on the right-hand side have mean 0, regardless of
Assuming correct specification of one or both of the OR and PS models, it can be argued as before that
where
The DR property of
if the OR and PS models are both correct.
Similarly, it can be shown that
where
If the OR model is correct,
Asymptotics for δ̂_dr.np
We write
where
In what follows, we demonstrate that
assuming that condition (9) holds and that
where
Simple algebraic manipulations yield
It follows from the Cauchy-Schwarz inequality that
where
Comparing
Assumption (2) and equation (3) together imply that
Similarly, it can be shown that
Adding the last two equations, we obtain
The same arguments can be used to show that
under condition (9) and the Donsker condition stated earlier.
C Technical Details for Section 2.6
Alternative OR Estimators
The alternative OR estimators are given by
Under a correct OR model, it can be shown as before that
where
If
and therefore
Given the similarity of
Justification for Equation (11)
Simple algebraic manipulations yield
The last line is identically 0 because
which follows from equation (3).
Alternative Weighted Estimators
The alternative weighted estimators are given by
Assume the population PS model is correct, and define
Then
where
Alternative DR Estimators
The alternative DR.P estimators can be derived as follows. Without using sampling weights, a naive DR estimator of δ would be
which can be rearranged as
Replacing
Assuming correct specification of one or both of the OR and population PS models, it can be argued as before that
where
If both models are correct, then
Note that
Following similar arguments, alternative DR.NP estimators can be obtained as
where
The alternative DR.SS estimators are similar to the alternative DR.NP estimators except for the use of sample splitting. They also behave similarly to the (respective) alternative DR.NP estimators. A detailed discussion is omitted.
D Additional Simulation Results
Figures 6–11 present the additional simulation results mentioned but not reported in Section 3.1.

Figures 6–8: Estimation error (estimator minus true value of δ) distributions from the simulation experiments in Section 3.1.
Figures 9–11: Coverage proportions from the simulation experiments in Section 3.1.
E R Code
library(SuperLearner)

logit = function(p) log(p/(1-p))
expit = function(u) 1/(1+exp(-u))

# outcome regression method based on a parametric OR model
# W does not have one as the first column
# if non-zero, inter represents the columns of W that interact with a
est.or = function(y, a, W, pi, inter=0, family=gaussian) {
  n = length(y)
  treated = (a>0.5); control = !treated
  one = rep(1, n); zero = rep(0, n)
  D = cbind(one, a, W)
  if (inter[1]!=0) D = cbind(D, a*W[,inter])
  or.mod = glm(y~0+D, family=family)
  D1 = cbind(one, one, W)
  if (inter[1]!=0) D1 = cbind(D1, one*W[,inter])
  m1 = predict(or.mod, newdata=list(D=D1), type="response")
  mu1 = pi*mean(m1[treated])+(1-pi)*mean(m1[control])
  D0 = cbind(one, zero, W)
  if (inter[1]!=0) D0 = cbind(D0, zero*W[,inter])
  m0 = predict(or.mod, newdata=list(D=D0), type="response")
  mu0 = pi*mean(m0[treated])+(1-pi)*mean(m0[control])
  c(mu1, mu0, mu1-mu0)
}

# weighted method based on a parametric PS model
# W does not have one as the first column
est.wtd = function(y, a, W, pi, family=binomial) {
  treated = (a>0.5); control = !treated
  pi.s = mean(treated)
  ps.mod = glm(a~W, family=family)
  ps.s = ps.mod$fitted.values                     # PS in the sample
  ps = expit(logit(ps.s)-logit(pi.s)+logit(pi))   # PS in the population
  w1 = 1/ps[treated]; w0 = 1/(1-ps[control])
  mu1 = pi*mean(w1*y[treated])
  mu0 = (1-pi)*mean(w0*y[control])
  c(mu1, mu0, mu1-mu0)
}

# DR.P method based on parametric OR and PS models
# W.ps and W.or do not have one as the first column
# if non-zero, inter.or represents the columns of W.or
# that interact with a in the OR model
est.dr1 = function(y, a, W.ps, W.or, pi, inter.or=0,
                   family.or=gaussian, family.ps=binomial) {
  n = length(y)
  treated = (a>0.5); control = !treated
  pi.s = mean(treated)
  one = rep(1, n); zero = rep(0, n)
  D = cbind(one, a, W.or)
  if (inter.or[1]!=0) D = cbind(D, a*W.or[,inter.or])
  or.mod = glm(y~0+D, family=family.or)
  D1 = cbind(one, one, W.or)
  if (inter.or[1]!=0) D1 = cbind(D1, one*W.or[,inter.or])
  m1 = predict(or.mod, newdata=list(D=D1), type="response")
  D0 = cbind(one, zero, W.or)
  if (inter.or[1]!=0) D0 = cbind(D0, zero*W.or[,inter.or])
  m0 = predict(or.mod, newdata=list(D=D0), type="response")
  ps.mod = glm(a~W.ps, family=family.ps)
  ps.s = ps.mod$fitted.values
  ps = expit(logit(ps.s)-logit(pi.s)+logit(pi))
  r = exp(logit(ps.s)-logit(pi.s))
  mu1 = mean((pi*y[treated]/ps[treated])-((1-pi)*m1[treated]/r[treated]))+
    (1-pi)*mean(m1[control])
  mu0 = mean(((1-pi)*y[control]/(1-ps[control]))-pi*m0[control]*r[control])+
    pi*mean(m0[treated])
  c(mu1, mu0, mu1-mu0)
}

# DR.NP method based on super learner estimates of OR and PS functions
# cv is the number of folds in the super learners
est.dr2 = function(y, a, W.ps, W.or, pi, SL.lib.or, SL.lib.ps,
                   family.or=gaussian(), family.ps=binomial(), cv=5) {
  n = length(y)
  treated = (a>0.5); control = !treated
  pi.s = mean(treated)
  SL.run0 = SuperLearner(Y=y[control], X=W.or[control,], newX=W.or,
                         family=family.or, SL.library=SL.lib.or,
                         cvControl=list(V=cv))
  m0 = SL.run0$SL.predict
  mu0.1 = mean(m0[treated]); mu0.0 = mean(m0[control])
  SL.run1 = SuperLearner(Y=y[treated], X=W.or[treated,], newX=W.or,
                         family=family.or, SL.library=SL.lib.or,
                         cvControl=list(V=cv))
  m1 = SL.run1$SL.predict
  mu1.1 = mean(m1[treated]); mu1.0 = mean(m1[control])
  SL.run2 = SuperLearner(Y=a, X=W.ps, family=family.ps,
                         SL.library=SL.lib.ps, cvControl=list(V=cv))
  ps.s = SL.run2$SL.predict
  ps = expit(logit(ps.s)-logit(pi.s)+logit(pi))
  r = exp(logit(ps.s)-logit(pi.s))
  tau1 = (pi*mu1.1/pi.s)-((1-pi)*mu1.0/(1-pi.s))
  tau0 = (pi*mu0.1/pi.s)-((1-pi)*mu0.0/(1-pi.s))
  psi1 = (pi*a*y/(pi.s*ps))-((1-pi)*a*m1/(pi.s*r))+
    ((1-pi)*(1-a)*m1/(1-pi.s))-(a-pi.s)*tau1
  psi0 = ((1-pi)*(1-a)*y/((1-pi.s)*(1-ps)))-(pi*(1-a)*m0*r/(1-pi.s))+
    (pi*a*m0/pi.s)-(a-pi.s)*tau0
  psi = psi1-psi0; psi.mat = cbind(psi1,psi0,psi)
  pe = colMeans(psi.mat); se = sqrt(diag(var(psi.mat))/n)
  cbind(pe,se)
}

# DR.SS method based on sample splitting and
# super learner estimates of OR and PS functions
# K is the number of folds in sample splitting
# cv is the number of folds in the super learners
est.dr3 = function(y, a, W.ps, W.or, pi, SL.lib.or, SL.lib.ps,
                   family.or=gaussian(), family.ps=binomial(), cv=5, K=5) {
  n = length(y); m0 = rep(NA,n); m1 = m0; ps.s = m0
  treated = (a>0.5); control = !treated
  pi.s = mean(treated)
  st = sample(1:K, n, replace=TRUE)   # stratum in sample splitting
  for (k in 1:K) {
    val = (st==k); trn = !val
    trn.treated = trn&treated; trn.control = trn&control
    SL.run0 = SuperLearner(Y=y[trn.control], X=W.or[trn.control,],
                           newX=W.or[val,], family=family.or,
                           SL.library=SL.lib.or, cvControl=list(V=cv))
    m0[val] = SL.run0$SL.predict
    SL.run1 = SuperLearner(Y=y[trn.treated], X=W.or[trn.treated,],
                           newX=W.or[val,], family=family.or,
                           SL.library=SL.lib.or, cvControl=list(V=cv))
    m1[val] = SL.run1$SL.predict
    SL.run2 = SuperLearner(Y=a[trn], X=W.ps[trn,], newX=W.ps[val,],
                           family=family.ps, SL.library=SL.lib.ps,
                           cvControl=list(V=cv))
    ps.s[val] = SL.run2$SL.predict
  }
  mu0.1 = mean(m0[treated]); mu0.0 = mean(m0[control])
  mu1.1 = mean(m1[treated]); mu1.0 = mean(m1[control])
  ps = expit(logit(ps.s)-logit(pi.s)+logit(pi))
  r = exp(logit(ps.s)-logit(pi.s))
  tau1 = (pi*mu1.1/pi.s)-((1-pi)*mu1.0/(1-pi.s))
  tau0 = (pi*mu0.1/pi.s)-((1-pi)*mu0.0/(1-pi.s))
  psi1 = (pi*a*y/(pi.s*ps))-((1-pi)*a*m1/(pi.s*r))+
    ((1-pi)*(1-a)*m1/(1-pi.s))-(a-pi.s)*tau1
  psi0 = ((1-pi)*(1-a)*y/((1-pi.s)*(1-ps)))-(pi*(1-a)*m0*r/(1-pi.s))+
    (pi*a*m0/pi.s)-(a-pi.s)*tau0
  psi = psi1-psi0; psi.mat = cbind(psi1,psi0,psi)
  pe = colMeans(psi.mat); se = sqrt(diag(var(psi.mat))/n)
  cbind(pe,se)
}
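For illustration only, the following hypothetical call sequence is a minimal sketch of how the functions above might be used; it is not part of the analyses reported in the paper. The simulated data-generating values (n1 = n0 = 500 enrolled subjects, an assumed population treatment proportion pi = 0.1, the covariate shifts, and a constant treatment effect of 1, so that the true δ equals 1) are arbitrary choices made only for demonstration. The last line forms an approximate 95% Wald interval for δ from the point estimate and standard error returned by est.dr2.

# hypothetical choice-based sample: enrollment stratified on treatment choice
set.seed(123)
n1 = 500; n0 = 500                    # enrolled treated and control subjects
n = n1 + n0
pi = 0.1                              # assumed population proportion treated
a = c(rep(1, n1), rep(0, n0))
W = data.frame(w1 = rnorm(n, mean=0.5*a),    # covariates shifted in the
               w2 = rnorm(n, mean=-0.3*a))   # treated stratum (confounding)
y = 1 + a + 1.5*W$w1 - W$w2 + rnorm(n)       # constant effect, true delta = 1

est.or(y, a, as.matrix(W), pi)                  # OR estimates of (mu1, mu0, delta)
est.wtd(y, a, as.matrix(W), pi)                 # weighted estimates
est.dr1(y, a, as.matrix(W), as.matrix(W), pi)   # DR.P estimates
out = est.dr2(y, a, W, W, pi, SL.lib.or=c("SL.glm"), SL.lib.ps=c("SL.glm"))
out                                             # point estimates and SEs
out[3,1] + c(-1,1)*1.96*out[3,2]                # approximate 95% Wald CI for delta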
References
[1] Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66:688–701.
[2] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
[3] Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology. 2000;11:550–60.
[4] Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc. 1984;79:516–24.
[5] Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat. 1985;39:33–8.
[6] Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61:962–72.
[7] Cao W, Tsiatis AA, Davidian M. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika. 2009;96:723–34.
[8] Rotnitzky A, Lei Q, Sued M, Robins JM. Improved double-robust estimation in missing data and causal inference models. Biometrika. 2012;99:439–56.
[9] Scharfstein DO, Rotnitzky A, Robins JM. Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion). J Am Stat Assoc. 1999;94:1096–146.
[10] Tan Z. Bounded, efficient, and doubly robust estimation with inverse weighting. Biometrika. 2010;97:661–82.
[11] van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. New York: Springer-Verlag, 2003.
[12] van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data. New York: Springer, 2011.
[13] Imai K, King G, Stuart EA. Misunderstandings between experimentalists and observationalists about causal inference. J Royal Stat Soc Ser A (Stat Soc). 2008;171:481–502.
[14] Wang W, Scharfstein D, Tan Z, MacKenzie EJ. Causal inference in outcome-dependent two-phase sampling designs. J Royal Stat Soc Ser B (Stat Method). 2009;71:947–69.
[15] Nie L, Zhang Z, Rubin D, Chu J. Likelihood reweighting methods to reduce potential bias in noninferiority trials which rely on historical data to make inference. Ann Appl Stat. 2013;7:1796–813.
[16] Zhang Z, Nie L, Soon G, Hu Z. New methods for treatment effect calibration, with applications to non-inferiority trials. Biometrics. 2016;72:20–9.
[17] Hu Z, Qin J. Generalizability of causal inference in observational studies under retrospective convenience sampling. Stat Med. 2018;37:2874–83.
[18] Heckman JJ, Todd PE. A note on adapting propensity score matching and selection models to choice based samples. Econom J. 2009;12:S230–4.
[19] Kennedy EH, Sjolander A, Small DS. Semiparametric causal inference in matched cohort studies. Biometrika. 2015;102:739–46.
[20] Bickel PJ, Klaassen CA, Ritov Y, Wellner JA. Efficient and adaptive estimation for semiparametric models. Baltimore, MD: Johns Hopkins University Press, 1993.
[21] Tsiatis AA. Semiparametric theory and missing data. New York: Springer, 2006.
[22] Hahn J. On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica. 1998;66:315–31.
[23] Shinozaki T, Matsuyama Y. Doubly robust estimation of standardized risk difference and ratio in the exposed population. Epidemiology. 2015;26:873–7.
[24] Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction, 2nd ed. New York: Springer-Verlag, 2009.
[25] Benkeser D, van der Laan MJ. The highly adaptive lasso estimator. In: Proceedings of the International Conference on Data Science and Advanced Analytics, 2016:689–96.
[26] Chen X, White H. Improved rates and asymptotic normality for nonparametric neural network estimators. IEEE Trans Inf Theory. 1999;45:682–91.
[27] Kennedy EH. Nonparametric causal effects based on incremental propensity score interventions. J Am Stat Assoc. in press, 2018. doi: 10.1080/01621459.2017.1422737.
[28] Ma S, Zhu L, Zhang Z, Tsai CL, Carroll RJ. A robust and efficient approach to causal inference based on sparse sufficient dimension reduction. Ann Stat. 2019;47:1505–35.
[29] van der Laan MJ. A generally efficient targeted minimum loss based estimator based on the highly adaptive lasso. Int J Biostat. 2017;13. doi: 10.1515/ijb-2015-0097.
[30] Polley EC, Rose S, van der Laan MJ. Super learning. In: van der Laan MJ, Rose S (eds.). Targeted learning. New York: Springer, 2011:43–66.
[31] van der Laan MJ, Polley EC, Hubbard AE. Super Learner. Stat Appl Genet Mol Biol. 2007;6, Article 5.
[32] van der Laan MJ, Dudoit S. Unified cross-validation methodology for selection among estimators and a general cross-validated adaptive epsilon-net estimator: finite sample oracle inequalities and examples. UC Berkeley Division of Biostatistics Working Paper Series, paper 130, 2003. http://biostats.bepress.com/ucbbiostat/paper130.
[33] Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey WK. Double machine learning for treatment and structural parameters. Technical report, cemmap working paper, Centre for Microdata Methods and Practice, 2016.
[34] Zheng W, van der Laan MJ. Cross-validated targeted minimum-loss-based estimation. In: van der Laan MJ, Rose S (eds.). Targeted learning. New York: Springer, 2011:459–74.
[35] Zhang J, Troendle J, Reddy UM, Laughon SK, Branch DW, Burkman R. Contemporary cesarean delivery practice in the United States. Am J Obstet Gynecol. 2010;203:326.e1–10.
[36] Benkeser D, Carone M, van der Laan MJ, Gilbert PB. Doubly robust nonparametric inference on the average treatment effect. Biometrika. 2017;104:863–80.
[37] Berger RL, Boos DD. P values maximized over a confidence set for the nuisance parameter. J Am Stat Assoc. 1994;89:1012–6.
[38] van der Vaart AW. Asymptotic statistics. Cambridge, UK: Cambridge University Press, 1998.
[39] van der Vaart AW, Wellner JA. Weak convergence and empirical processes: with applications to statistics. New York: Springer-Verlag, 1996.
[40] Breiman L, Friedman J, Olshen R, Stone C. Classification and regression trees. New York: Wadsworth, 1984.
[41] Hastie TJ, Tibshirani RJ. Generalized additive models. New York: Chapman & Hall/CRC, 1990.