Bias formulas for violations of proximal identification assumptions in a linear structural equation model

Raluca Cobzaru; Roy Welsch; Stan Finkelstein; Kenney Ng; Zach Shahn

doi:10.1515/jci-2023-0039

Artikel Open Access

Bias formulas for violations of proximal identification assumptions in a linear structural equation model

Raluca Cobzaru , Roy Welsch , Stan Finkelstein , Kenney Ng und Zach Shahn

Veröffentlicht/Copyright: 19. Juni 2024

Veröffentlicht von

Veröffentlichen auch Sie bei De Gruyter Brill

Manuskript einreichen Informationen für Autor*innen

Aus der Zeitschrift Journal of Causal Inference Band 12 Heft 1

Abstract

Causal inference from observational data often rests on the unverifiable assumption of no unmeasured confounding. Recently, Tchetgen Tchetgen and colleagues have introduced proximal inference to leverage negative control outcomes and exposures as proxies to adjust for bias from unmeasured confounding. However, some of the key assumptions that proximal inference relies on are themselves empirically untestable. In addition, the impact of violations of proximal inference assumptions on the bias of effect estimates is not well understood. In this article, we derive bias formulas for proximal inference estimators under a linear structural equation model. These results are a first step toward sensitivity analysis and quantitative bias analysis of proximal inference estimators. While limited to a particular family of data generating processes, our results may offer some more general insight into the behavior of proximal inference estimators.

Keywords: proximal causal inference; sensitivity analysis; bias analysis; negative control

MSC 2010: 62D20

1 Introduction

Causal inference using observational data often rests on the assumption of no unmeasured confounding. This assumption is not empirically verifiable, but sensitivity analysis methods [1–4] are available to assess robustness to possible violations. Alternatively, investigators might use methods such as instrumental variable analysis or difference-in-differences, which depend on different assumptions. Sensitivity analyses for violations of the assumptions required by these alternative methods are also available [5,6].

There has been recent interest in the use of negative control methods to detect and resolve confounder bias. A negative control outcome (NCO) is a variable known not to be causally affected by the treatment of interest, while a negative control exposure (NCE) is a variable known not to causally affect the outcome of interest [7]. Tchetgen Tchetgen et al. have developed a proximal inference framework [8–10], which uses NCE-NCO pairs sharing the same unmeasured confounders as the treatment–outcome relationship of interest as proxies to adjust for unmeasured confounding. Key assumptions of proximal inference include that the unmeasured confounders are associated with both the NCEs and NCOs (“ U -relevance”) and, roughly, that the NCEs and NCOs are sufficiently “rich” relative to the unmeasured confounders to serve as adequate proxies (“completeness”). Both U -relevance and completeness are themselves empirically untestable [8], and bias resulting from violations is not fully understood.

In this article, we characterize bias from violations of proximal inference assumptions in a linear structural equation model (LSEM). Our results build understanding of the sensitivity of proximal inference to assumption violations and enable “assumption-heavy” sensitivity analysis and quantitative bias analysis [11] tools for proximal inference. We hope that our results will also serve as a first step toward assumption-light sensitivity analysis.

As a motivating example, we consider the observational SUPPORT study estimating the effect of right heart catheterization (RHC) on 30 day survival in intensive care unit (ICU) patients. Initial analyses of these data [8] depended on the no unobserved confounding assumption and adjusted for a set of 71 baseline covariates. Despite extensive covariate adjustment, concern about unobserved confounding remains. Cui et al. [9] applied proximal inference to the SUPPORT data, using physiological measurements taken early in patients’ ICU stay as both NCEs and NCOs. These variables were noisy early measurements of indicators of evolving underlying health conditions and did not themselves influence treatment decisions or health outcomes, making them valid NCEs. They also preceded treatment, making them valid NCOs. However, violations of U -relevance and completeness are still possible. For example, suppose that physician training is an unobserved confounder, with physicians who favor the heart catheterization procedure also tending to favor other posttreatment interventions which impact survival. As physician training would be independent of patient characteristics such as the NCEs and NCOs, U -relevance would be violated. Completeness could also be violated if there were many unobserved confounders, e.g., many dimensions of underlying health status that influence physicians’ treatment decisions via aspects of the patient’s physical appearance or behavior not captured by the covariates. The bias formulas we derive enable sensitivity analysis under a range of assumptions about the magnitude of completeness and U -relevance violations and under the strong simplifying assumption that the data generating process was an LSEM.

The organization of the article is as follows. In Section 2, we review proximal inference and motivate the need for bias analysis given the assumptions of this framework. In Sections 3 and 4, we derive and numerically explore bias formulas in a setting with two-dimensional unobserved confounder U and a setting of general-dimensional unobserved confounder U with no treatment–confounder interaction, respectively. In Section 5, we present an illustrative sensitivity analysis (based on the bias formula from Section 4) of a proximal inference analysis estimating the effect of RHC on survival. In Section 6, we conclude this study.

2 Proximal identification of the average treatment effect

2.1 Review of definitions and assumptions

We use the potential outcome framework [12] to define causal effects. Let A denote the binary treatment of interest, Y the observed posttreatment outcome, and Y ( a ) , a = 0 , 1 the potential (counterfactual) outcome that would have been observed had treatment A been set to a . We implicitly make the no-interference assumption that the potential outcome of each individual does not depend on the treatments received by other individuals [13]. We aim to estimate the average causal effect (ACE) of A on Y , defined as ψ = E [ Y ( 1 ) − Y ( 0 ) ] .

Let L denote the set of measured covariates. We make the standard consistency and positivity assumptions, defined below.

Assumption 1

(Consistency) Y = Y ( A ) almost surely.

In other words, the observed value of Y under treatment A coincides with the counterfactual outcome that would have been observed under the same treatment value. Thus, we only observe the counterfactual outcome corresponding to the treatment value that was actually administered in our data.

Assumption 2

(Positivity) 0 < P ( A = a ∣ L ) < 1 almost surely, for a = 0 , 1 .

Assumption 2 states that both exposure levels are observed at all levels of the observed covariates L .

Many analyses further make the assumption that there is no unobserved confounding, i.e., that observed covariates block all “backdoor” causal paths between treatment and outcome.

Assumption 3

(Exchangeability) Y ( a ) ⊥ ⊥ A ∣ L , for a = 0 , 1 .

Under Assumptions 1–3, counterfactual mean E [ Y ( a ) ] is identified by the g-formula (introduced by Robins [14]):

(1) E [ Y ( a ) ] = ∑ l E [ Y ∣ A = a , L = l ] P ( L = l ) ,

where L is assumed to be discrete. When L is continuous, the sum ∑ l can be interpreted as integral ∫ l E [ Y ∣ A = a , L = l ] d P ( L = l ) .

Exchangeability is a strong assumption that is empirically untestable, and much effort in causal inference research has been devoted to relaxing this assumption. Miao et al. [15] propose an alternative to Assumption 3 that allows identification of the counterfactual mean E [ Y ( a ) ] despite unobserved confounding. We review the alternative conditions developed by Miao et al. [15], leading to the proximal g-formula, a counterpart to (1) allowing for some unobserved confounding.

As shown by Cui et al. [9], we consider a (potentially multidimensional) variable L that can be partitioned into three types of variables ( X , Z , W ) , such that

X includes observed variables that may be common causes of A and Y (observed confounders),
Z includes treatment-inducing confounding proxies, i.e., Z includes causes of A that share an unmeasured common cause U Z with Y ,
W includes outcome-inducing confounding proxies, i.e., W includes causes of Y that share an unmeasured common cause U W with A .

Figure 1 contains directed acyclic graphs (DAGs) representing each of the proxy types included in L . The covariates U Z represent common causes of Y and treatment-inducing proxies Z , while covariates U W represent common causes of A and outcome-inducing proxies W . In general, we will utilize U to denote unobserved common causes of A and Y .

$Figure 1 DAGs representing the three types of variables ( X , Z , W ) \left(X,Z,W) partitioning L L .$

Figure 1

DAGs representing the three types of variables ( X , Z , W ) partitioning L .

In the study by Miao et al. [15], exchangeability is replaced with the following assumptions:

Assumption 4

(Treatment-inducing confounding proxy)

(2) Y ( a , z ) = Y ( a ) , for all a , z almost surely.

Assumption 5

(Outcome-inducing confounding proxy)

(3) W ( a , z ) = W , for all a , z almost surely.

Assumption 6

(Latent unconfoundedness) If U denotes the set of unobserved confounders, then:

(4) Z ⊥ ⊥ ( Y ( a ) , W ) ∣ ( U , X ) ,

(5) W ⊥ ⊥ A ∣ ( U , X ) .

Assumption 4 states that Z does not have a direct effect on Y upon intervening on A , while Assumption 5 states that neither A nor Z have a causal effect on W . Past works [7] refer to variables Z satisfying (2) and (4) as NCE variables, and to variables W satisfying (3) and (5) as NCO variables. This terminology is based on negative control methods employing variables that share a confounding mechanism with the treatment–outcome relationship in view to detect bias in epidemiological research. In this article, we will use treatment-inducing (outcome-inducing) confounding proxies and NCE (NCO) variables interchangeably.

Moreover, to be valid proxies, variables ( Z , W ) must be U -relevant:

Assumption 7

( U -relevance)

(6) Z ⊥̸ ⊥ U ∣ ( A , X ) ,

(7) W ⊥̸ ⊥ U ∣ X .

The U -relevance assumption (also known as U -comparability [7]) requires the unmeasured confounders U of the A – Y relationship to be the same as the unmeasured confounders of the A – W and Z – Y secondary treatment–outcome associations. This is such that, by the negative control framework, any nonnull A – W or Z – Y association can be attributed to U confounding the A – Y relationship (while null associations imply no empirical evidence of unmeasured confounding).

Finally, in addition to Assumptions 1–7, Miao et al. [10] introduce the following completeness conditions for the identification of E [ Y ( a ) ] :

Assumption 8

(Completeness) For any a , x and for any square-integrable function g :

If E [ g ( U ) ∣ Z , A = a , X = x ] = 0 almost surely, then g ( U ) = 0 almost surely.
If E [ g ( Z ) ∣ W , A = a , X = x ] = 0 almost surely, then g ( Z ) = 0 almost surely.

Assumption 8(a) can be interpreted as a requirement that the NCE Z has enough variability relative to the variability of U ; similarly, Assumption 8(b) requires the variability of W to be large enough relative to the variability of Z . Under conditions 8(a) and (b), we can essentially account for U in our ACE estimate without either measuring or modeling the distribution of U . The role of completeness will be further explored in Section 2.2, where we outline the analytical framework by which the ACE is estimated using the proximal g-formula.

Completeness Assumption 8(a) has a simple interpretation in the case where confounders U and the negative control pair ( Z , W ) are all categorical. As mentioned by Cui et al. [9], if ( U , Z , W ) are categorical with respective number of categories ( d u , d z , d w ) , then completeness 8(a) requires that:

(8) min ( d z , d w ) ≥ d u .

In other words, proximal inference can account for unmeasured confounding if the number of categories of U is less than that of either Z or W . This leads to the practical recommendation to measure a rich set of baseline characteristics (which can be used as negative controls), such that the proximal identification approach has a higher chance of mitigating unmeasured confounder bias [9]. There is not such a straightforward method for expressing the completeness condition in the case of continuous U and negative controls ( Z , W ) , though some theory about completeness has been developed in some commonly used models (e.g., exponential families [16]). In Section 3, we investigate the behavior of proximal inference in LSEM setups in which the completeness Assumption 8(a) is violated.

2.2 Estimating the proximal g-formula via moment restriction

Miao et al. [15] introduce the notion of an outcome confounding bridge function, which transforms the NCO W to match the confounding effect of U on Y . More precisely, an outcome confounding bridge function h ( W , A , X ) is a function satisfying:

(9) E [ Y ∣ U , A = a , X = x ] = E [ h ( W , A , X ) ∣ U , A = a , X = x ] ,

for all values of a , x . In other words, if function h ( W , A , X ) exists, then the confounding effect of U on the transformed variable h ( W , a , X ) equals the confounding effect of U on Y at exposure level A = a . Given Assumptions 1, 5, 6, and 7, [15] infer that:

(10) E [ Y ( a ) ] = E [ h ( W , a , X ) ] for all a = 0 , 1 ,

which means E [ Y ( a ) ] can be estimated following the identification of an outcome bridge function h ( W , A , X ) , if such a function is assumed to exist.

Cui et al. [9] and Miao et al. [10] established the following proximal identification result for the outcome confounding bridge function that leverages the distribution of a NCE Z .

Theorem 1

Suppose there exists an outcome confounding bridge function h ( w , a , x ) solving the Fredholm integral equation of the first kind,

(11) E [ Y ∣ Z , A , X ] = ∫ h ( w , A , X ) d F ( w ∣ Z , A , X ) ,

almost surely. Then, under Assumptions 1, 2, 4–6, and 8(a),

(12) E [ Y ∣ U , A , X ] = ∫ h ( w , A , X ) d F ( w ∣ U , X ) ,

almost surely.

Under Assumption 6, we have E [ Y ( a ) ] = E [ E [ Y ∣ U , A = a , X ] ] for all a . The counterfactual mean E [ Y ( a ) ] can then be computed as follows:

Corollary 1.1

(Proximal g-formula) If (12) holds almost surely, then the counterfactual mean E [ Y ( a ) ] , a = 0 , 1 is nonparametrically identified by

(13) E [ Y ( a ) ] = ∫ X ∫ h ( w , a , x ) d F ( w ∣ x ) d F ( x ) ,

and the ACE is identified by

(14) ψ = ∫ X ∫ { h ( w , 1 , x ) − h ( w , 0 , x ) } d F ( w ∣ x ) d F ( x ) .

Assuming the outcome confounding bridge function h ( W , A , X ) exists and is identifiable as a solution to (12), [8,15] provide a practical approach for estimating the proximal g-formula using the generalized method of moments (GMM). Suppose one has access to n i.i.d. samples D i = ( A i , Y i , L i ) , L i = ( X i , Z i , W i ) (where Z , W are assumed to be correctly classified as treatment- and outcome-inducing confounding proxies, respectively). Moreover, suppose one has specified a parametric model for the confounding bridge, h ( W , A , X ) = h ( W , A , X ; b ) (e.g., h ( W , A , X ; b ) is linear in W , A , X with unknown parameter b ). The true model for h ( W , A , X ) is unknown, but one can specify a fairly flexible model.

We define the target parameter θ = ( b , ψ ) to encode the parameters b of h ( W , A , X ; b ) and the ACE ψ , along with moment restrictions

(15) h ( D i ; θ ) = { Y i − h ( W i , A i , X i ; b ) } × Q ( Z i , A i , X i ) ψ − { h ( W i , 1 , X i ; b ) − h ( W i , 0 , X i ; b ) } ,

for some vector function Q (as in the study by Miao et al. [15]). For instance, for linear bridge function,

h ( W i , A i , X i ; b ) = ( 1 A i W i X i A i X i A i W i ) T b ,

we may choose a function

Q ( Z i , A i , X i ) = ( 1 A i Z i X i A i X i A i Z i ) T ,

such that the dimension of Q is at least equal to that of h .

Then, for m n ( θ ) = 1 n ∑ i = 1 n h ( D i ; θ ) , the GMM estimator solves

(16) θ ˆ = arg min θ m n T ( θ ) m n ( θ ) .

As established by Miao et al. [15], the estimates ( b ˆ , ψ ˆ ) obtained from (16) are consistent.

2.3 The need for bias analysis

We have so far collected a series of untestable Assumptions 4–8 that replace exchangeability and account for the effect of unmeasured confounders U without directly modeling or estimating U . The impact on the direction and/or magnitude of bias resulting from violations of these assumptions has not been explored. We trust the analyst to identify “true” NCEs and NCOs in this work (Assumptions 4 and 5), on the basis of subject–matter knowledge. That is, we assume that arrow A → Z is correctly identified, and that there are no additional arrows Z → Y or A → W . Latent unconfoundedness (Assumption 6) presumably holds for some sufficiently rich U , but the richer (or higher-dimensional) the U required to satisfy Assumption 6, the less plausible it is that U -relevance (Assumption 7) or completeness (Assumption 8) hold. If many components of U are common causes of the NCEs and NCOs, then Assumption 8 is difficult to satisfy. In addition, if many components of U are required to block all backdoor paths between A and Y , then they are less likely to all be associated with both Z and W , violating Assumption 7.

3 Bias formulas for two-dimensional U

In this section, we characterize the proximal inference estimator bias in an LSEM under scenarios in which each of Z and W are one-dimensional, but U (comprising common causes of any of A , Y , Z , and W ) has two independent components. We first consider the case where one component of U is a common cause of A and Y but is not associated with either Z or W (which violates U -relevance Assumption 7 and is illustrated in Figure 2). Then, we consider the case where one component of U is an “extra” common cause of Z and W not associated with A or Y (which violates completeness Assumption 8 and is illustrated in Figure 3). We would argue that it is difficult to guard against violations of Assumptions 7 and 8 arising in this way using subject-matter knowledge, making sensitivity analysis for violations of these types particularly necessary.

Figure 2

DAG encoding causal relationships among variables in (19) in which U-relevance Assumption 7 is violated.

Figure 3

DAG encoding the causal relationships among variables in (21) in which completeness 7(a) is violated.

In addition, for the settings of Figures 2 and 3, we compare the bias of the proximal estimator due to violations of Assumptions 7 and 8 to the bias of alternative estimators of the ACE which the analyst might implement under an incorrect unconfoundedness assumption. We consider

an unadjusted estimator (referred to as “unadj”), which assumes no unobserved confounding and estimates E [ Y ( a ) ] as E ˆ [ Y ∣ A = a ] via sample means, and
an outcome regression estimator (referred to as “OR”), which adjusts for ( Z , W ) via the g-formula (1) taking L = { Z , W } and specifying outcome regression model E [ Y ∣ A , L ] = β T ( A , Z , W , A Z , A W ) .

3.1 Bias settings for two-dimensional U

As outlined at the beginning of this section, we derive formulas for the proximal inference estimator bias under scenarios depicted in Figures 2 and 3, under an LSEM based on [10]. The notation U = ( U 1 , U 2 ) indicates that U is two-dimensional with components U 1 and U 2 . Specifically, we consider i.i.d. data:

(17) U X ∼ N 0 0 0 , 1 ν ρ 1 ν 1 ρ 2 ρ 1 ρ 2 1 , ρ 1 , ρ 2 ∈ ( − 1 , 1 ) , logit ( P ( A = 1 ∣ X , U ) ) = α 0 + α x X + α u T U , Z = θ 0 + θ a A + θ x X + θ u T U + ε 1 , W = μ 0 + μ x X + μ u T U + ε 2 , Y ( a ) = γ 0 + γ a a + γ x X + γ u T U + γ a u 1 a U 1 + ε 3 , ε 1 , ε 2 , ε 3 ∼ N ( 0 , 1 ) .

Figure 4 depicts the causal DAG corresponding to this LSEM. The dashed bidirectional arrow between X and U indicates an unrestricted association arising from an unspecified causal relationship (e.g., a shared common cause) between these variables. The parameter ν = Corr ( U 1 , U 2 ) encodes the correlation between two components of U , following standardization of the components. In addition, parameter α u = ( α u 1 α u 2 ) T encodes the magnitude of confounding, while θ u = ( θ u 1 θ u 2 ) T and μ u = ( μ u 1 μ u 2 ) T encode the association between confounder U and the NCE/NCO, respectively. We will explore the sensitivity of proximal inference bias to particular values of ( α u , θ u , μ u ) .

Figure 4

DAG encoding the causal relationships among variables in (17).

The NCE Z is a posttreatment variable in this LSEM. We note that DAGs other than Figure 4 might also be compatible with proximal inference assumptions [7] (e.g., having an arrow Z → A ).

If U were one-dimensional and satisfied all the proximal inference assumptions, then the bridge function solving the outcome bridge function equation would take a linear form:

(18) h ( W , A , X ; b ) = b 0 + b a A + b w W + b x X + b a x A X + b a w A W .

The proof of this claim is in Appendix A.1. Therefore, like an analyst unaware of the additional assumption-violating component of U , we consider bias of proximal estimators that specify a linear bridge function.

By Theorem 1, violating Assumption 8(a) leads to a potentially biased ACE estimate as the outcome confounding bridge function h ˆ ( W , A , X ) resulting from the GMM procedure no longer satisfies the bridge equation (12). The following theorem shows that the completeness Assumption 8(a) is indeed violated in DGP (17) when both components of U are associated with negative controls Z and W .

Theorem 2

If θ u is nonzero (i.e., Z is U-relevant for at least one component of U), then the LSEM (17) with Gaussian ( X , U ) violates completeness Assumption 8(a).

The proof of Theorem 2 is included in Appendix B.

3.2 Partial U -relevance for two-dimensional unobserved confounder U (as in Figure 2)

In this subsection, we consider the case where U -relevance Assumption 7 is violated because one component of the two-dimensional confounder U is not associated with the negative controls. We exclude X for simplicity, which results in the DAG from Figure 2. We consider i.i.d. data generated by:

(19) U ∼ N 0 0 , 1 ν ν 1 , ν ∈ ( − 1 , 1 ) , logit ( P ( A = 1 ∣ U ) ) = α 0 + α u T U , Z = θ 0 + θ a A + θ u 1 U 1 + ε 1 , W = μ 0 + μ u 1 U 1 + ε 2 , Y ( a ) = γ 0 + γ a a + γ u T U + γ a u 1 a U 1 + ε 3 , ε 1 , ε 2 , ε 3 ∼ N ( 0 , 1 ) ,

where α u , γ u have all nonzero entries.

From Theorem 2, we know that setup (19) violates Assumption 8(a). In addition, we do not have a derivation of the true outcome confounding bridge function, so a linear model might be misspecified. The following theorem provides a formula for this bias under a linear bridge function specification, which is still the specification an analyst unaware of U 2 would choose and therefore relevant to sensitivity analysis. Similar to the previous case, we further assume Corr ( U 1 , U 2 ) = 0 to improve the interpretability of the resulting bias formula.

Theorem 3

If ( Z , W ) ⊥ ⊥ U 2 ∣ ( A , U 1 ) and Corr ( U 1 , U 2 ) = 0 , then fitting a linear outcome bridge function h ( W , A , X ) = b 0 + b a A + b w W + b a w A W under LSEM (17) yields a proximal outcome estimator bias equal to:

(20) δ POR = ( 1 − E [ A ] − E [ A U 1 2 ] ) E [ A U 1 ] E [ A U 1 U 2 ] ( E [ A ] E [ A U 1 2 ] − E [ A U 1 ] 2 ) ( ( 1 − E [ A ] ) ( 1 − E [ A U 1 2 ] ) − E [ A U 1 ] 2 ) + ( E [ A U 1 2 ] ( 1 − E [ A U 1 2 ] ) − E [ A U 1 ] 2 ) E [ A U 2 ] ( E [ A ] E [ A U 1 2 ] − E [ A U 1 ] 2 ) ( ( 1 − E [ A ] ) ( 1 − E [ A U 1 2 ] ) − E [ A U 1 ] 2 ) γ u 2 .

The proof for Theorem 3 (as well as a more general formula for ν ∈ ( − 1 , 1 ) ) is in Appendix C.1. It clearly follows that the proximal outcome estimator bias is proportional to the strength of association γ u 2 between outcome Y and U 2 , as well as to the function E [ A U 2 ] encoding the strength of association between treatment A and U 2 .

3.3 Completeness violation: Association between negative controls through U = ( U 1 , U 2 ) (as in Figure 3)

As shown in Figure 3, for simplicity, we consider a scenario with no covariates X , i.e.,:

(21) U ∼ N 0 0 , 1 ν ν 1 , ν ∈ ( − 1 , 1 ) , logit ( P ( A = 1 ∣ U ) ) = α 0 + α u 1 U 1 , Z = θ 0 + θ a A + θ u T U + ε 1 , W = μ 0 + μ u T U + ε 2 , Y ( a ) = γ 0 + γ a a + γ u 1 U 1 + γ a u 1 a U 1 + 2 ε 3 , ε 1 , ε 2 , ε 3 ∼ N ( 0 , 1 ) ,

where θ u , μ u have all nonzero entries.

The aforementioned setup satisfies all assumptions except 8(a) (which is violated according to Theorem 2). Thus, solving for the parameters b of a linear outcome bridge function (which is the functional form an investigator unaware of U 2 would select) will lead to a biased estimate of the ACE, even if the linear bridge function is correctly specified. The following theorem (proved in Appendix C.2) provides a formula for this bias under a linear outcome bridge function in the case when ν = 0 :

Theorem 4

If ( A , Y ) ⊥ ⊥ U 2 ∣ U 1 and Cor ( U 1 , U 2 ) = 0 , then fitting a linear outcome bridge function h ( W , A , X ) = b 0 + b a A + b w W + b a w A W under LSEM (21) yields a proximal outcome estimator bias equal to

(22) δ POR = E [ A U 1 ] E [ A ] ( 1 − E [ A ] ) θ u 2 θ u 1 μ u 2 ⋅ ( 1 − E [ A ] ) S 2 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ a u 1 + E [ A ] S 1 μ u 1 + S 1 ⋅ θ u 2 θ u 1 μ u 2 + ( 1 − E [ A ] ) S 2 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ u 1 ,

where

S 1 = ( 1 − E [ A ] ) 2 ( 1 − E [ A ] ) ( 1 − E [ A U 1 2 ] ) − E [ A U 1 ] 2 , S 2 = E [ A ] 2 E [ A ] E [ A U 1 2 ] − E [ A U 1 ] 2 .

For γ a u = 0 , the bias simplifies to

(23) δ POR = E [ A U 1 ] E [ A ] ( 1 − E [ A ] ) θ u 2 θ u 1 μ u 2 ⋅ E [ A ] S 1 μ u 1 + S 1 θ u 2 θ u 1 μ u 2 + ( 1 − E [ A ] ) S 2 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ u 1 .

The more general formulas for arbitrary ν ∈ ( − 1 , 1 ) are included in Appendix C.2. For ease of interpretation, we restrict our attention to the case when ν = 0 in this section’s discussion.

One implication of Theorem 4 is that the proximal outcome regression bias δ POR can obtain arbitrarily large when one of the denominators μ u 1 + S 1 θ u 2 θ u 1 μ u 2 or μ u 1 + S 2 θ u 2 θ u 1 μ u 2 approaches zero. This will be illustrated in Figure 5 (the solid “PI” curve) of Numerical Experiments section 3.4.2, in which there exists values of θ u 2 and μ u 2 for which the proximal outcome bias becomes infinite.

Figure 5

Plots of the ACE estimate bias under DGP (21) and under violations of completeness Assumption 8(a), where θ u 1 μ u 1 and θ u 2 μ u 2 have the same sign as in Theorem 5(b). The completeness violation is imposed by setting both components of U to be associated with both negative controls and only including one NCE and one NCO, as per Theorem 2. Along the x -axis, we vary the strength of association between U 2 and Z and W , by varying θ u 2 and μ u 2 (set to be equal to each other, θ u 2 = μ u 2 , so they always have positive product). The different panels correspond to different values of α u 1 governing the strength of confounding by U 1 . The figure shows that, as predicted by Theorem 5(b), the bias of the proximal outcome estimator (“PI”, solid line) can be greater or smaller than the bias of the unadjusted estimator (“Unadj”, dashed line —) and can be arbitrarily large under certain parameter settings. Moreover, the relationship of PI with the outcome adjusted estimator (“OR”, dotted line ⋯ ) varies with the strength of confounding α u 1 across panels. All other parameters are fixed, with values: α 0 = γ 0 = θ 0 = μ 0 = 0 , θ a = θ u 1 = 1 , γ u 1 = 1 , γ u a 1 = 1.5 , μ u 1 = − 0.5 , γ a = 0.5 , similar to the simulation in [15]. (a) α u 1 = 0.3 , (b) α u 1 = 0.5 , (c) α u 1 = 1 , (d) α u 1 = − 0.5 .

Under the simplifying assumption that the unobserved confounder is not an effect modifier, i.e., γ a u = 0 , in Theorem 5, we characterize when the proximal estimator will reduce bias relative to an unadjusted estimator, even when the proximal inference assumptions are violated. It turns out that if the components of U induce associations between Z and W in the same direction, then the proximal estimator is guaranteed to have lower bias. This will be illustrated via the plots in Section 3.4.2, through numerical comparisons between the proximal outcome and unadjusted bias curves under different setups of U -component associations.

Theorem 5

Assuming γ a u 1 = 0 , the proximal g-computation bias δ POR and the unadjusted estimator bias δ unadj can be compared as follows:

If θ u 1 μ u 1 and θ u 2 μ u 2 have the same sign (both positive or both negative), then ∣ δ POR ∣ < ∣ δ unadj ∣ .
If θ u 1 μ u 1 and θ u 2 μ u 2 have different signs, then
∣ δ POR ∣ > ∣ δ unadj ∣ i f θ u 1 μ u 1 θ u 2 μ u 2 > − S 1 ( 1 − E [ A ] ) − S 2 E [ A ] , ∣ δ POR ∣ < ∣ δ unadj ∣ i f θ u 1 μ u 1 θ u 2 μ u 2 < − S 1 ( 1 − E [ A ] ) − S 2 E [ A ] .

The proof for Theorem 5 is in Appendix C.5.

3.4 Numerical experiments

We provide numerical examples based on the bias formulas derived earlier in this section to illustrate how the bias of different estimators (proximal and nonproximal) varies with different values of ( α u 2 , θ u 2 , μ u 2 , γ u 2 ) , which encode how strongly the proximal identification assumptions are violated in the presence of U 2 .

3.4.1 Partial U -relevance for two-dimensional unobserved confounder U

Figure 6 illustrates the change in absolute bias for the proximal and unadjusted estimators relative to the value of α u 2 , for the same and opposite directions of associations γ u 1 and γ u 2 , respectively. In both cases, the distributions of bias appear almost shifted by translation. For γ u 1 and γ u 2 of opposite directions of association, we observe a reversal in which estimator has less bias compared to the case of γ u 1 γ u 2 > 0 .

Figure 6

Plots of the ACE estimate bias under DGP (19) and under violations of U -relevance Assumption 7. We compare the proximal outcome estimator bias (“PI”, solid line) to the unadjusted estimator bias (“Unadj”, dashed line —) that assumes no unobserved confounding. The U -relevance violation is imposed by setting θ u 2 = μ u 2 = 0 , making U 2 unassociated with the negative controls. Strength of confounding by U 2 is varied along the x -axis through the parameter α u 2 governing its association with treatment. The relative magnitude of bias of the proximal and unadjusted estimators flips depending on whether U 1 induces a positive (panels (a) and (d)) or negative (panels (b) and (c)) association between treatment and outcome. All other parameters are fixed, with values: α 0 = γ 0 = θ 0 = μ 0 = 0 , θ a = θ u 1 = 1 , μ u 1 = 1 , γ a u 1 = 1 , γ u 2 = 1 , γ a = 0.5 . (a) α u 1 = 0.5 , γ u 1 = 1.5 , (b) α u 1 = − 0.5 , γ u 1 = 1.5 , (c) α u 1 = 0.5 , γ u 1 = − 1.5 , and (d) α u 1 = − 0.5 . γ u 2 = − 1.5 .

3.4.2 Completeness violation: Association between negative controls through U = ( U 1 , U 2 )

Figures 5 and 7 illustrate the change in absolute bias for each of the three estimators relative to the value of θ u 1 , where it is assumed that μ u 2 = θ u 2 in all cases, for the same sign and opposite signs of θ u 1 μ u 1 and θ u 2 μ u 2 , respectively. We observe that the absolute unadjusted bias is always greater than the proximal estimator bias when θ u 1 μ u 1 , θ u 2 μ u 2 have the same sign, as predicted by Theorem 5. Conversely, for different signs of θ u 1 μ u 1 , θ u 2 μ u 2 , the proximal estimation bias exceeds that of the unadjusted estimator beyond a certain threshold in the value of ∣ θ u 2 ∣ (and can even be infinite). Both setups are consistent with Theorem 5. Any ordering of the biases of the three estimators is possible depending on the parameter values, making the choice of estimator not straightforward.

Figure 7

Plots of the ACE estimate bias under DGP (21) and under violations of completeness Assumption 8(a), where θ u 1 μ u 1 and θ u 2 μ u 2 have the same sign as in Theorem 5(a). The completeness violation is imposed by setting both components of U to be associated with both negative controls and only including one NCE and one NCO, as per Theorem 2. Along the x -axis, we vary the strength of association between U 2 and Z and W , by varying θ u 2 and μ u 2 (set to be equal to each other, θ u 2 = μ u 2 , so they always have positive product). The different panels correspond to different values of α u 1 governing the strength of confounding by U 1 . The figure shows that, as predicted by Theorem 5(a), the bias of the proximal outcome estimator (“PI”, solid line) is always less than the unadjusted bias (“Unadj”, dashed line —). However, the relationship of PI with the outcome adjusted estimator (“OR”, dotted line ⋯ ) varies with the strength of confounding α u 1 across panels. All other parameters are fixed, with values: α 0 = γ 0 = θ 0 = μ 0 = 0 , θ a = θ u 1 = 1 , γ u 1 = 1 , γ u a 1 = 1.5 , μ u 1 = 0.5 , γ a = 0.5 , similar to the simulation in [15]. (a) α u 1 = 0.3 , (b) α u 1 = 0.5 , (c) α u 1 = 1 , (d) α u 1 = − 0.5 .

4 Bias formulas in arbitrary dimension with no confounder–treatment interaction

To tractably obtain bias formulas in the general case of multidimensional Z , W , U , X with ( dim ( Z ) , dim ( W ) , dim ( U ) , dim ( X ) ) = ( m , n , p , q ) , we again make the simplifying assumption that γ a u = 0 – that is, the unobserved confounder is not an effect modifier. Moreover, we assume that the analyst is aware of the lack of interaction between A and U in the true outcome model, so we consider a simplified bridge function model h ( W , A , X ) = b 0 + b a A + b w T W + b x T X . We further assume that the unobserved and observed confounders ( U , X ) jointly follow a multivariate normal distribution with mean 0 p + q , Var ( U ) = Σ u , Var ( X ) = Σ x and some appropriate positive semidefinite covariance matrix such that Cov ( U , X ) = ρ ∈ ( − 1 , 1 ) p × q .

We consider i.i.d. data generated by:

(24) U X ∼ N 0 p 0 q , Σ u ρ ρ T Σ x , ρ ∈ ( − 1 , 1 ) p × q , logit ( P ( A = 1 ∣ U , X ) ) = α 0 + α u T U + α x T X , Z = θ 0 + θ a A + θ u T U + θ x T X + ε 1 , W = μ 0 + μ u T U + μ x T X + ε 2 , Y ( a ) = γ 0 + γ a a + γ u T U + γ x X + ε 3 , ε 1 , ε 2 , ε 3 ∼ N ( 0 , 1 ) .

The following theorem provides a formula for the proximal outcome identification bias under a linear bridge function:

Theorem 6

Let E [ A U ] = ( E [ A U 1 ] , … , E [ A U p ] ) , E [ A X ] = ( E [ A X 1 ] , … , E [ A X p ] ) , and

B = Σ u − ρ Σ x − 1 ρ T − ( E [ A U ] − ρ Σ x − 1 E [ A X ] ) ( E [ A U ] T − E [ A X ] T Σ x − 1 ρ T ) E [ A ] ( 1 − E [ A ] ) − E [ A X ] T Σ x − 1 E [ A X ] θ u .

If ( B T μ u ) † denotes the Moore-Penrose inverse of B T μ u , then fitting a linear outcome bridge function h ( W , A , X ) = b 0 + b a A + b w T W + b x T X under LSEM (24) yields a proximal outcome estimator bias equal to

(25) δ POR = E [ A U ] T − E [ A X ] T Σ x − 1 ρ T E [ A ] ( 1 − E [ A ] ) − E [ A X ] T Σ x − 1 E [ A X ] ( I p − μ u ( B T μ u ) † B T ) γ u .

A proof of Theorem 6 (which also considers the case of general Var ( U ) = Σ u ∈ R p ) can be found in Appendix C.6.

Remark 1

If m = n = p and B T μ u has full rank, then δ POR = 0 . If p < m or p < n , then we have a similar discussion as in [17] where we can either consider the Moore-Penrose inverse of B T μ u , or reduce the dimensions of Z and W until they match the dimension of U .

Theorem 6 enables sensitivity analysis. Note that the terms E [ A ] and E [ A X ] in (25) can be estimated from data. Thus, to perform a sensitivity analysis using the bias formula (25), it remains for the analyst to specify parameters E [ A U ] (which is determined by α u ), μ u , γ u , and ρ . An analyst could specify a distribution over these parameters, which, via (25), would imply a distribution over δ as each realization of the parameters drawn from the distribution would correspond to a different bias δ . In the following section, we provide an example of this procedure applied to real data.

5 Illustration of sensitivity analysis on the SUPPORT data

In this section, we provide an illustrative sensitivity analysis of the proximal inference application in [8] using the Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatment (SUPPORT) dataset. We do not aim to be prescriptive, but rather to provide an example of how one might apply the bias formulas we derived to explore the extent of likely bias in a proximal inference analysis. Importantly, if data were not generated from an LSEM, then clearly our bias formulas will be incorrect. Still, the bias ensuing from violations of proximal inference assumptions under the LSEM assumption may serve as an approximation to the actual bias of a proximal inference analysis even if the data were not generated by an LSEM. We hope bias formulas for more general data generating processes might be developed in future work.

The SUPPORT data comprise 5,735 individuals, of which 2,184 were treated by RHC and 3,551 belonged to the control group. Outcome variable Y encodes the number of days between admission to the ICU and death or censoring at 30 days. The goal of the analysis was to estimate the ACE of RHC on this 30-day survival outcome. As in the study by Cui et al. [9], we consider 71 baseline covariates, including demographics and physiological measures, to construct the bins ( X , Z , W ) for confounding adjustment and confounding proxies. Cui et al. [9] reason that the 10 variables measuring patients’ physiological status during the initial 24 hours in the ICU, which provide a snapshot of underlying physiological state subject to measurement error, may be viewed as confounding proxies. They are valid NCOs because they precede treatment. They are valid NCEs because physicians did not base treatment decisions on their values, and as mere measurements, they could not directly impact health outcomes in any other way. (It is also important that they are noisy measurements, as the actual underlying values of what they seek to measure could influence both treatment decisions and outcomes.) Of these 10 measurements, four are allocated to the negative control bins Z = ( pafi 1 , paco 21 ) and W = ( ph 1 , hema 1 ) based on strength of association with the A and Y , respectively. The remaining 67 variables are collected under X . Like Cui et al. [9], we specify the outcome confounding bridge function h ( W , A , X ; b ) = b 0 + b a A + b x T X + b w T W to compute the proximal outcome regression estimate ψ ^ POR of the ACE.

We assess the potential impact of assumption violations by evaluating bias formula (25) under draws from an assumed distribution on the dimension of U and parameters α u , γ u , θ u , μ u , ρ from (24). We consider the following framework for drawing the sensitivity parameters:

Draw p = dim ( U ) (i.e., number of independent components of unobserved confounders U ) from a Poisson distribution with mean λ (which we set to λ = 5 ).
Set a subset of components of U to violate U -relevance, i.e., be unassociated with Z and W , by setting the corresponding columns in θ u and μ u to zero. The set of U -relevance violating components is selected randomly as follows:
1. Set proportion of violating U components π ∈ [ 0 , 1 ] ,
2. For each i = 1 , … , d , introduce relevance violation on component U i with probability π .
We considered two approaches to drawing ρ , which determines the covariance between unobserved confounders U and observed covariates X :
1. Empirical correlation: Construct covariance matrix ρ = Cov ( U , X ) such that covariances between elements of U and X are of similar magnitude to covariances between elements of X (as in Appendix D.2).
2. Uncorrelated: U and X are uncorrelated, i.e., ρ = Cov ( U , X ) = 0 p × q .
In this setup, we assume that Σ u = I p (i.e., the components of U are all uncorrelated) for illustrative purposes. In practice, one might consider a similar procedure for drawing Σ u , informed by the empirical covariances between the components of X and other subject-matter input about the nature of unobserved confounding.
Draw the parameters in θ u and μ u from uniform distributions over corresponding intervals θ u ∈ [ θ u , l , θ u , r ] , μ u ∈ [ μ u , l , μ u , r ] , where we set element-wise interval ends θ u , l , θ u , r , μ u , l , μ u , r . Details regarding how we selected these intervals are in Appendix D.2.
The remaining parameters γ u and E [ A U ] encoding the strength of association between U and ( Y , A ) are then constrained in terms of previously drawn sensitivity parameters and covariances that can be estimated from the data according to the following formulas (derived in Appendix D.1):
Constraining E [ A U ] (assuming fixed ρ ):
(26) E [ A U ] = ρ Σ x − 1 E [ A X ] + ( μ u T ) † ( Cov ( W , A ) − Cov ( W , X ) Σ x − 1 E [ A X ] ) .
Constraining γ u (assuming fixed E [ A U ] and ρ ):
(27) γ u = [ μ u T ( I p − ρ Σ x − 1 ρ T ) − ( Cov ( W , A ) − Cov ( W , X ) Σ x − 1 E [ A X ] ) ⋅ ⋅ ( E [ A U ] T − E [ A X ] T Σ x − 1 ρ T ) E [ A ] ( 1 − E [ A ] ) − E [ A X ] T Σ x − 1 E [ A X ] † ⋅ [ Cov ( W , Y ) − Cov ( W , X ) Σ x − 1 Cov ( X , Y ) − ( Cov ( W , A ) − Cov ( W , X ) Σ x − 1 E [ A X ] ) ⋅ ( Cov ( A , Y ) − E [ A X ] T Σ x − 1 Cov ( X , Y ) ) E [ A ] ( 1 − E [ A ] ) − E [ A X ] T Σ x − 1 E [ A X ] .
To account for sampling variability of the covariance matrices plugged into the above formulas, we employ a bootstrapping strategy (Appendix D.2).

We would expect that settings with lower expected dimension λ of U , with lower probabilities π of U -relevance violations, and with ρ drawn according to the empirical correlation regime (ensuring that observed covariates X are good proxies for U ) would lead to less bias. Table 1 contains sensitivity-adjusted confidence intervals (CIs) for the ACE under various distributions of the sensitivity parameters within the framework outlined earlier. The first row replicates results from Cui et al. [9], assuming no bias. Let δ j , k denote the k -quantile of the bias distribution under the setup for drawing sensitivity parameters in row j of Table 1. So δ 1 , k is 0 for all k , since row 1 assumes no bias. (We note that δ j , k usually takes extreme values for k near 0 or 1 when completeness is potentially violated, as can be gleaned from Figure 5.) We compute the sensitivity-adjusted CI in row j as [ δ j , . 05 + ψ ˆ POR − 1.96 × S E , δ j , . 95 + ψ ˆ POR + 1.96 × S E ] . Since dim ( Z ) = dim ( W ) = 2 and dim ( U ) is taken from a Poisson distribution with mean 5, rows 2 and 3 result from a mixture of nonviolating and completeness-violating structures, while rows 4–9 result from a mixture of unbiased, completeness-violating, and U -relevance violating structures.

Table 1

Sensitivity-adjusted confidence intervals of the average treatment effect, where the intervals are computed using [ δ j , . 05 + ψ ˆ POR ‒ 1.96 × S E , δ j , . 95 + ψ ˆ POR + 1.96 × S E ]

Row #	Setup	Sensitivity-adjusted CIs for ψ ^ POR
1	No bias-inducing U	( ‒ 2.65 , ‒ 0.94 )
2	No U -relevance violation ( π = 0 ) + empirical correlation	( ‒ 2.67 , ‒ 0.92 )
3	π = 0 + uncorrelated ( X , U )	( ‒ 3.36 , ‒ 0.37 )
4	π = 1 ∕ 3 + empirical correlation	( ‒ 2.67 , ‒ 0.92 )
5	π = 1 ∕ 3 + uncorrelated ( X , U )	( ‒ 2.91 , ‒ 0.31 )
6	π ∼ Unif ( [ 0.2 , 0.5 ] ) + empirical correlation	( ‒ 2.68 , ‒ 0.91 )
7	π ∼ Unif ( [ 0.2 , 0.5 ] ) + uncorrelated ( X , U )	( ‒ 2.94 , 0.36 )
8	π ∼ Unif ( [ 0.2 , 0.8 ] ) + empirical correlation	( ‒ 2.68 , ‒ 0.91 )
9	π ∼ Unif ( [ 0.2 , 0.8 ] ) + uncorrelated ( X , U )	( ‒ 2.79 , 0.55 )

In the presence of a rich adjustment set with significant correlation between X and U (i.e., the even-numbered rows of Table 1 using the empirical correlation setup for ρ ), the impact of proximal inference assumption violations on ψ ^ POR is quite small, presumably because X acts as a good proxy for U . However, in the uncorrelated ( X , U ) case, the sensitivity-adjusted CIs are significantly wider. If only completeness (not U -relevance) is violated, the sensitivity-adjusted intervals still exclude 0. Only in rows 7 and 9 (which allow a high proportion of U components to be independent of X and the negative controls) does the sensitivity-adjusted CI indicate that the data are compatible with a point estimate having the wrong sign.

Due to the interconnectedness of biological systems, we believe that most unmeasured confounders related to patients’ pretreatment health status would be associated with both the covariates X and the NCEs and NCOs (which also reflect pretreatment health status). In Section 1, we posited that physician preference might be an unobserved confounder that violates U -relevance as it is unrelated to patient state. Physicians who prefer to perform RHCs may tend to have other preferences for posttreatment interventions that also impact the outcome. Perhaps time of admission could be another U -relevance violating confounder, if practice but not patient state varies with time of admission. However, it is difficult to conceive of large numbers of confounders independent of patient state, and the ones we identified are likely weak. Thus, we find settings with empirical correlation more plausible and interpret the sensitivity analysis to suggest that the results are probably robust to proximal inference assumption violations.

6 Discussion

By deriving bias formulas for proximal inference estimators under violations of completeness and U -relevance, we begin to gain insight into the sensitivity of proximal inference estimators to these bias sources. For example, under some LSEM settings, it is possible for completeness violations alone (i.e., too many common causes of the NCE and NCO) to lead to arbitrarily more bias in the proximal inference estimator than in an unadjusted estimator completely subject to unobserved confounding (Figure 5). However, under the conditions of Theorem 5, if the different components of the unobserved confounder induce associations between the NCE and NCO in the same direction, then the proximal inference estimator is guaranteed to perform better than an unadjusted one. Neither of these scenarios (infinite bias or guaranteed improvement over unadjusted) imposes any constraints on the observed data, highlighting the utility of bias analysis.

We have also shown how our bias formulas enable assumption-heavy sensitivity analysis of proximal inference estimates. While (25) was derived under the strong assumptions that data were generated by an LSEM and U is not an effect modifier, an analyst might reasonably conduct a sensitivity analysis using (25) as we described even if they did not believe the assumptions held for the data and did not construct their proximal inference estimators according to an LSEM. There is a long history of simplifying assumptions in sensitivity analysis. For example, VanderWeele and Arah [13] and Rosenbaum [18] assume a one-dimensional binary confounder for tractable sensitivity analysis of no unobserved confounding. Later, Ding and VanderWeele [2] developed an approach that made far fewer restrictions, allowing multidimensional and nonbinary unobserved confounders that may interact arbitrarily with the treatment. We are in the early stages of proximal inference, so we currently need to settle for preliminary insights into the behavior of proximal inference estimators under strong simplifying assumptions. However, because proximal inference is a promising approach to causal inference that has rightfully garnered much attention from methodological researchers, it is important to begin probing its operating characteristics under violations of its assumptions.

Funding information: Funding was provided by the MIT-IBM Watson AI Lab.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Conflict of interest: The authors state no conflict of interest.
Ethical approval: The research related to human use has been complied with all the relevant national regulations, institutional policies and in accordance the tenets of the Helsinki Declaration, and has been approved by the authors institutional review board or equivalent committee.
Informed consent: Informed consent was obtained from all individuals included in this study.
Data availability statement: These are secondary analyses that are de-identified and are publicly available. The datasets analysed during the current study are available from the corresponding author on reasonable request.

Appendix A Bridge function parameters for post-treatment NCE

A.1 Bridge functions derivation for one-dimensional unobserved U – case of no violations

We identify coefficients ( b 0 , b a , b x , b w , b a x , b a w ) and ( t 0 , t a , t x , t z ) such that

(A1) E [ Y ∣ U , a , X ] = ∫ h ( w , a , X ) d F ( w ∣ U , X ) , a = 0 , 1 ,

(A2) 1 P [ A = a ∣ U , X ] = ∫ q ( z , a , X ) d F ( z ∣ U , a , X ) , a = 0 , 1 .

Coefficients of h :

We have that E [ Y ∣ U , A , X ] = γ 0 + γ a A + γ x X + γ u U + γ a u A U , so (A1) implies

γ 0 + γ a A + γ x X + γ u U + γ a u A U = b 0 + b a A + b x X + b a x A X + ∫ ( b w + b a w A ) w ⋅ d F ( w ∣ U , X ) ⇔ γ 0 + γ a A + γ x X + γ u U + γ a u A U = b 0 + b a A + b x X + b a x A X + ( b w + b a w A ) E [ W ∣ U , X ] .

Since W ∣ U , X ∼ N ( μ 0 + μ x X + μ u U , 1 ) , we obtain

γ 0 + γ a A + γ x X + γ u U + γ a u A U = b 0 + b a A + b x X + b a x A X + ( b w + b a w A ) ( μ 0 + μ x X + μ u U ) .

Assigning values A = 0 , 1 , we obtain the following system

(A3) 0 = γ 0 − b 0 − b w μ 0 + ( γ x − b x − μ x b w ) X + ( γ u − b w μ u ) U ,

(A4) 0 = ( γ 0 + γ a ) − ( b 0 + b a ) − ( b w + b a w ) μ 0 + ( γ x − b x − b a x − μ x ( b w + b a w ) ) X + ( γ u + γ a u − ( b w + b a w ) μ u ) U .

Multiplying (A3) by U and X and taking the expectation in each resulting equations yields

0 = ρ ( γ x − b x − μ x b w ) + ( γ u − b w μ u ) , 0 = ( γ x − b x − μ x b w ) + ρ ( γ u − b w μ u ) .

Since ρ ∈ ( − 1 , 1 ) , we obtain γ x − b x − μ x b w = γ u − b w μ u = 0 . From (A3), this additionally implies γ 0 − b 0 − b w μ 0 = 0 .

Similarly, from (A4), we obtain γ a − b a − μ 0 b a w = − b a x − μ x b a w = γ a u − b a w μ u = 0 . Solving for the coefficients of h , we obtain the unique solution:

( b 0 , b a , b x , b w , b a x , b a w ) = γ 0 − μ 0 γ u μ u , γ a − μ 0 γ a u μ u , γ x − μ x γ u μ u , γ u μ u , − μ x γ a u μ u , γ a u μ u .

Coefficients of q :

We have that P [ A ∣ X , U ] = 1 1 + exp { ( − 1 ) A ( α 0 + α x X + α u U ) } , such that (A2) implies

1 + exp { ( − 1 ) A ( α 0 + α x X + α u U ) } = 1 + exp { ( − 1 ) 1 − A ( t 0 + t a A + t x X ) } ∫ exp { ( − 1 ) 1 − A t z Z } d F ( z ∣ U , A , X ) .

Since Z ∣ U , A , X ∼ N ( θ 0 + θ a A + θ u U + θ x X , 1 ) , we obtain

1 + exp { ( − 1 ) A ( α 0 + α x X + α u U ) } = 1 + exp { ( − 1 ) 1 − A ( t 0 + t a A + t x X ) } ∫ exp { ( − 1 ) 1 − A t z Z } d F ( z ∣ U , A , X ) = 1 + exp { ( − 1 ) 1 − A ( t 0 + t a A + t x X ) } ∫ 1 2 π exp { ( − 1 ) 1 − A t z Z + 0.5 ( Z − θ 0 − θ a A − θ u U − θ x X ) 2 } = 1 + exp ( − 1 ) 1 − A ( t 0 + t a A + t x X ) + ( − 1 ) 1 − A t z ( θ 0 + θ a A + θ u U + θ x X ) + t z 2 2 ,

for each A = 0 , 1 . This is equivalent to

( − 1 ) A ( α 0 + α x X + α u U ) = ( − 1 ) 1 − A ( t 0 + t a A + t x X ) + ( − 1 ) 1 − A t z ( θ 0 + θ a A + θ u U + θ x X ) + 0.5 t z 2 .

Assigning values A = 0 , 1 , we obtain the system

(A5) 0 = α 0 + t 0 + θ 0 t z − 0.5 t z 2 + ( α x + t x + θ x t z ) X + ( α u + θ u t z ) U ,

(A6) 0 = α 0 + ( t 0 + t a ) + ( θ 0 + θ a ) t z + 0.5 t z 2 + ( α x + t x + θ x t z ) X + ( α u + θ u t z ) U .

As in the outcome bridge function case, it follows that the coefficients of 1 (the constant term), X , and U must be identically 0. We then obtain α 0 + t 0 + θ 0 t z − 0.5 t z 2 = t a + θ a t z + t z 2 = α x + t x + θ x t z = α u + θ u t z = 0 , which yields the unique solution

( t 0 , t a , t x , t z ) = − α 0 + θ 0 θ u α u + 0.5 θ u 2 α u 2 , − 1 θ u 2 α u 2 + θ a θ u α u , θ x θ u α u − α x , − α u θ u .

B Proving violations of completeness Assumption 8(a)

We will prove that completeness Assumption 8(a) is violated under the DGP (17) with θ u = ( θ u 1 θ u 2 ) T , θ u 1 ≠ 0 . We note that case θ u 2 ≠ 0 can be treated symmetrically, by appropriately exchanging u 1 and u 2 in the following computations.

For any values u , z , a , x , we have that

P [ U = u ∣ Z = z , A = a , X = x ] = P [ U = u , Z = z ∣ A = a , X = x ] P [ Z = z ∣ A = a , X = x ] = P [ Z = z ∣ U = u , A = a , X = x ] P [ U = u ∣ A = a , X = x ] P [ Z = z ∣ A = a , X = x ] = P [ ε 1 = z − θ 0 − θ a a − θ x x − θ u T u ] P [ A = a ∣ U = u , X = x ] P [ U = u ∣ X = x ] P [ A = a ∣ X = x ] P [ Z = z ∣ A = a , X = x ] .

Using

P [ U = u ∣ X = x ] = exp ( ρ 2 u 1 − ρ 1 u 2 ) 2 − ( u 2 − ρ 2 x ) 2 − ( u 1 − ρ 1 x ) 2 2 ( 1 − ν 2 − ρ 1 2 − ρ 2 2 + 2 ν ρ 1 ρ 2 ) 2 π 1 − ρ 1 2 − ρ 2 2 ,

we obtain

P [ U = u ∣ Z = z , A = a , X = x ] = 1 2 π exp { − 0.5 ( Z − θ 0 − θ a a − θ x x − θ u T u ) 2 } P [ Z = z , A = a ∣ X = x ] ( 1 + exp { ( − 1 ) a ( α 0 + α x x + α u T u ) } ) ⋅ 1 2 π 1 − ρ 1 2 − ρ 2 2 exp ( ρ 2 u 1 − ρ 1 u 2 ) 2 − ( u 2 − ρ 2 x ) 2 − ( u 1 − ρ 1 x ) 2 2 ( 1 − ν 2 − ρ 1 2 − ρ 2 2 + 2 ν ρ 1 ρ 2 ) = exp 1 2 ( ρ 2 u 1 − ρ 1 u 2 ) 2 − ( u 2 − ρ 2 x ) 2 − ( u 1 − ρ 1 x ) 2 ( 1 − ν 2 − ρ 1 2 − ρ 2 2 + 2 ν ρ 1 ρ 2 ) − 1 2 ( Z − θ 0 − θ a a − θ x x − θ u T u ) 2 ( 2 π ) 3 ∕ 2 1 − ρ 1 2 − ρ 2 2 P [ Z = z , A = a ∣ X = x ] ( 1 + exp { ( − 1 ) a ( α 0 + α x x + α u T u ) } ) .

Let us consider

(A7) g ( U ) = u 2 u 2 2 − 3 − α u 2 2 − α u 1 2 θ u 2 2 θ u 1 2 + 2 α u 1 α u 2 θ u 2 θ u 1 exp − u 2 2 2 ⋅ exp − ( ρ 2 u 1 − ρ 1 u 2 ) 2 − ( u 2 − ρ 2 x ) 2 − ( u 1 − ρ 1 x ) 2 2 ( 1 − ν 2 − ρ 1 2 − ρ 2 2 + 2 ν ρ 1 ρ 2 ) ⋅ ⋅ ( 2 + exp ( − α 0 − α x x − α u T u ) + exp ( α 0 + α x x + α u T u ) ) .

We will prove that E [ g ( U ) ∣ Z = z , A = a , X = x ] = 0 for any values z , a , x . We have

E [ g ( U ) ∣ Z = z , A = a , X = x ] = ∫ ( − ∞ , ∞ ) 2 g ( u ) P [ U = u ∣ Z = z , A = a , X = x ] d u 1 d u 2 = 1 ( 2 π ) 3 ∕ 2 1 − ρ 1 2 − ρ 2 2 P [ Z = z , A = a ∣ X = x ] ∫ ( − ∞ , ∞ ) 2 u 2 u 2 2 − 3 − α u 2 2 − α u 1 2 θ u 2 2 θ u 1 2 + 2 α u 1 α u 2 θ u 2 θ u 1 ⋅ exp − 1 2 ( Z − θ 0 − θ a a − θ x x − θ u T u ) 2 − u 2 2 2 ( 1 + exp { ( − 1 ) 1 − a ( α 0 + α x x + α u T u ) } ) d u 1 d u 2 .

Let

T 1 = ∫ ( − ∞ , ∞ ) 2 u 2 exp − 1 2 ( Z − θ 0 − θ a a − θ x x − θ u T u ) 2 − 1 2 u 2 2 ⋅ ⋅ ( 1 + exp { ( − 1 ) 1 − a ( α 0 + α x x + α u T u ) } ) d u 1 d u 2 , T 2 = ∫ ( − ∞ , ∞ ) 2 u 2 3 exp − 1 2 ( Z − θ 0 − θ a a − θ x x − θ u T u ) 2 − 1 2 u 2 2 ⋅ ⋅ ( 1 + exp { ( − 1 ) 1 − a ( α 0 + α x x + α u T u ) } ) d u 1 d u 2 .

We have that

∫ − ∞ ∞ exp − 1 2 ( Z − θ 0 − θ a a − θ x x − θ u T u ) 2 ( 1 + exp { ( − 1 ) 1 − a ( α 0 + α x x + α u T u ) } ) d u 1 = exp − 1 2 ( Z − θ 0 − θ a a − θ x x − θ u 2 u 2 ) 2 ∫ − ∞ ∞ exp θ u 1 ( Z − θ 0 − θ a a − θ x x − θ u 2 u 2 ) u 1 − 1 2 θ u 1 2 u 1 2 d u 1 + exp { ( − 1 ) 1 − a ( α 0 + α x x + α u 2 u 2 ) } ⋅ ⋅ ∫ − ∞ ∞ exp θ u 1 Z − θ 0 − θ a a − θ x x − θ u 2 u 2 − ( − 1 ) a α u 1 θ u 1 u 1 − 1 2 θ u 1 2 u 1 2 d u 1 = exp − 1 2 ( Z − θ 0 − θ a a − θ x x − θ u 2 u 2 ) 2 2 π ∣ θ u 1 ∣ exp θ u 1 2 ( Z − θ 0 − θ a a − θ x x − θ u 2 u 2 ) 2 2 θ u 1 2 + exp { ( − 1 ) 1 − a ( α 0 + α x x + α u 2 u 2 ) } ⋅ 2 π ∣ θ u 1 ∣ exp θ u 1 2 Z − θ 0 − θ a a − θ x x − θ u 2 u 2 − ( − 1 ) a α u 1 θ u 1 2 2 θ u 1 2 = 2 π ∣ θ u 1 ∣ 1 + exp ( − 1 ) 1 − a α 0 + α x x + α u 2 u 2 + α u 1 θ u 1 ( Z − θ 0 − θ a a − θ x x − θ u 2 u 2 ) + α u 1 2 2 θ u 1 2 = 2 π ∣ θ u 1 ∣ 1 + exp ( − 1 ) 1 − a α 0 + α x x + α u 1 θ u 1 ( Z − θ 0 − θ a a − θ x x ) + α u 1 2 2 θ u 1 2 + ( − 1 ) a α u 1 θ u 2 θ u 1 − α u 2 u 2 ,

which implies

T 1 = 2 π ∣ θ u 1 ∣ exp ( − 1 ) 1 − a α 0 + α x x + α u 1 θ u 1 ( Z − θ 0 − θ a a − θ x x ) + α u 1 2 2 θ u 1 2 ⋅ ∫ − ∞ ∞ u 2 exp ( − 1 ) a α u 1 θ u 2 θ u 1 − α u 2 u 2 − 1 2 u 2 2 d u 2 = 2 π ∣ θ u 1 ∣ exp ( − 1 ) 1 − a α 0 + α x x + α u 1 θ u 1 ( Z − θ 0 − θ a a − θ x x ) + α u 1 2 2 θ u 1 2 ⋅ 2 π ( − 1 ) a α u 1 θ u 2 θ u 1 − α u 2 exp 1 2 α u 1 θ u 2 θ u 1 − α u 2 2 = 2 π ( − 1 ) a ∣ θ u 1 ∣ exp ( − 1 ) 1 − a α 0 + α x x + α u 1 θ u 1 ( Z − θ 0 − θ a a − θ x x ) + α u 1 2 ( 1 + θ u 2 2 ) 2 θ u 1 2 − θ u 1 α u 1 α u 2 θ u 1 − 1 2 α u 2 2 ⋅ α u 1 θ u 2 θ u 1 − α u 2 ,

and

T 2 = 2 π ∣ θ u 1 ∣ exp ( − 1 ) 1 − a α 0 + α x x + α u 1 θ u 1 ( Z − θ 0 − θ a a − θ x x ) + α u 1 2 2 θ u 1 2 ⋅ ∫ − ∞ ∞ u 2 3 exp ( − 1 ) a α u 1 θ u 2 θ u 1 − α u 2 u 2 − 1 2 u 2 2 d u 2 = 2 π ∣ θ u 1 ∣ exp ( − 1 ) 1 − a α 0 + α x x + α u 1 θ u 1 ( Z − θ 0 − θ a a − θ x x ) + α u 1 2 2 θ u 1 2 ⋅ 2 π ( − 1 ) a α u 1 θ u 2 θ u 1 − α u 2 3 + α u 1 θ u 2 θ u 1 − α u 2 2 exp 1 2 α u 1 θ u 2 θ u 1 − α u 2 2 = 2 π ( − 1 ) a ∣ θ u 1 ∣ exp ( − 1 ) 1 − a α 0 + α x x + α u 1 θ u 1 ( Z − θ 0 − θ a a − θ x x ) + α u 1 2 ( 1 + θ u 2 2 ) 2 θ u 1 2 − θ u 1 α u 1 α u 2 θ u 1 − 1 2 α u 2 2 ⋅ α u 1 θ u 2 θ u 1 − α u 2 3 + α u 2 2 + α u 1 2 θ u 2 2 θ u 1 2 − 2 α u 1 α u 2 θ u 2 θ u 1 ,

using the fact that ∫ − ∞ ∞ u 2 exp − 1 2 u 2 2 d u 2 = 0 and ∫ − ∞ ∞ u 2 3 exp − 1 2 u 2 2 d u 2 = 0 (as integrals of odd functions). We then obtain

E [ g ( U ) ∣ Z = z , A = a , X = x ] = 1 ( 2 π ) 3 ∕ 2 1 − ρ 1 2 − ρ 2 2 P [ Z = z , A = a ∣ X = x ] ⋅ T 2 − 3 + α u 2 2 + α u 1 2 θ u 2 2 θ u 1 2 − 2 α u 1 α u 2 θ u 2 θ u 1 T 1 = 0 ,

for any z , a , x . However, we clearly do not have g ( U ) ≡ 0 a.s., so completeness Assumption 8(a) does not hold.

C Bias computations

C.1 Computing the (asymptotic) bias obtained through method of moments estimator under setup (19)

We will compute the asymptotic bias obtained from the method of moments solver using bridge function h ( W , A , 0 ; b ) = b 0 + b a A + b w W + b a w A W and vector function Q ( A , Z , 0 ) = ( 1 , A , Z , A Z ) T .

We define the moment restrictions H ( D i ; θ ) = { Y i − h ( W i , A i , 0 ; b ) } × Q ( A i , Z i , 0 ) Δ − ( h ( W i , 1 , 0 ; b ) − h ( W i , 0 , 0 ; b ) ) , and let m ( θ ) = E [ H ( D ; θ ) ] = lim n → ∞ 1 n ∑ i = 1 n h ( D i ; θ ) . The estimate of θ = ( b , Δ ) is given by

θ ˆ = arg min θ m T ( θ ) m ( θ ) .

C.1.1 Case of Corr ( U 1 , U 2 ) = 0 (used in main paper)

By using E [ U 1 ] = E [ U 2 ] = 0 , E [ U 1 2 ] = E [ U 2 2 ] = 1 , and E [ U 1 U 2 ] = 0 , we express the coordinates of E [ h ( D ; θ ) ] = ( m 1 , m 2 , m 3 , m 4 , m 5 ) as follows:

(A8) m 1 = − b 0 − E [ A ] b a − μ 0 b w − ( E [ A ] μ 0 + E [ A U 1 ] μ u 1 ) b a w + γ 0 + E [ A ] γ a + E [ A U 1 ] γ a u 1 ,

(A9) m 2 = − E [ A ] b 0 − E [ A ] b a − ( E [ A ] μ 0 + E [ A U 1 ] μ u 1 ) b w − ( E [ A ] μ 0 + E [ A U 1 ] μ u 1 ) b a w + ( E [ A ] ( γ 0 + γ a ) + E [ A U 1 ] ( γ u 1 + γ a u 1 ) ) + E [ A U 2 ] γ u 2 ,

(A10) m 3 = − ( θ 0 + E [ A ] θ a ) b 0 − ( E [ A ] ( θ 0 + θ a ) + E [ A U 1 ] θ u 1 ) b a − ( μ 0 θ 0 + μ u 1 θ u 1 + E [ A ] μ 0 θ a + E [ A U 1 ] μ u 1 θ a ) b w − ( E [ A ] μ 0 ( θ 0 + θ a ) + E [ A U 1 ] ( μ 0 θ u 1 + μ u 1 ( θ 0 + θ a ) ) + E [ A U 1 2 ] μ u 1 θ u 1 ) b a w + γ 0 θ 0 + γ u 1 θ u 1 + E [ A ] ( γ 0 θ a + γ a ( θ 0 + θ a ) ) + E [ A U 1 ] ( γ a θ u 1 + γ u 1 θ a + γ a u 1 ( θ 0 + θ a ) ) + E [ A U 1 2 ] γ a u 1 θ u 1 + E [ A U 2 ] γ u 2 θ a ,

(A11) m 4 = − ( E [ A ] ( θ 0 + θ a ) + E [ A U 1 ] θ u 1 ) b 0 − ( E [ A ] ( θ 0 + θ a ) + E [ A U 1 ] θ u 1 ) b a − ( E [ A ] μ 0 ( θ 0 + θ a ) + E [ A U 1 ] ( μ 0 θ u 1 + μ u 1 ( θ 0 + θ a ) ) + E [ A U 1 2 ] μ u 1 θ u 1 ) b w − ( E [ A ] μ 0 ( θ 0 + θ a ) + E [ A U 1 ] ( μ 0 θ u 1 + μ u 1 ( θ 0 + θ a ) ) + E [ A U 1 2 ] μ u 1 θ u 1 ) b a w + E [ A ] ( γ 0 + γ a ) ( θ 0 + θ a ) + E [ A U 1 ] ( ( γ 0 + γ a ) θ u 1 + ( γ u 1 + γ a u 1 ) ( θ 0 + θ a ) ) + E [ A U 1 2 ] ( γ u 1 + γ a u 1 ) θ u 1 + E [ A U 2 ] γ u 2 ( θ 0 + θ a ) + E [ A U 1 U 2 ] γ u 2 θ u 1 .

Let

(A12) R 1 = 1 ( 1 − E [ A ] ) ( 1 − E [ A U 1 2 ] ) − E [ A U 1 ] 2 ,

(A13) R 2 = ( 1 − E [ A ] ) E [ A U 1 U 2 ] + E [ A U 1 ] E [ A U 2 ] .

We obtain the estimated bridge function parameters

(A14) b ˆ 0 = γ 0 − μ 0 μ u 1 γ u 1 + μ 0 μ u 1 R 2 − E [ A U 1 ] E [ A U 1 U 2 ] − ( 1 − E [ A U 1 2 ] E [ A U 2 ] ) R 1 ⋅ γ u 2 , b ˆ w = 1 μ u 1 γ u 1 − 1 μ u 1 R 1 R 2 γ u 2 , b ˆ a w = 1 μ u 1 γ a u 1 + 1 μ u 1 R 1 R 2 + E [ A ] E [ A U 1 U 2 ] − E [ A U 1 ] E [ A U 2 ] E [ A ] E [ A U 1 U 2 ] − E [ A U 1 2 ] γ u 2 .

The estimated effect resulting from h ˆ ( W , A , 0 ; b ) is then

Δ ˆ = b ˆ a + b ˆ a w E [ W ] = b ˆ a + b ˆ a w μ 0 = γ a + R 2 E [ A U 1 ] − E [ A U 1 2 ] ( E [ A U 1 ] E [ A U 1 U 2 ] + ( 1 − E [ A U 1 2 ] ) E [ A U 2 ] ) E [ A U 1 ] 2 − E [ A ] E [ A U 1 2 ] R 1 ⋅ γ u 2 ,

which yields a bias equal to

(A15) δ = R 2 E [ A U 1 ] − E [ A U 1 2 ] ( E [ A U 1 ] E [ A U 1 U 2 ] + ( 1 − E [ A U 1 2 ] ) E [ A U 2 ] ) E [ A U 1 ] 2 − E [ A ] E [ A U 1 2 ] R 1 ⋅ γ u 2 .

We note that the expectations

(A16) E [ A ] = E [ E [ A ∣ U 1 , U 2 ] ] = E [ P [ A = 1 ∣ U 1 , U 2 ] ] = E 1 1 + exp { − α 0 − α u 1 U 1 − α u 2 U 2 } = ∫ − ∞ ∞ ∫ − ∞ ∞ 1 2 π exp − u 2 + v 2 2 d u d v 1 + exp { − α 0 − α u 1 u − α u 2 v } ,

(A17) E [ A U 1 ] = E [ E [ A U 1 ∣ U 1 , U 2 ] ] = E [ U 1 E [ A ∣ U 1 , U 2 ] ] = E U 1 1 + exp { − α 0 − α u 1 U 1 − α u 2 U 2 } = ∫ − ∞ ∞ ∫ − ∞ ∞ 1 2 π u exp − u 2 + v 2 2 d u d v 1 + exp { − α 0 − α u 1 u − α u 2 v } ,

(A18) E [ A U 2 ] = E [ E [ A U 2 ∣ U 1 , U 2 ] ] = E [ U 2 E [ A ∣ U 1 , U 2 ] ] = E U 2 1 + exp { − α 0 − α u 1 U 1 − α u 2 U 2 } = ∫ − ∞ ∞ ∫ − ∞ ∞ 1 2 π v exp − u 2 + v 2 2 d u d v 1 + exp { − α 0 − α u 1 u − α u 2 v } ,

(A19) E [ A U 1 2 ] = E [ E [ A U 1 2 ∣ U 1 , U 2 ] ] = E [ U 1 2 E [ A ∣ U 1 , U 2 ] ] = E U 1 2 1 + exp { − α 0 − α u 1 U 1 − α u 2 U 2 } = ∫ − ∞ ∞ 1 2 π u 2 exp − u 2 + v 2 2 d u d v 1 + exp { − α 0 − α u 1 u − α u 2 v } ,

(A20) E [ A U 1 U 2 ] = E [ E [ A U 1 U 2 ∣ U 1 , U 2 ] ] = E [ U 1 U 2 E [ A ∣ U 1 , U 2 ] ] = E U 1 U 2 1 + exp { − α 0 − α u 1 U 1 − α u 2 U 2 } = ∫ − ∞ ∞ 1 2 π u v exp − u 2 + v 2 2 d u d v 1 + exp { − α 0 − α u 1 u − α u 2 v } ,

cannot be computed in closed form but can be obtained numerically using software like Mathematica or Maple once we provide the values of α 0 and α u .

C.1.2 General case of Corr ( U 1 , U 2 ) = ν

Using E [ U 1 ] = E [ U 2 ] = 0 , E [ U 1 2 ] = E [ U 2 2 ] = 1 , and E [ U 1 U 2 ] = ν , the new coordinates of E [ h ( D ; θ ) ] = ( m 1 , m 2 , m 3 , m 4 , m 5 ) result from

( m 1 , m 2 , m 3 , m 4 , m 5 ) = E [ Y ] E [ A Y ] E [ Z Y ] E [ A Z Y ] − 1 E [ A ] E [ W ] E [ A W ] E [ A ] E [ A ] E [ A W ] E [ A W ] E [ Z ] E [ A Z ] E [ Z W ] E [ A Z W ] E [ A Z ] E [ A Z ] E [ A Z W ] E [ A Z W ] b .

Proceeding similarly as in the case ν = 0 , we obtain a bias equal to

δ = R 2 E [ A U 1 ] − E [ A U 1 2 ] ( E [ A U 1 ] ( E [ A U 1 U 2 ] − ν E [ A ] ) + ( 1 − E [ A U 1 2 ] E [ A U 2 ] ) ) − E [ A U 1 ] 3 ν E [ A U 1 ] 2 − E [ A ] E [ A U 1 2 ] R 1 γ u 2 = E [ A U 1 ] ( 1 − E [ A ] − E [ A U 1 2 ] ) E [ A U 1 U 2 ] + ν E [ A U 1 ] ( E [ A ] E [ A U 1 2 ] − E [ A U 1 ] 2 ) E [ A U 1 ] 2 − E [ A ] E [ A U 1 2 ] R 1 γ u 2 + ( E [ A U 1 ] 2 − ( 1 − E [ A U 1 2 ] ) E [ A U 1 2 ] ) E [ A U 2 ] E [ A U 1 ] 2 − E [ A ] E [ A U 1 2 ] R 1 γ u 2 .

C.2 Computing the (asymptotic) bias obtained through method of moments estimator under setup (21)

θ ˆ = arg min θ m T ( θ ) m ( θ ) .

C.2.1 Case of Corr ( U 1 , U 2 ) = 0 (used in main paper):

Using E [ U 1 ] = E [ U 2 ] = 0 , E [ U 1 2 ] = E [ U 2 2 ] = 1 , and E [ U 1 U 2 ] = 0 , we express the coordinates of E [ h ( D ; θ ) ] = ( m 1 , m 2 , m 3 , m 4 , m 5 ) as follows:

m 1 = − b 0 − E [ A ] b a − μ 0 b w − ( E [ A ] μ 0 + E [ A U 1 ] μ u 1 ) b a w + γ 0 + E [ A ] γ a + E [ A U 1 ] γ a u 1 , m 2 = − E [ A ] b 0 − E [ A ] b a − ( E [ A ] μ 0 + E [ A U ] μ u 1 ) b w − ( E [ A ] μ 0 + E [ A U 1 ] μ u 1 ) b a w + ( E [ A ] ( γ 0 + γ a ) + E [ A U 1 ] ( γ u 1 + γ a u 1 ) ) , m 3 = − ( θ 0 + E [ A ] θ a ) b 0 − ( E [ A ] ( θ 0 + θ a ) + E [ A U 1 ] θ u 1 ) b a − ( μ 0 θ 0 + μ u 1 θ u 1 + μ u 2 θ u 2 + E [ A ] μ 0 θ a + E [ A U 1 ] μ u 1 θ a ) b w − ( E [ A ] ( μ 0 ( θ 0 + θ a ) + μ u 2 θ u 2 ) + E [ A U 1 ] ( μ 0 θ u 1 + μ u 1 ( θ 0 + θ a ) ) + E [ A U 1 2 ] μ u 1 θ u 1 ) b a w + γ 0 θ 0 + γ u 1 θ u 1 + E [ A ] ( γ 0 θ a + γ a ( θ 0 + θ a ) ) + E [ A U 1 ] ( γ a θ u 1 + γ u 1 θ a + γ a u 1 ( θ 0 + θ a ) ) + E [ A U 1 2 ] γ a u 1 θ u 1 ,

m 4 = − ( E [ A ] ( θ 0 + θ a ) + E [ A U ] θ u 1 ) b 0 − ( E [ A ] ( θ 0 + θ a ) + E [ A U 1 ] θ u 1 ) b a − ( E [ A ] ( μ 0 ( θ 0 + θ a ) + μ u 2 θ u 2 ) + E [ A U 1 ] ( μ 0 θ u 1 + μ u 1 ( θ 0 + θ a ) ) + E [ A U 1 2 ] μ u 1 θ u 1 ) b w − ( E [ A ] ( μ 0 ( θ 0 + θ a ) + μ u 2 θ u 2 ) + E [ A U 1 ] ( μ 0 θ u 1 + μ u 1 ( θ 0 + θ a ) ) + E [ A U 1 2 ] μ u 1 θ u 1 ) b a w + E [ A ] ( γ 0 + γ a ) ( θ 0 + θ a ) + E [ A U 1 ] ( γ u 1 θ a + γ a u 1 ( θ 0 + θ a ) ) + E [ A U 1 2 ] γ a u 1 θ u 1 .

Let

S 1 = ( 1 − E [ A ] ) 2 ( 1 − E [ A ] ) ( 1 − E [ A U 1 2 ] ) − E [ A U 1 ] 2 , S 2 = E [ A ] 2 E [ A ] E [ A U 1 2 ] − E [ A U 1 ] 2 .

We obtain the estimated bridge function parameters

(A21) b ˆ 0 = γ 0 − μ 0 μ u 1 + S 1 ⋅ θ u 2 θ u 1 μ u 2 + S 1 1 − E [ A ] ⋅ θ u 2 θ u 1 μ u 2 μ u 1 + S 1 ⋅ θ u 2 θ u 1 μ u 2 γ u 1 , b ˆ a = γ a − μ 0 − E [ A U 1 ] E [ A ] S 2 ⋅ θ u 2 θ u 1 μ u 2 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ a u 1 + μ 0 + E [ A U 1 ] 1 − E [ A ] S 1 ⋅ θ u 2 θ u 1 μ u 2 μ u 1 + S 1 ⋅ θ u 2 θ u 1 μ u 2 − μ 0 − E [ A U 1 ] E [ A ] S 2 ⋅ θ u 2 θ u 1 μ u 2 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ u 1 , b ˆ w = 1 μ u 1 + S 1 ⋅ θ u 2 θ u 1 μ u 2 γ u 1 , b ˆ a w = 1 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ a u 1 − 1 μ u 1 + S 1 ⋅ θ u 2 θ u 1 μ u 2 − 1 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ u 1 .

The estimated effect resulting from h ˆ ( W , A , 0 ; b ) is then

Δ ˆ = b ˆ a + b ˆ a w E [ W ] = b ˆ a + b ˆ a w μ 0 = γ a + E [ A U 1 ] E [ A ] S 2 ⋅ θ u 2 θ u 1 μ u 2 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ a u 1 + E [ A U 1 ] 1 − E [ A ] S 1 ⋅ θ u 2 θ u 1 μ u 2 μ u 1 + S 1 ⋅ θ u 2 θ u 1 μ u 2 + E [ A U 1 ] E [ A ] S 2 ⋅ θ u 2 θ u 1 μ u 2 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ u 1 = γ a + E [ A U 1 ] E [ A ] ( 1 − E [ A ] ) θ u 2 θ u 1 μ u 2 ( 1 − E [ A ] ) S 2 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ a u 1 + E [ A ] S 1 μ u 1 + S 1 ⋅ θ u 2 θ u 1 μ u 2 + ( 1 − E [ A ] ) S 2 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ u 1 ,

which yields a bias equal to

(A22) δ = E [ A U 1 ] E [ A ] ( 1 − E [ A ] ) θ u 2 θ u 1 μ u 2 ( 1 − E [ A ] ) S 2 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ a u 1 + E [ A ] S 1 μ u 1 + S 1 ⋅ θ u 2 θ u 1 μ u 2 + ( 1 − E [ A ] ) S 2 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ u 1 .

In the particular case γ a u 1 = 0 , we obtain a bias equal to

(A23) δ = E [ A U 1 ] E [ A ] ( 1 − E [ A ] ) θ u 2 θ u 1 μ u 2 E [ A ] S 1 μ u 1 + S 1 ⋅ θ u 2 θ u 1 μ u 2 + ( 1 − E [ A ] ) S 2 μ u 1 + S 2 ⋅ θ u 2 θ u 1 μ u 2 γ u 1 .

Similarly to Proof C.1, we note that the expectations

E [ A ] = E [ E [ A ∣ U 1 ] ] = E [ P [ A = 1 ∣ U 1 ] ] = E 1 1 + exp { − α 0 − α u 1 U 1 } = ∫ − ∞ ∞ 1 2 π exp − u 2 2 1 + exp { − α 0 − α u 1 u } d u , E [ A U 1 ] = E [ E [ A U 1 ∣ U 1 ] ] = E [ U 1 E [ A ∣ U 1 ] ] = E U 1 1 + exp { − α 0 − α u U 1 } = ∫ − ∞ ∞ 1 2 π u exp − u 2 2 1 + exp { − α 0 − α u 1 u } d u , E [ A U 1 2 ] = E [ E [ A U 1 2 ∣ U 1 ] ] = E [ U 1 2 E [ A ∣ U 1 ] ] = E U 1 2 1 + exp { − α 0 − α u U 1 } = ∫ − ∞ ∞ 1 2 π u 2 exp − u 2 2 1 + exp { − α 0 − α u 1 u } d u ,

cannot be computed in closed form but can be obtained numerically using software like Mathematica or Maple once we provide the values of α 0 and α u .

C.2.2 General case of Corr ( U 1 , U 2 ) = ν :

Using E [ U 1 ] = E [ U 2 ] = 0 , E [ U 1 2 ] = E [ U 2 2 ] = 1 , and E [ U 1 U 2 ] = ν , the new coordinates of E [ h ( D ; θ ) ] = ( m 1 , m 2 , m 3 , m 4 , m 5 ) result from

Let

T 1 = E [ A U 1 2 ] E [ A U 2 ] − E [ A U 1 ] E [ A U 1 U 2 ] , T 2 = E [ A U 2 2 ] E [ A U 1 ] − E [ A U 2 ] E [ A U 1 U 2 ] , V 11 = E [ A U 1 ] 2 − E [ A ] E [ A U 1 2 ] , V 22 = E [ A U 2 ] 2 − E [ A ] E [ A U 2 2 ] , V 12 = E [ A U 1 ] E [ A U 2 ] − E [ A ] E [ A U 1 U 2 ] .

Proceeding similarly as in the case ν = 0 , we obtain an estimated effect resulting from h ˆ ( W , A , 0 ; b ) equal to

Δ ˆ = b ˆ a + b ˆ a w E [ W ] = b ˆ a + b ˆ a w μ 0 = γ a + μ u 2 θ u 1 T 1 − θ u 2 θ u 1 T 2 μ u 1 V 11 + θ u 2 θ u 1 V 12 + μ u 2 V 12 + θ u 2 θ u 1 V 22 ( γ u 1 + γ a u 1 ) + − T 1 + ( E [ A U 2 ] − ν E [ A U 1 ] ) + θ u 2 θ u 1 ( T 2 − ( E [ A U 1 ] − ν E [ A U 2 ] ) ) F γ u 1 ,

where

F = μ u 1 V 11 + E [ A U 1 2 ] + θ u 2 θ u 1 ( V 12 + E [ A U 1 U 2 ] ) − ( 1 − E [ A ] ) 1 + θ u 2 θ u 1 ν + μ u 2 V 12 + E [ A U 1 U 2 ] + θ u 2 θ u 1 ( V 22 + E [ A U 2 2 ] ) − ( 1 − E [ A ] ) ν + θ u 2 θ u 1 .

This yields a bias equal to

δ = μ u 2 θ u 1 T 1 − θ u 2 θ u 1 T 2 μ u 1 V 11 + θ u 2 θ u 1 V 12 + μ u 2 V 12 + θ u 2 θ u 1 V 22 ( γ u 1 + γ a u 1 ) + − T 1 + ( E [ A U 2 ] − ν E [ A U 1 ] ) + θ u 2 θ u 1 ( T 2 − ( E [ A U 1 ] − ν E [ A U 2 ] ) ) F γ u 1 .

As in the previous case, expectations E [ A ] , E [ A U 1 ] , and E [ A U 1 2 ] , as well as

E [ A U 2 ] = E [ E [ A U 2 ∣ U 1 ] ] = E [ E [ A U 2 ∣ U 1 , A = 1 ] P [ A = 1 ∣ U 1 ] + E [ A U 2 ∣ U 1 , A = 0 ] P [ A = 0 ∣ U 1 ] ] = E [ E [ U 2 ∣ U 1 , A = 1 ] P [ A = 1 ∣ U 1 ] ] = E [ E [ U 2 ∣ U 1 ] P [ A = 1 ∣ U 1 ] ] = E [ ν U 1 P [ A = 1 ∣ U 1 ] ] = ν E [ U 1 E [ A ∣ U 1 ] ] = ν E [ A U 1 ] ,

E [ A U 1 U 2 ] = E [ E [ A U 1 U 2 ∣ U 1 ] ] = E [ E [ A U 1 U 2 ∣ U 1 , A = 1 ] P [ A = 1 ∣ U 1 ] + E [ A U 1 U 2 ∣ U 1 , A = 0 ] P [ A = 0 ∣ U 1 ] ] = E [ E [ U 1 U 2 ∣ U 1 , A = 1 ] P [ A = 1 ∣ U 1 ] ] = E [ U 1 E [ U 2 ∣ U 1 ] P [ A = 1 ∣ U 1 ] ] = E [ ν U 1 2 P [ A = 1 ∣ U 1 ] ] = ν E [ U 1 2 E [ A ∣ U 1 ] ] = ν E [ A U 1 2 ] ,

E [ A U 2 2 ] = E [ E [ A U 2 2 ∣ U 1 ] ] = E [ E [ A U 2 2 ∣ U 1 , A = 1 ] P [ A = 1 ∣ U 1 ] + E [ A U 2 2 ∣ U 1 , A = 0 ] P [ A = 0 ∣ U 1 ] ] = E [ E [ U 2 2 ∣ U 1 , A = 1 ] P [ A = 1 ∣ U 1 ] ] = E [ ( 1 − ν 2 + ν 2 U 1 2 ) P [ A = 1 ∣ U 1 ] ] = ( 1 − ν 2 ) E [ P [ A = 1 ∣ U 1 ] ] + ν 2 E [ U 1 2 P [ A = 1 ∣ U 1 ] ] = ( 1 − ν 2 ) E [ A ] + ν 2 E [ E [ A U 1 2 ∣ U 1 ] ] = ( 1 − ν 2 ) E [ A ] + ν 2 E [ A U 1 2 ] ,

which help simplify the bias formula.

C.3 Computing the OR estimator bias under setup (19)

Let M = 1 Z W A A Z A W . By the typical formula b ˆ = ( M T M ) − 1 M T Y for the OLS estimator and the following

E [ Z ] = θ 0 + θ a E [ A ] , E [ W ] = μ 0 , E [ A Z ] = ( θ 0 + θ a ) E [ A ] + θ u 1 E [ A U 1 ] , E [ A W ] = μ 0 E [ A ] + μ u 1 E [ A U 1 ] , E [ Z 2 ] = θ 0 2 + θ u 1 2 + 1 + ( θ a 2 + 2 θ 0 θ a ) E [ A ] + 2 θ 0 θ u 1 E [ A U 1 ] , E [ W 2 ] = μ 0 2 + μ u 1 2 + 1 , E [ A Z 2 ] = ( 1 + ( θ 0 + θ a ) 2 ) E [ A ] + 2 ( θ 0 + θ a ) θ u 1 E [ A U 1 ] + θ u 1 2 E [ A U 1 2 ] E [ A W 2 ] = ( 1 + μ 0 2 ) E [ A ] + 2 μ 0 μ u 1 E [ A U 1 ] + μ u 1 2 E [ A U 1 2 ] E [ A Z W ] = ( ( θ 0 + θ a ) μ 0 + θ u 2 μ u 2 ) E [ A ] + ( ( θ 0 + θ a ) μ u 1 + θ u 1 μ 0 ) E [ A U 1 ] + θ u 1 μ u 1 E [ A U 1 2 ] E [ Y ] = γ 0 + γ a E [ A ] + γ a u 1 E [ A U 1 ] E [ Z Y ] = θ 0 γ 0 + θ u 1 γ u 1 + ( ( θ 0 + θ a ) γ a + θ a γ 0 ) E [ A ] + ( θ u 1 γ a + θ a γ u 1 + ( θ 0 + θ a ) γ a u 1 ) E [ A U 1 ] + θ u 1 γ a u 1 E [ A U 1 2 ] + θ a γ u 2 E [ A U 2 ] E [ W Y ] = μ 0 γ 0 + μ u 1 γ u 1 + μ 0 γ a E [ A ] + ( μ 0 γ a u 1 + μ u 1 γ a ) E [ A U 1 ] + μ u 1 γ a u 1 E [ A U 1 2 ] E [ A Y ] = ( γ 0 + γ a ) E [ A ] + ( γ u 1 + γ a u 1 ) E [ A U 1 ] + γ u 2 E [ A U 2 ] E [ A Z Y ] = ( θ 0 + θ a ) ( γ 0 + γ a ) E [ A ] + ( ( θ 0 + θ a ) ( γ u 1 + γ a u 1 ) + θ u 1 ( γ 0 + γ a ) ) E [ A U 1 ] + θ u 1 ( γ u 1 + γ a u 1 ) E [ A U 1 2 ] + ( θ 0 + θ a ) γ u 2 E [ A U 2 ] + θ u 1 γ u 2 E [ A U 1 U 2 ] E [ A W Y ] = μ 0 ( γ 0 + γ a ) E [ A ] + ( μ 0 ( γ u 1 + γ a u 1 ) + μ u 1 ( γ 0 + γ a ) ) E [ A U 1 ] + μ u 1 ( γ u 1 + γ a u 1 ) E [ A U 1 2 ] + μ 0 γ u 2 E [ A U 2 ] + μ u 1 γ u 2 E [ A U 1 U 2 ] ,

we obtain a linear regression estimator bias equal to

(A24) δ OR = E [ A U ] E [ A ] − ( 1 − E [ A ] ) θ a S 2 θ u 1 ( 1 + μ u 2 2 ) + E [ A U ] E [ A ] − ( 1 − E [ A ] ) θ a S 2 μ u 1 μ u 2 θ u 2 θ u 2 2 1 + θ u 1 2 S 2 ( 1 + μ u 2 2 ) + 1 + μ u 1 2 S 2 ( 1 + θ u 2 2 ) − 1 + 2 θ u 1 μ u 1 θ u 2 μ u 2 S 2 γ u 1 + ( 1 + θ u 2 2 + μ u 2 2 ) E [ A U ] E [ A ] ( 1 − E [ A ] ) ( ( 1 + θ u 2 2 + μ u 2 2 ) + 1 − E [ A U 2 ] E [ A ] ( 1 − E [ A ] ) ( θ u 1 2 ( 1 + μ u 2 2 ) + μ u 1 2 ( 1 + θ u 2 2 ) − 2 θ u 1 μ u 1 θ u 2 μ u 2 ) + θ a θ u 1 1 + μ u 2 − μ u 1 θ u 2 μ u 2 θ u 1 − E [ A U 2 ] ( 1 − E [ A U 2 ] ) E [ A ] ( 1 − E [ A ] ) + E [ A U ] 2 1 E [ A ] 2 S 1 + 1 ( 1 − E [ A ] ) 2 S 2 ⋅ ( θ u 1 2 ( 1 + μ u 2 2 ) + μ u 1 2 ( 1 + θ u 2 2 ) − 2 θ u 1 μ u 1 θ u 2 μ u 2 ) + ( 1 + θ u 2 2 + μ u 2 2 ) E [ A ] 2 ( 1 − E [ A ] ) 2 ⋅ ( E [ A ] 4 − E [ A ] 3 ( 1 + 2 E [ A U 2 ] ) + 3 E [ A ] 2 ( E [ A U 2 ] + E [ A U ] 2 ) − E [ A ] ( 3 E [ A U ] 2 + E [ A U 2 ] ) + E [ A U 2 ] ) ) ] ⋅ ∏ i = 1 , 2 1 1 + θ u 1 2 S i ( 1 + μ u 2 2 ) + 1 + μ u 1 2 S i ( 1 + θ u 2 2 ) − 1 + 2 θ u 1 μ u 1 θ u 2 μ u 2 S i γ a u 1 .

In particular, for γ a u 1 = 0 , we obtain a bias equal to

(A25) δ OR = E [ A U ] E [ A ] − ( 1 − E [ A ] ) θ a S 2 θ u 1 ( 1 + μ u 2 2 ) + E [ A U ] E [ A ] − ( 1 − E [ A ] ) θ a S 2 μ u 1 μ u 2 θ u 2 θ u 2 2 1 + θ u 1 2 S 2 ( 1 + μ u 2 2 ) + 1 + μ u 1 2 S 2 ( 1 + θ u 2 2 ) − 1 + 2 θ u 1 μ u 1 θ u 2 μ u 2 S 2 γ u 1

C.4 Computing the OR estimator bias under setup (21)

Let M = 1 Z W A A Z A W . By the typical formula b ˆ = ( M T M ) − 1 M T Y for the OLS estimator and the following

E [ Z ] = θ 0 + θ a E [ A ] , E [ W ] = μ 0 , E [ A Z ] = ( θ 0 + θ a ) E [ A ] + θ u 1 E [ A U 1 ] , E [ A W ] = μ 0 E [ A ] + μ u 1 E [ A U 1 ] , E [ Z 2 ] = θ 0 2 + θ u 1 2 + θ u 2 2 + 1 + ( θ a 2 + 2 θ 0 θ a ) E [ A ] + 2 θ 0 θ u 1 E [ A U 1 ] , E [ W 2 ] = μ 0 2 + μ u 1 2 + μ u 2 2 + 1 , E [ A Z 2 ] = ( 1 + ( θ 0 + θ a ) 2 + θ u 2 2 ) E [ A ] + 2 ( θ 0 + θ a ) θ u 1 E [ A U 1 ] + θ u 1 2 E [ A U 1 2 ] , E [ A W 2 ] = ( 1 + μ 0 2 + μ u 2 2 ) E [ A ] + 2 μ 0 μ u 1 E [ A U 1 ] + μ u 1 2 E [ A U 1 2 ] , E [ A Z W ] = ( ( θ 0 + θ a ) μ 0 + θ u 2 μ u 2 ) E [ A ] + ( ( θ 0 + θ a ) μ u 1 + θ u 1 μ 0 ) E [ A U 1 ] + θ u 1 μ u 1 E [ A U 1 2 ] , E [ Y ] = γ 0 + γ a E [ A ] + γ a u 1 E [ A U 1 ] , E [ Z Y ] = θ 0 γ 0 + θ u 1 γ u 1 + ( ( θ 0 + θ a ) γ a + θ a γ 0 ) E [ A ] + ( θ u 1 γ a + θ a γ u 1 + ( θ 0 + θ a ) γ a u 1 ) E [ A U 1 ] + θ u 1 γ a u 1 E [ A U 1 2 ] , E [ W Y ] = μ 0 γ 0 + μ u 1 γ u 1 + μ 0 γ a E [ A ] + ( μ 0 γ a u 1 + μ u 1 γ a ) E [ A U 1 ] + μ u 1 γ a u 1 E [ A U 1 2 ] , E [ A Y ] = ( γ 0 + γ a ) E [ A ] + ( γ u 1 + γ a u 1 ) E [ A U 1 ] , E [ A Z Y ] = ( θ 0 + θ a ) ( γ 0 + γ a ) E [ A ] + ( ( θ 0 + θ a ) ( γ u 1 + γ a u 1 ) + θ u 1 ( γ 0 + γ a ) ) E [ A U 1 ] + θ u 1 ( γ u 1 + γ a u 1 ) E [ A U 1 2 ] , E [ A W Y ] = μ 0 ( γ 0 + γ a ) E [ A ] + ( μ 0 ( γ u 1 + γ a u 1 ) + μ u 1 ( γ 0 + γ a ) ) E [ A U 1 ] + μ u 1 ( γ u 1 + γ a u 1 ) E [ A U 1 2 ] ,

we obtain a linear regression estimator bias equal to

(A26) δ OR = E [ A U ] E [ A ] − ( 1 − E [ A ] ) θ a S 2 θ u 1 ( 1 + μ u 2 2 ) + E [ A U ] E [ A ] − ( 1 − E [ A ] ) θ a S 2 μ u 1 μ u 2 θ u 2 θ u 2 2 1 + θ u 1 2 S 2 ( 1 + μ u 2 2 ) + 1 + μ u 1 2 S 2 ( 1 + θ u 2 2 ) − 1 + 2 θ u 1 μ u 1 θ u 2 μ u 2 S 2 γ u 1 + ( 1 + θ u 2 2 + μ u 2 2 ) E [ A U ] E [ A ] ( 1 − E [ A ] ) ( ( 1 + θ u 2 2 + μ u 2 2 ) + 1 − E [ A U 2 ] E [ A ] ( 1 − E [ A ] ) ( θ u 1 2 ( 1 + μ u 2 2 ) + μ u 1 2 ( 1 + θ u 2 2 ) − 2 θ u 1 μ u 1 θ u 2 μ u 2 ) + θ a θ u 1 1 + μ u 2 − μ u 1 θ u 2 μ u 2 θ u 1 − E [ A U 2 ] ( 1 − E [ A U 2 ] ) E [ A ] ( 1 − E [ A ] ) + E [ A U ] 2 1 E [ A ] 2 S 1 + 1 ( 1 − E [ A ] ) 2 S 2 ⋅ ( θ u 1 2 ( 1 + μ u 2 2 ) + μ u 1 2 ( 1 + θ u 2 2 ) − 2 θ u 1 μ u 1 θ u 2 μ u 2 ) + ( 1 + θ u 2 2 + μ u 2 2 ) E [ A ] 2 ( 1 − E [ A ] ) 2 ⋅ ( E [ A ] 4 − E [ A ] 3 ( 1 + 2 E [ A U 2 ] ) + 3 E [ A ] 2 ( E [ A U 2 ] + E [ A U ] 2 ) − E [ A ] ( 3 E [ A U ] 2 + E [ A U 2 ] ) + E [ A U 2 ] ) ) ] ⋅ ∏ i = 1 , 2 1 1 + θ u 1 2 S i ( 1 + μ u 2 2 ) + 1 + μ u 1 2 S i ( 1 + θ u 2 2 ) − 1 + 2 θ u 1 μ u 1 θ u 2 μ u 2 S i γ a u 1 .

In particular, for γ a u 1 = 0 , we obtain a bias equal to

(A27) δ OR = E [ A U ] E [ A ] − ( 1 − E [ A ] ) θ a S 2 θ u 1 ( 1 + μ u 2 2 ) + E [ A U ] E [ A ] − ( 1 − E [ A ] ) θ a S 2 μ u 1 μ u 2 θ u 2 θ u 2 2 1 + θ u 1 2 S 2 ( 1 + μ u 2 2 ) + 1 + μ u 1 2 S 2 ( 1 + θ u 2 2 ) − 1 + 2 θ u 1 μ u 1 θ u 2 μ u 2 S 2 γ u 1 .

C.5 Comparison of proximal and unadjusted estimator biases under setup (21)

We begin by proving that both S 1 , S 2 > 0 :

Proof that S 1 , S 2 > 0 :

We have that

∣ Cov ( A , A U ) ∣ = ∣ E [ A 2 U ] ∣ = ∣ E [ A U ] ∣ ≤ Var ( A ) Var ( A U ) = Var ( A ) E [ A U 2 ] − E [ A U ] 2

which implies E [ A U ] 2 ≤ Var ( A ) ( E [ A U 2 ] − E [ A U ] 2 ) . It follows that

E [ A ] E [ A U 2 ] ≥ E [ A ] ⋅ E [ A U ] 2 ( 1 + Var ( A ) ) Var ( A ) = E [ A ] ⋅ E [ A U ] 2 ( 1 + Var ( A ) ) E [ A ] − E [ A ] 2 = E [ A U ] 2 ( 1 + Var ( A ) ) 1 − E [ A ] ≥ E [ A U ] 2 ,

since 1 + Var ( A ) ≥ 1 and 1 − E [ A ] ∈ ( 0 , 1 ) . Thus, S 2 > 0 .

Similarly, if we consider A ¯ = 1 − A (such that A ¯ 2 = A ¯ , E [ A ¯ ] = 1 − E [ A ] , Var ( A ¯ ) = Var ( A ) , E [ A ¯ U ] = − E [ A U ] , and E [ A ¯ U 2 ] = 1 − E [ A U 2 ] ), we obtain

( 1 − E [ A ] ) ( 1 − E [ A U 2 ] ) = E [ A ¯ ] E [ A ¯ U 2 ] ≥ E [ A ¯ U ] 2 = E [ A U ] 2 .

Thus, S 1 > 0 as well.□

Taking the ratio of magnitudes for the two biases, we have

δ POR δ unadj = E [ A ] ⋅ S 1 θ u 1 μ u 1 θ u 2 μ u 2 + S 1 + ( 1 − E [ A ] ) ⋅ S 2 θ u 1 μ u 1 θ u 2 μ u 2 + S 2 .

Let f ( r ) = E [ A ] ⋅ S 1 r + S 1 + ( 1 − E [ A ] ) ⋅ S 2 r + S 2 for r ∈ ( − ∞ , − min { S 1 , S 2 } ) . We note that f ( r ) is strictly increasing in r , that lim r → − ∞ f ( r ) = 0 , and that f ( r ) = 1 has the unique solution r ∗ = − S 1 ( 1 − E [ A ] ) − S 2 E [ A ] < 0 . We consider the following four cases:

If θ u 1 μ u 1 θ u 2 μ u 2 ≥ 0 , then S 1 , S 2 > 0 imply that S i θ u 1 μ u 1 θ u 2 μ u 2 + S i ∈ ( 0 , 1 ) for i = 1 , 2 . Since E [ A ] ∈ ( 0 , 1 ) , it follows that 0 < δ POR ˆ δ unadj ˆ < 1 .
If − min { S 1 , S 2 } ≤ θ u 1 μ u 1 θ u 2 μ u 2 < 0 , then S 1 , S 2 > 0 imply that S i θ u 1 μ u 1 θ u 2 μ u 2 + S i > 1 for i = 1 , 2 . Similarly, it follows that δ POR ˆ δ unadj ˆ > 1 . In particular, if θ u 1 μ u 1 θ u 2 μ u 2 = − min { S 1 , S 2 } , the proximal estimator bias can be arbitrarily large.
If r ∗ ≤ θ u 1 μ u 1 θ u 2 μ u 2 < − min { S 1 , S 2 } , then δ POR ˆ δ unadj ˆ ≥ 1 .
If θ u 1 μ u 1 θ u 2 μ u 2 < r ∗ , then 0 ≤ δ POR ˆ δ unadj ˆ < 1 . In particular, θ u 1 μ u 1 θ u 2 μ u 2 = − ∞ implies that the proximal estimator is unbiased (as either θ u 2 = 0 or μ u 2 = 0 ).

□

C.6 Computing the proximal estimator bias under γ a u = 0 and h ( W , A , X ) = b 0 + b a A + b x T X + b w T W

We will compute the asymptotic bias obtained from the method of moments solver using bridge function h ( W , A , X ; b ) = b 0 + b a A + b w T W + b x T X and vector function Q ( A , Z , X ) = ( 1 , A , Z , X ) . We assume the general case of multidimensional U , Z , W , X with Z ∈ R m , W ∈ R n , U ∈ R p , X ∈ R q . Throughout this section, we use the shorthand E [ A U ] = ( E [ A U 1 ] , … , E [ A U p ] ) and E [ A X ] = ( E [ A X 1 ] , … , E [ A X q ] ) .

We define the moment restrictions H ( D i ; θ ) = { Y i − h ( W i , A i , X i ; b ) } × Q ( A i , Z i , X i ) Δ − ( h ( W i , 1 , X i ; b ) − h ( W i , 0 , X i ; b ) ) , and let m ( θ ) = E [ H ( D ; θ ) ] = lim n → ∞ 1 n ∑ i = 1 n h ( D i ; θ ) . The estimate of θ = ( b , Δ ) is given by

θ ˆ = arg min θ m T ( θ ) m ( θ ) .

Using E [ U i ] = 0 , ∀ i = 1 , … , p and E [ U U T ] = Σ u , as well as E [ X j ] = 0 , ∀ j = 1 , … , q , E [ X X T ] = Σ x , and E [ U X T ] = ρ , we express the coordinates of E [ h ( D ; θ ) ] = ( m 1 , m 2 , m 3 , m 4 ) with m 1 , m 2 ∈ R , m 3 ∈ R m , m 4 ∈ R q as follows:

m 1 = − b 0 − E [ A ] b a − μ 0 T b w + γ 0 + E [ A ] γ a , m 2 = − E [ A ] b 0 − E [ A ] b a − ( E [ A ] μ 0 T + E [ A U ] T μ u + E [ A X ] T μ x ) b w − E [ A X ] T b x + ( γ 0 + γ a ) E [ A ] + E [ A U ] T γ u + E [ A X ] T γ x , m 3 = − ( θ 0 + θ a E [ A ] ) b 0 − ( E [ A ] ( θ 0 + θ a ) + θ u T E [ A U ] + θ x T E [ A X ] ) b a − ( ( θ 0 + E [ A ] θ a ) μ 0 T + ( θ a E [ A U ] T + θ u T Σ u + θ x T ρ T ) μ u + ( θ a E [ A X ] T + θ x T Σ x + θ u T ρ ) μ x ) b w − ( θ a E [ A X ] T + θ u T ρ + θ x T Σ x ) b x + ( θ 0 + θ a E [ A ] ) γ 0 + ( ( θ 0 + θ a ) E [ A ] + θ u T E [ A U ] + θ x T E [ A X ] ) γ a + ( θ a E [ A U ] T + θ u T Σ u + θ x T ρ T ) γ u + ( θ a E [ A X ] T + θ u T ρ + θ x T Σ x ) γ x , m 4 = − E [ A X ] b a − ( Σ x μ x + ρ T μ u ) b w − Σ x b x + E [ A X ] γ a + Σ x γ x + ρ T γ u .

Under assumption m = n and p > m : Let us define

β = E [ A U ] T − E [ A X ] T Σ x − 1 ρ T E [ A ] ( 1 − E [ A ] ) − E [ A X ] T Σ x − 1 E [ A X ] , B = Σ u − ρ Σ x − 1 ρ T − ( E [ A U ] − ρ Σ x − 1 E [ A X ] ) ( E [ A U ] T − E [ A X ] T Σ x − 1 ρ T ) E [ A ] ( 1 − E [ A ] ) − E [ A X ] T Σ x − 1 E [ A X ] θ u .

Setting m 1 = m 2 = m 3 i = m 4 i = 0 for all i = 1 , … , m , j = 1 , … , q , we obtain solution

b 0 = γ 0 − E [ A ] β γ u − ( μ 0 T − E [ A ] β μ u ) ( B T μ u ) † B T γ u , b a = γ a + β ( I p − μ u ( B T μ u ) † B T ) γ u , b w = ( B T μ u ) † B T γ u , b x = γ x + Σ x − 1 ( ρ T − E [ A X ] β ) γ u − ( μ x + Σ x − 1 ( ρ T − E [ A X ] β ) μ u ) ( B T μ u ) † B T γ u ,

where ( B T μ u ) † denotes the Moore-Penrose inverse of B T μ u . If B T μ u has full column rank, then ( B T μ u ) † corresponds to ( B T μ u ) † = ( μ u T B B T μ u ) − 1 μ u T B γ u .

The estimated effect resulting from h ˆ ( W , A , X ; b ) is

Δ ˆ = b ˆ a = γ a + E [ A U ] T − E [ A X ] T Σ x − 1 ρ T E [ A ] ( 1 − E [ A ] ) − E [ A X ] T Σ x − 1 E [ A X ] ( I p − μ u ( B T μ u ) † B T ) γ u ,

which yields a bias equal to

δ = E [ A U ] T − E [ A X ] T Σ x − 1 ρ T E [ A ] ( 1 − E [ A ] ) − E [ A X ] T Σ x − 1 E [ A X ] ( I p − μ u ( B T μ u ) † B T ) γ u .

D Details for illustrative sensitivity analysis on real data

D.1 Extracting relationships between U -parameters from the data

We have that

Cov ( Z , A ) = θ a E [ A ] ( 1 − E [ A ] ) + θ u T E [ A U ] + θ x T E [ A X ] , Cov ( W , A ) = μ u T E [ A U ] + μ x T E [ A X ] , Cov ( X , Z ) = E [ A X ] θ a T + ρ T θ u + Σ x θ x , Cov ( X , W ) = ρ T μ u + Σ x μ x , Cov ( Z , W ) = θ a ( E [ A U ] T μ u + E [ A X ] T μ x ) + θ u T μ u + θ u T ρ μ x + θ x T ρ T μ u + θ x T Σ x μ x ,

where E [ A ] , E [ A X ] , Σ x , and the five covariance terms can be computed empirically from the data. Eliminating terms θ x and μ x , we obtain

(A28) E [ A U ] − ρ Σ x − 1 E [ A X ] = ( μ u T ) † ( Cov ( W , A ) − Cov ( W , X ) Σ x − 1 E [ A X ] ) ,

(A29) θ a = Cov ( Z , A ) − Cov ( Z , X ) Σ x − 1 E [ A X ] − θ u T ( μ u T ) † ( Cov ( W , A ) − Cov ( W , X ) Σ x − 1 E [ A X ] ) E [ A ] ( 1 − E [ A ] ) − E [ A X ] T Σ x − 1 E [ A X ] ,

(A30) μ u T ( I p − ρ Σ x − 1 ρ T ) θ u = Cov ( W , Z ) − Cov ( W , X ) Σ x − 1 Cov ( X , Z ) − ( Cov ( W , A ) − Cov ( W , X ) Σ x − 1 E [ A X ] ) θ a T ,

as well as

θ x = Σ x − 1 ( Cov ( X , Z ) − E [ A X ] θ a T − ρ T θ u ) , μ x = Σ x − 1 ( Cov ( X , W ) − ρ T μ u ) .

The aforementioned equations show that parameterizing θ u and μ u suffice towards identifying terms E [ A U ] − ρ Σ x − 1 E [ A X ] and ( I p − ρ Σ x − 1 ρ T ) in the bias formula (as long as the terms are identified via the pseudoinverses).

Moreover, we have that

Cov ( Z , Y ) = Cov ( Z , X ) γ x + Cov ( Z , A ) γ a + ( θ a E [ A U ] T + θ u T + θ x T ρ T ) γ u , Cov ( W , Y ) = Cov ( W , X ) γ x + Cov ( W , A ) γ a + ( μ x T ρ T + μ u T ) γ u , Cov ( A , Y ) = γ a E [ A ] ( 1 − E [ A ] ) + γ u T E [ A U ] + γ x T E [ A X ] , Cov ( X , Y ) = γ a E [ A X ] + ρ T γ u + Σ x γ x .

which imply

Cov ( W , Y ) − Cov ( W , X ) Σ x − 1 Cov ( X , Y ) − ( Cov ( W , A ) − Cov ( W , X ) Σ x − 1 E [ A X ] ) ( Cov ( A , Y ) − E [ A X ] T Σ x − 1 Cov ( X , Y ) ) E [ A ] ( 1 − E [ A ] ) − E [ A X ] T Σ x − 1 E [ A X ] = [ μ u T ( I p − ρ Σ x − 1 ρ T ) − ( Cov ( W , A ) − Cov ( W , X ) Σ x − 1 E [ A X ] ) ( E [ A U ] T − E [ A X ] T Σ x − 1 ρ T ) E [ A ] ( 1 − E [ A ] ) − E [ A X ] T Σ x − 1 E [ A X ] γ u .

D.2 In-depth rationale for choice of sensitivity parameters

D.2.1 Choice of distribution for p = dim ( U )

The rate of the Poisson distribution can be adjusted depending on the expected number of independent unobserved confounders. In this case, we assume the large number of observed covariates in X (i.e., dim ( X ) = 67 ) accounts for most confounding of the A − Y association, and thus, the number of unobserved U is small compared to the dimension of X .

D.2.2 Drawing ρ – the base case

Construct covariance matrix ρ = Cov ( U , X ) such that covariances between elements of U and X are of similar magnitude to covariances between elements of X , as follows:

Draw elements ρ i j from the empirical distribution of pairwise covariances { ( Σ x ) i j : 1 ≤ i < j ≤ q } .
Rescale each ρ i j such that ∑ j ∣ ρ i j ∣ < 1 for each i = 1 , … , p and ∑ j ∣ ρ j i ∣ + ∑ k ∣ Σ x ∣ k i < 1 for each i = 1 , … , q , to ensure positive semidefinite covariance matrix for U X .

We operate under the assumption that the pairwise covariances ρ i j follow roughly the same distribution as the observed covariates’ covariances in Σ x (with a slight downwards shift in magnitude), given no additional information about the nature of unobserved U .

D.2.3 Choice of θ u , l , θ u , r , μ u , l , μ u , l

In the absence of additional priors on each of the unobserved confounders U , the take element-wise intervals [ ( θ u , l ) i , j , ( θ u , r ) i , j ] to be equal to some interval [ θ l , θ r ] for all i = 1 , … , m , j = 1 , … , p . Similarly, we take intervals [ ( μ u , l ) i , j , ( μ u , r ) i , j ] to be equal to some [ μ l , μ r ] for all i = 1 , … , n , j = 1 , … , p . In other words, if e m × p , e n × p are matrices of all ones, then we take [ θ u , l , θ r , l ] = [ θ l e m × p , θ r e m × p ] , [ μ u , l , μ r , l ] = [ μ l e n × p , μ r e n × p ] .

To inform our choice of θ l , θ r , μ l , μ r , we run the following linear regressions (with intercept):

fit1: regress Z onto ( A , X ) ,
fit2: regress Z onto ( A , X , W ) ,
fit3: regress W onto X ,
fit4: regress W onto ( A , X , Z ) .

Assuming our LSEM, fit2 yields estimates for θ a (the coefficient of A ) and θ u T ( μ u T ) † (the coefficients of W ). Moreover, fit4 yields estimates for μ u T ( θ u T ) † (the coefficients of Z ) and μ u T ( θ u T ) † θ a (the coefficient of A ). The 95 % CIs for the coefficients are included in ( − 25 , 25 ) for μ u T ( θ u T ) † and ( − 1 , 1 ) for θ u T ( μ u T ) † . We then choose endpoints [ θ l , θ u ] , [ μ l , μ u ] to ensure resulting samples θ u ∈ [ θ l e m × p , θ r e m × p ] , θ u ∈ [ θ l e n × p , θ r e n × p ] are included in ( − 25 , 25 ) and ( − 1 , 1 ) , respectively.

In addition, the coefficients of X in fit1 and fit3 represent biased estimates of θ u and μ u , respectively. Assuming these coefficients are at least informative for the magnitude (in terms of powers of 10) and not values of θ u and μ u , we choose [ θ l , θ r ] = [ − 10 , 10 ] , [ μ l , μ r ] = [ − 1.5 , 1.5 ] .

In fact, our experiments show that the distribution of biases does not change significantly for fixed [ μ l , μ r ] = [ − 1.5 , 1.5 ] and different choices of interval magnitudes [ θ l , θ r ] ∈ { [ − 1.5 , 1.5 ] , [ − 5 , 5 ] , [ − 10 , 10 ] } , so we keep [ θ l , θ r ] = [ − 1.5 , 1.5 ] for the slight runtime improvement in the sampling process.

D.2.4 Bootstrapping strategy for setting E [ A U ] , γ u

We compute 500 bootstrap estimates of the covariance matrices and draw E [ A U ] and γ u from the resulting distribution of (26) and (27) evaluated at the bootstrap covariance estimates.

References

[1] Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL. Smoking and lung cancer: recent evidence and a discussion of some questions. Int J Epidemiol. 2009 Oct;38(5):1175–91. https://academic.oup.com/ije/article-lookup/doi/10.1093/ije/dyp289. 10.1093/ije/dyp289Suche in Google Scholar PubMed

[2] Ding P, VanderWeele TJ. Sensitivity analysis without assumptions. Epidemiology. 2016 May;27(3):368–77. 10.1097/EDE.0000000000000457Suche in Google Scholar PubMed PubMed Central

[3] Robins JM, Rotnitzky A, Scharfstein DO. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In: Halloran ME, Berry D, editors. Statistical models in epidemiology, the environment, and clinical trials. The IMA Volumes in Mathematics and its Applications. New York, NY: Springer; 2000. p. 1–94. 10.1007/978-1-4612-1284-3_1Suche in Google Scholar

[4] Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc Ser B (Methodological). 1983;45(2):212–8. https://www.jstor.org/stable/2345524. 10.1111/j.2517-6161.1983.tb01242.xSuche in Google Scholar

[5] Brookhart MA, Schneeweiss S. Preference-based instrumental variable methods for the estimation of treatment effects: assessing validity and interpreting results. Int J Biostat. 2007;3(1):Article 14. 10.2202/1557-4679.1072Suche in Google Scholar PubMed PubMed Central

[6] Rambachan A, Roth J. A more credible approach to parallel trends. Rev Econ Stud. 2023 Oct;90(5):2555–91. https://doi.org/10.1093/restud/rdad018. Suche in Google Scholar

[7] Shi X, Miao W, Tchetgen Tchetgen EJ. A selective review of negative control methods in epidemiology. arXiv:200905641 [stat]. 2020 Sep. ArXiv: 2009.05641. Available from: http://arxiv.org/abs/2009.05641. Suche in Google Scholar

[8] Tchetgen Tchetgen EJ, Ying A, Cui Y, Shi X, Miao W. An introduction to proximal causal learning. arXiv:200910982 [stat]. 2020 Sep. ArXiv: 2009.10982. Available from: http://arxiv.org/abs/2009.10982. 10.1101/2020.09.21.20198762Suche in Google Scholar

[9] Cui Y, Pu H, Shi X, Miao W, Tchetgen Tchetgen EJ. Semiparametric proximal causal inference. arXiv:201108411 [math, stat]. 2020 Nov. ArXiv: 2011.08411. Available from: http://arxiv.org/abs/2011.08411. Suche in Google Scholar

[10] Miao W, Geng Z, Tchetgen Tchetgen EJ. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika. 2018 Dec;105(4):987–93. https://academic.oup.com/biomet/article/105/4/987/5073056. 10.1093/biomet/asy038Suche in Google Scholar PubMed PubMed Central

[11] Lash TL, Fox MP, MacLehose RF, Maldonado G, McCandless LC, Greenland S. Good practices for quantitative bias analysis. Int J Epidemiol. 2014 Dec;43(6):1969–85. 10.1093/ije/dyu149Suche in Google Scholar PubMed

[12] Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educat Psychol. 1974;66(5):688–701. 10.1037/h0037350Suche in Google Scholar

[13] VanderWeele TJ, Arah OA. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology. 2011 Jan;22(1):42–52. https://journals.lww.com/00001648-201101000-00008. 10.1097/EDE.0b013e3181f74493Suche in Google Scholar PubMed PubMed Central

[14] Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period – application to control of the healthy worker survivor effect. Math Model. 1986 Jan;7(9):1393–512. https://www.sciencedirect.com/science/article/pii/0270025586900886. 10.1016/0270-0255(86)90088-6Suche in Google Scholar

[15] Miao W, Shi X, Tchetgen Tchetgen EJ. A confounding bridge approach for double negative control inference on causal effects. arXiv:180804945 [stat]. 2020 Sep. ArXiv: 1808.04945. Available from: http://arxiv.org/abs/1808.04945. Suche in Google Scholar

[16] Newey WK, Powell JL. Instrumental variable estimation of nonparametric models. Econometrica. 2003;71(5):1565–78. https://onlinelibrary.wiley.com/doi/abs/10.1111/1468-0262.00459. 10.1111/1468-0262.00459Suche in Google Scholar

[17] Shi X, Miao W, Nelson JC, Tchetgen Tchetgen EJ. Multiply robust causal inference with double negative control adjustment for categorical unmeasured confounding. arXiv; 2019. ArXiv:1808.04906 [stat]. Available from: http://arxiv.org/abs/1808.04906. Suche in Google Scholar

[18] Rosenbaum PR. Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika. 1987;74(1):13–26. https://www.jstor.org/stable/2336017. 10.1093/biomet/74.1.13Suche in Google Scholar

Received: 2023-06-03

Revised: 2023-12-20

Accepted: 2024-03-20

Published Online: 2024-06-19

This work is licensed under the Creative Commons Attribution 4.0 International License.

Artikel in diesem Heft

https://doi.org/10.1515/jci-2023-0039

Schlagwörter für diesen Artikel

proximal causal inference; sensitivity analysis; bias analysis; negative control

Creative Commons

BY 4.0