
Sensitivity analysis for causal effects with generalized linear models

Arvid Sjölander, Erin E. Gabriel and Iuliana Ciocănea-Teodorescu
Published/Copyright: December 13, 2022

Abstract

Residual confounding is a common source of bias in observational studies. In this article, we build upon a series of sensitivity analysis methods for residual confounding developed by Brumback et al. and Chiba, whose sensitivity parameters are constructed to quantify deviation from conditional exchangeability, given measured confounders. These sensitivity parameters are combined with the observed data to produce a "bias-corrected" estimate of the causal effect of interest. We provide important generalizations of these sensitivity analyses, by allowing for arbitrary exposures and a wide range of different causal effect measures, through the specification of the target causal effect as a parameter in a generalized linear model with an arbitrary link function. We show how our generalized sensitivity analysis can be easily implemented with standard software, and how its sensitivity parameters can be calibrated against measured confounders. We demonstrate our sensitivity analysis with an application to publicly available data from a cohort study of behavior patterns and coronary heart disease.

MSC 2010: 62D20; 62J12

1 Introduction

Bias due to residual confounding is one of the main threats to the validity of causal conclusions in observational studies. A common way to deal with confounding bias is to carry out a sensitivity analysis, where the analyst is required to speculate about certain sensitivity parameters that quantify the magnitude of residual confounding. These parameters are used, together with the observed data, to reason about the magnitude of the causal effect.

The literature on such sensitivity analyses can be roughly classified along two dimensions. The first of these refers to the type of sensitivity parameters used. In particular, one may start by considering an unmeasured (set of) confounder(s) U that is sufficient for confounding control, and parametrize its effect on the exposure and the outcome. The resulting parameters are then used as sensitivity parameters. Alternatively, one may use sensitivity parameters that quantify the deviation from (conditional, given measured confounders) exchangeability. Formally, exchangeability is defined as statistical independence between the exposure and the potential outcomes that define the target causal effect [1,2]. Residual confounding leads to nonexchangeability; the stronger the confounding, the larger the deviation from exchangeability. This approach has the advantage of not requiring the analyst to explicitly consider a set of unmeasured confounders, but the resulting sensitivity analysis parameters may be more abstract and difficult to interpret. The second dimension refers to the type of inference that the analysis produces about the target causal effect. In particular, the sensitivity analysis may allow point estimation of the target causal effect, given the sensitivity parameters, or it may only produce bounds for it. How published sensitivity analysis methods fall into the four categories defined by the cross-tabulation of these two dimensions is illustrated in Table 1.

Table 1

Different types of sensitivity analyses, structured by the type of sensitivity parameters (the rows) and type of inference about the target causal effect (the columns)

Effect of unmeasured confounder on exposure and outcome
  Point estimate: Cornfield et al. [3]; Schlesselman [16]; Rosenbaum and Rubin [17]; Lin et al. [18]; Imbens [19]; VanderWeele and Arah [20]; Cinelli and Hazlett [8]
  Bounds: Rosenbaum [4]; Ding and VanderWeele [6]; VanderWeele and Ding [7]; Yadlowsky et al. [5]; Chernozhukov et al. [9]

Deviation from exchangeability
  Point estimate: Brumback et al. [10]; Chiba [11]; Blackwell [12]; Franks et al. [14]; Scharfstein et al. [15]; Ciocănea-Teodorescu et al. [13]
  Bounds: Tan [21]; Kallus and Zhou [22]; Kallus et al. [23]; Zhao et al. [24]; Jesson et al. [25]

In the analyses listed in the top-left quadrant of Table 1, the analyst parametrizes the effect of a pre-specified set of unmeasured confounders that they deem sufficient for confounding control. The estimation problem is formulated so that the target causal effect can be point estimated given assumed values of the sensitivity parameters. Arguably, this is the most conceptually straightforward way of carrying out a sensitivity analysis, and perhaps for this reason, it was also the earliest to appear in the literature [3]. Later authors considered various scenarios and model assumptions for the exposure, the outcome and the unmeasured confounders. In contrast, in the analysis by Rosenbaum [4] (top-right quadrant of Table 1), the analyst does not assume fixed values of the sensitivity parameters; rather, they specify a threshold for the maximal effect that U may have on the exposure, when contrasting all possible levels of U. Rosenbaum [4] showed how this threshold value translates into bounds for the target causal effect. Yadlowsky et al. [5] developed several sophisticated estimators of these bounds, based on semi-parametric theory and machine learning methods. Ding and VanderWeele [6] and VanderWeele and Ding [7] developed a similar sensitivity analysis and bounds, but based on specified thresholds for both the maximal effect of U on the exposure and the maximal effect of U on the outcome. They coined the term "E-value" for the common threshold value of these effects required to "nullify" an observed exposure–outcome association.

The analyses by Cinelli and Hazlett [8] and Chernozhukov et al. [9] stand out from the other analyses in the top row of Table 1 in an important aspect. Whereas the other analyses typically define the sensitivity parameters as relatively standard regression coefficients (e.g., mean differences or risk ratios), Cinelli and Hazlett [8] and Chernozhukov et al. [9] defined them in terms of the variation in the exposure and the outcome, respectively, that is explained by U, e.g., as partial R^2 coefficients. Although these measures may be perceived as somewhat more abstract than standard regression coefficients, they may also enable a more general analysis. For instance, the analyses by Cinelli and Hazlett [8] and Chernozhukov et al. [9] allow for arbitrary exposures, whereas many of the other analyses require the exposure to be either binary or normally distributed.

For binary exposures, Brumback et al. [10] (bottom-left quadrant of Table 1) considered a "confounding function" that describes how the mean potential outcome depends on the exposure. They developed an analysis with sensitivity parameters that index this confounding function and are defined as differences in means of potential outcomes. Chiba [11] extended this analysis to allow for sensitivity parameters defined as ratios of means of potential outcomes. Both Brumback et al. [10] and Chiba [11] focused on causal effects defined through marginal structural models and estimated with inverse probability weighting, but Blackwell [12] showed how these sensitivity analyses can be used in combination with other models and estimation methods as well. Ciocănea-Teodorescu et al. [13] developed a related sensitivity analysis, but using a parsimonious set of sensitivity parameters defined marginally over the measured confounders, instead of conditionally as in Brumback et al. [10] and Chiba [11]. Franks et al. [14] and Scharfstein et al. [15] used a somewhat different way of parametrizing the deviation from exchangeability. Instead of the confounding function, they considered a "tilting function" that describes how the probability of exposure depends on the potential outcomes. They developed an analysis with sensitivity parameters that index this tilting function and are defined as odds ratios. For both the analyses using the confounding function and those using the tilting function, the target causal effect can be point estimated given assumed values of their respective sensitivity parameters.

Finally, all cited authors in the bottom-right quadrant of Table 1 used the tilting function (or some transformation thereof) like Franks et al. [14] and Scharfstein et al. [15]. In the spirit of Rosenbaum [4], they developed sensitivity analyses where the analyst specifies a threshold for the tilting function, which translates into bounds for the target causal effect. They developed various parametric and semi-parametric estimators for these bounds.

In this article, we build on the work by Brumback et al. [10] and Chiba [11]. Like these authors, we use sensitivity parameters that quantify the deviation from exchangeability in terms of the confounding function. However, in our generalized sensitivity analysis, we allow for an arbitrary exposure, and sensitivity parameters defined as any contrast between counterfactual means, e.g., odds ratios. We further suggest a simple way to perform the sensitivity analysis in practice, which can be easily implemented with standard software for generalized linear models. Both Brumback et al. [10] and Chiba [11] considered scenarios where the exposure and outcome are measured at a single occasion, and scenarios where they are measured repeatedly during follow-up; we restrict attention to the former.

This article is organized as follows. We first introduce notation, definitions and assumptions that we will use throughout the article in Section 2. We next briefly review the sensitivity analyses previously proposed by Brumback et al. [10] and Chiba [11] in Section 3. Then, we present our generalized sensitivity analysis in Section 4 and an estimation method based on generalized linear models in Section 5. To help practitioners select plausible values for the sensitivity parameters, we propose a method of calibrating them against measured confounders in Section 6. We present a small simulation study, which empirically confirms the theoretical properties of our estimation and calibration methods in Section 7. Finally, we demonstrate our sensitivity analysis by re-analyzing data from a cohort study of behavior patterns and coronary heart disease in Section 8. These data are publicly available, which enables the reader to replicate and elaborate on our analysis.

2 Notation, definitions and assumptions

Let X and Y be the exposure and outcome of interest, and let Z be a set of measured confounders that we wish to control for in the analysis. We allow for these variables to be either continuous, discrete or (possibly multilevel) categorical. To simplify notation, we use p(⋅) for both probabilities and probability densities. We assume that data consist of n independent and identically distributed observations from p(Z, X, Y). We let (Z_i, X_i, Y_i) be the observation for subject i, for i = 1, …, n, and suppress the index i when not needed.

Let Y(x) be the potential outcome if X were, possibly contrary to fact, set to level x [1,2,26]. We define the target causal effect as some contrast between the mean potential outcomes Y(x) for different values of x. For binary exposures such contrasts include, for instance, the conditional (on Z) mean difference E{Y(1) | Z} - E{Y(0) | Z} or the marginal (over Z) odds ratio [p{Y(1) = 1}/p{Y(1) = 0}]/[p{Y(0) = 1}/p{Y(0) = 0}]. This formulation of the target causal effect is very general and entirely nonparametric. In Section 5, we suggest an estimation method for the target effect based on parametric models.

We make the usual consistency and positivity assumptions. The consistency assumption states that the potential outcome Y ( x ) is equal to the factual outcome Y whenever the exposure X is factually equal to x :

(1) X = x ⇒ Y(x) = Y.

The positivity assumption states that, for each Z = z with nonzero probability, there is a nonzero probability of X = x :

(2) p(X = x | Z = z) > 0 for all z with p(Z = z) > 0.

When there is no residual confounding, the potential outcome Y ( x ) is conditionally independent of the exposure X , given Z :

(3) Y(x) ⫫ X | Z,

which is referred to as conditional exchangeability, given Z [1,2]. Under (1)–(3), both conditional and marginal causal effects are identifiable from the joint distribution p(Z, X, Y). Specifically, E{Y(x) | Z} is identifiable through

E{Y(x) | Z} = E{Y(x) | X = x, Z} = E(Y | X = x, Z),

where the first equality follows from (2) and (3), and the second follows from (1). It follows from the law of total probability that E { Y ( x ) } is identifiable through

(4) E{Y(x)} = E{E(Y | X = x, Z)},

where the outer expectation is taken over the marginal distribution of Z . When there is residual confounding, the relation in (3) does not generally hold, and neither conditional nor marginal causal effects are generally identifiable.

3 Previously proposed sensitivity analyses

To assess the sensitivity of observed exposure–outcome associations to residual confounding, in settings where the exposure is binary, Brumback et al. [10] proposed to use the “confounding function”

(5) c(x, Z) = η[E{Y(x) | X = x, Z}] - η[E{Y(x) | X = 1 - x, Z}],

with η being the identity link. Brumback et al. [10] further proposed to parametrize the confounding function with a (scalar or vector) sensitivity parameter α that measures the deviation from conditional exchangeability; the larger the magnitude of α, the stronger the deviation from (3). Conversely, under conditional exchangeability, we require that α = 0. A simple example of such a parametrization is the function c(x, Z; α) = α(2x - 1), for which E{Y(1) | X = 1, Z} - E{Y(1) | X = 0, Z} = -[E{Y(0) | X = 0, Z} - E{Y(0) | X = 1, Z}] = α. When α > 0, both mean potential outcomes are higher for those factually exposed than for those factually nonexposed; thus, for this choice of confounding function, the residual confounding biases the observed exposure–outcome association upwards. Conversely, when α < 0, the residual confounding biases the observed association downwards. A slightly more complex function allows for the deviation from exchangeability to depend on the levels of the measured confounders, e.g., by including an interaction term between (2x - 1) and Z as c(x, Z; α) = α_0(2x - 1) + α_1(2x - 1)Z. We refer readers to the study by Brumback et al. [10] for more examples and discussion of possible confounding functions.

Given c ( x , Z ; α ) , Brumback et al. [10] proposed to construct a “bias-corrected” outcome

(6) Y_{i,α} = Y_i - c(X_i, Z_i; α) p(X = 1 - X_i | Z_i)

for each subject i. They showed that this correction removes the confounding bias, in the sense that E(Y_{i,α} | X_i = x, Z_i) = E{Y(x) | Z_i}. This suggests that a sensitivity analysis can be carried out by first using subject matter knowledge to specify the confounding function c(x, Z; α) and a set of plausible values for α, and then using the conditional (given Z) association between Y_{i,α} and X_i as an estimate of the target causal effect for each value of α. Chiba [11] showed that an analogous sensitivity analysis can be carried out by letting η in (5) be the log link and defining the bias-corrected outcome Y_{i,α} as follows:

(7) Y_{i,α} = Y_i [p(X = X_i | Z_i) + exp{-c(X_i, Z_i; α)} p(X = 1 - X_i | Z_i)],

for which we again have that E(Y_{i,α} | X_i = x, Z_i) = E{Y(x) | Z_i}. Computing the bias-corrected outcomes proposed by Brumback et al. [10] and Chiba [11] requires the conditional distribution p(X | Z). This is usually not known but can be estimated from data, e.g., with a logistic regression model. Both Brumback et al. [10] and Chiba [11] focused on causal effects defined through marginal structural models and estimated with inverse probability weighting, but Blackwell [12] noted that the sensitivity analysis can also be carried out with other models and methods, such as ordinary outcome regression or matching.
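To fix ideas, the following minimal R sketch computes the bias-corrected outcome in (6) for a binary exposure, estimating p(X = 1 | Z) with a logistic regression as suggested above, and then regresses the corrected outcome on the exposure and the measured confounder. The simulated data frame dat (columns y, x, z), the propensity model and the value of α are illustrative assumptions, not part of the original analyses.

```r
## Minimal sketch of the bias-corrected outcome in (6) for a binary exposure.
## The data-generating step, the propensity model and alpha are illustrative.
set.seed(1)
n   <- 1000
dat <- data.frame(z = rbinom(n, 1, 0.5))
dat$x <- rbinom(n, 1, plogis(-0.5 + dat$z))
dat$y <- rbinom(n, 1, plogis(-1 + 0.5 * dat$x + 0.5 * dat$z))

alpha <- 0.1                                        # assumed sensitivity parameter
cfun  <- function(x, z, alpha) alpha * (2 * x - 1)  # c(x, Z; alpha)

## Estimate p(X = 1 | Z) with a logistic regression
pfit <- glm(x ~ z, family = binomial, data = dat)
p1   <- predict(pfit, type = "response")            # p(X = 1 | Z_i)
p.other <- ifelse(dat$x == 1, 1 - p1, p1)           # p(X = 1 - X_i | Z_i)

## Bias-corrected outcome (6), identity link
dat$y.alpha <- dat$y - cfun(dat$x, dat$z, alpha) * p.other

## The conditional (given Z) association between y.alpha and x estimates the
## causal effect under the assumed value of alpha
coef(lm(y.alpha ~ x + z, data = dat))
```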

4 Generalized sensitivity analysis

To accommodate arbitrary exposures and link functions, we propose to generalize the confounding function in (5) as follows:

(8) c(x, x′, Z) = η[E{Y(x) | X = x, Z}] - η[E{Y(x) | X = x′, Z}],

where η is an arbitrary smooth link function, e.g., the identity, log or logit link. Like the simpler confounding function in (5), we propose to parametrize it with a (scalar or vector) sensitivity parameter α that measures the deviation from conditional exchangeability (3) and is equal to 0 when conditional exchangeability holds. In addition, the parametrization is such that c(x, x′, Z; α) = 0 for all (x, x′, Z) when α = 0. In contrast to the simpler confounding function in (5), the generalized confounding function in (8) allows for any measure of deviation that can be defined as a difference between transformed (by the link function η) means, and it allows for the exposure to be (possibly multilevel) categorical or continuous.

A straightforward generalization of the simple confounding function c(x, Z; α) = α(2x - 1) for binary exposures is the function c(x, x′, Z; α) = α(x - x′). When α > 0, this function assumes that the (transformed by η) mean potential outcome increases linearly with the factual exposure level, and it reduces to c(x, Z; α) = α(2x - 1) when X is binary. For instance, suppose that Y is binary, η is the logit link and α = 0.1. Then, the confounding function c(x, x′, Z; α) = α(x - x′) implies that the log odds difference logit[p{Y(x) = 1 | X = x, Z}] - logit[p{Y(x) = 1 | X = x′, Z}] increases by 0.1 units for each unit increase in the exposure difference x - x′. More complex confounding functions can be modeled with, for instance, splines or polynomials. One may also let the deviation from conditional exchangeability depend on levels of the measured confounders, e.g., by including an interaction term between (x - x′) and Z as c(x, x′, Z; α) = α_0(x - x′) + α_1(x - x′)Z. When the exposure is categorical with ordered levels x_1, …, x_J, one may, for instance, parametrize the confounding function as c(x_j, x_{j′}, Z; α) = α(j - j′). When the exposure levels are not ordered, it may be difficult to defend such a parsimonious model for the confounding function, and a realistic model may need to include (at least) one parameter for each combination (x_j, x_{j′}). As the dimension of the sensitivity parameter increases, it also becomes increasingly more difficult to suggest plausible values for it. Thus, we conjecture that, for numeric (e.g., binary and continuous) exposures, many analysts will in practice be content with the simple confounding function c(x, x′, Z; α) = α(x - x′), which only requires specification of a scalar sensitivity parameter α.
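As a small numerical illustration of the logit-link interpretation above (not taken from the article), the sketch below shows that a confounding-function value c on the logit scale multiplies the counterfactual odds by exp(c); the values α = 0.1 and the mean 0.30 are arbitrary.

```r
## Illustration of the logit-link interpretation of c(x, x', Z; alpha) = alpha * (x - x')
alpha <- 0.1
cfun  <- function(x, xprime, alpha) alpha * (x - xprime)

## A one-unit exposure difference shifts the odds of Y(x) = 1 by a factor
## exp(0.1), i.e. about 1.11
exp(cfun(1, 0, alpha))

## If p{Y(x) = 1 | X = x, Z} = 0.30, then under this confounding function
## p{Y(x) = 1 | X = x', Z} = expit(logit(0.30) - alpha * (x - x'))
plogis(qlogis(0.30) - cfun(3, 1, alpha))
```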

Given c(x, x′, Z; α), E{Y(x) | Z} is identifiable through

(9) E{Y(x) | Z} = E[E{Y(x) | X, Z} | Z] = E{η^{-1}(η[E{Y(x) | X = x, Z}] - c(x, X, Z; α)) | Z} = E(η^{-1}[η{E(Y | X = x, Z)} - c(x, X, Z; α)] | Z),

where the first equality follows from the law of total probability, the second from positivity (2) and the definition of c(x, X, Z; α), and the third from consistency (1). The outer expectation on the right-hand side of (9) is taken over the conditional distribution p(X | Z). The relation in (9) suggests that a sensitivity analysis can be conducted as shown in the studies by Brumback et al. [10] and Chiba [11], by first using subject matter knowledge to specify the confounding function c(x, x′, Z; α) and a set of plausible parameter values for α, and then estimating the target causal effect for each value of α in this set.

Allowing for an arbitrary link function η is important, as it enables the analyst to define the confounding function on the same scale as the target causal effect, which in most situations would be perceived as natural. Furthermore, we show in Appendix A that, for binary outcomes, the logit link has an important advantage over the identity and log links, in that it makes the confounding function variation independent of the observed data distribution p(Z, X, Y); that is, all values of c(x, x′, Z; α) are logically compatible with all values of p(Z, X, Y). This has two important consequences [14,27]. First, it enables separation of the analysis into the modeling of p(Z, X, Y) and the choice of confounding function, thus effectively separating the assessment of uncertainty due to sampling variability and the uncertainty due to unmeasured confounding. Second, regardless of whether the value for α is determined by subject matter knowledge or guided by calibration against measured confounders (see Section 6), the analyst does not need to worry about potential conflicts between the chosen value for α and the observed data distribution. We show in Section 8 how this property of the logit link plays out with real data, and how the absence of this property for the log link makes inference more challenging.

5 Estimation and asymptotic inference

Suppose that the confounding function c(x, x′, Z; α) and the sensitivity parameter α are fixed. If both X and Z are low dimensional (e.g., binary), then the identifying relation in (9) can be used to estimate E{Y(x) | Z} nonparametrically, by replacing the true mean E(Y | X = x, Z) with the corresponding sample mean and taking the outer expectation over the conditional distribution of X given Z in the sample. To handle more general situations, we propose an estimation method based on parametric models.

We assume that the target causal effect can be expressed through the conditional (on Z ) causal model

(10) η[E{Y(x) | Z}] = ψ^T g(x, Z)

or through the marginal (over Z ) causal model

(11) η[E{Y(x)}] = ψ^T g(x).

In these models, η is an appropriate link function, e.g., the identity, log or logit link; ψ is a column vector of parameters and g is a regression function with the same dimension as ψ. In principle, we may allow for the link function to be different in the causal models (10) and (11) than in the confounding function (8), but we suspect that these will usually be the same in practice. A simple example is when g in model (10) is linear in x and Z: η[E{Y(x) | Z}] = ψ_0 + ψ_1 x + ψ_2 Z. In this model, the target causal effect is the mean counterfactual contrast ψ_1 = η[E{Y(x + 1) | Z}] - η[E{Y(x) | Z}], which measures the conditional (on Z) effect of increasing the exposure by 1 unit on the (transformed by η) mean potential outcome. The formulations in (10) and (11) cover more complex models as well, including, for instance, splines, polynomials and interaction terms. Neither the conditional model (10) nor the marginal model (11) makes any distributional assumptions, apart from their respective mean specifications.

To address the fact that the potential outcome in the regression models (10) and (11) is incompletely observed, we make use of relation (9). For this, we first need to estimate the mean E(Y | X = x, Z). Together with our choice of c(x, x′, Z; α) and α, this will allow us to compute a "bias-corrected" outcome and fit the causal model with standard software for generalized linear models. To estimate the mean E(Y | X = x, Z) in relation (9), we use a "nuisance model" of the form

(12) η{E(Y | X, Z)} = β^T h(X, Z),

where β is a column vector of parameters and h(X, Z) is a regression function with the same dimension as β. This model can be fitted with standard methods (e.g., maximum likelihood) to produce an estimate β̂. Since the outcome nuisance model is not of interest per se, there is no need to keep this model simple in order to have a transparent interpretation of β. Rather, it would typically be advantageous to use as flexible a model as possible, to reduce the risk of bias in the estimated target causal effect due to model misspecification. When both X and Z are low dimensional (e.g., binary), the outcome nuisance model may even be saturated.
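As a concrete illustration, the sketch below fits a nuisance model of the form (12) with a logit link in R; the simulated data frame dat (binary outcome y, continuous exposure x, confounders z1 and z2) is an illustrative assumption and is reused in the sketches that follow.

```r
## Minimal sketch of fitting the outcome nuisance model (12) with a logit link.
## The simulated data are illustrative and are reused in later sketches.
set.seed(1)
n   <- 2000
dat <- data.frame(z1 = rnorm(n), z2 = rbinom(n, 1, 0.5))
dat$x <- 0.5 * dat$z1 + 0.3 * dat$z2 + rnorm(n)
dat$y <- rbinom(n, 1, plogis(-1 + 0.3 * dat$x + 0.4 * dat$z1 + 0.2 * dat$z2))

## A flexible h(X, Z): main effects and exposure-confounder interactions
nuisance.fit <- glm(y ~ x * (z1 + z2), family = binomial, data = dat)
```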

The relation in (9) contains the exposure both as a fixed constant x and as a random variable X; the former corresponds to the counterfactual scenario where everybody receives exposure level x, whereas the latter corresponds to the factual scenario where the exposure varies randomly across subjects. In practice, these scenarios cannot be observed simultaneously. However, we can "simulate" the counterfactual scenario by artificially assigning a fixed exposure level to each subject and using the postulated confounding function to compute the potential outcome under this scenario, thereby being able to "observe" both scenarios simultaneously. This intuition motivates us to "augment" the data used to fit the models in (10) and (11), whereby, for each subject i, we generate multiple realizations of the outcome, corresponding to different values of x. More precisely, for each subject i, we generate a random draw of an auxiliary multivariate variable V_i = (V_{i1}, …, V_{iK})^T from a pre-specified distribution, independently of X. We discuss the choice of distribution below. For each subject i and observation V_{ik}, we define the bias-corrected outcome

(13) Y_{ik,α,β̂} = η^{-1}{β̂^T h(V_{ik}, Z_i) - c(V_{ik}, X_i, Z_i; α)}.

Thus, for each subject i, we generate K bias-corrected outcomes. The term β̂^T h(V_{ik}, Z_i) is a prediction of the transformed mean η{E(Y | X = V_{ik}, Z_i)} and can easily be obtained from standard software after having fitted the outcome nuisance model (12). A similar data augmentation technique was used by ref. [28], for estimation of natural direct and indirect effects. We show in Appendix B that the bias-correction in (13) removes the confounding bias, in the sense that E(Y_{ik,α,β} | V_{ik} = x, Z_i) = E{Y(x) | Z_i}.
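Continuing the sketch started after model (12), the code below augments the data with K = 3 auxiliary exposure values per subject (the sample quartiles of x, one of the choices suggested at the end of this section) and computes the bias-corrected outcomes in (13) for an assumed confounding function c(x, x′, Z; α) = α(x - x′) with a logit link.

```r
## Continues the nuisance-model sketch above (objects dat and nuisance.fit).
alpha <- 0.1                                         # assumed sensitivity parameter
cfun  <- function(x, xprime, alpha) alpha * (x - xprime)

v   <- quantile(dat$x, probs = c(0.25, 0.50, 0.75))  # auxiliary values V_ik
K   <- length(v)
aug <- dat[rep(seq_len(nrow(dat)), each = K), ]      # one row per (i, k)
aug$v <- rep(v, times = nrow(dat))

## Prediction of eta{E(Y | X = V_ik, Z_i)} on the linear predictor scale
nd   <- aug
nd$x <- aug$v
eta.pred <- predict(nuisance.fit, newdata = nd)      # type = "link" by default

## Bias-corrected outcome (13); with a logit link, eta^{-1} is plogis
aug$y.corr <- plogis(eta.pred - cfun(aug$v, aug$x, alpha))
```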

For fixed α and β̂, we estimate the parameter vector ψ in the conditional model (10) as the solution to the equation system:

(14) Σ_{i=1}^n H_i(α, β̂, ψ) = 0,

where H_i(α, β̂, ψ) = Σ_{k=1}^K H_{ik}(α, β̂, ψ) and

(15) H_{ik}(α, β̂, ψ) = g(V_{ik}, Z_i)[Y_{ik,α,β̂} - η^{-1}{ψ^T g(V_{ik}, Z_i)}].

Estimation of ψ in the marginal model (11) is carried out with the same set of equations, but with g(V_{ik}, Z_i) replaced by g(V_{ik}). We show in Appendix B that the estimating equation system is unbiased, i.e., that E{H_i(α, β, ψ)} = 0 at the true value of (α, β, ψ). It follows from standard theory for M-estimators [29] that ψ̂ is consistent and asymptotically normal, provided that the confounding function c(x, x′, Z; α) is correctly specified with the correct value of the sensitivity parameter α, and that both the causal model (10) (or (11)) and the outcome nuisance model (12) are correct.

The equation system in (14) is generally nonlinear and has one equation for each element in ψ. Thus, if ψ is high dimensional, it may be hard and/or time consuming to find a solution by attempting to solve the equation system explicitly. Fortunately, a solution can be easily obtained by noting that the estimating function H_{ik}(α, β̂, ψ) is identical to the maximum likelihood score function for a generalized linear model of the form

(16) η{E(Y_{ik,α,β̂} | V_{ik}, Z_i)} = ψ^T g(V_{ik}, Z_i)

with canonical link function η. Thus, once the bias-corrected outcomes have been computed, ψ can be estimated by fitting the model in (16) with standard software for generalized linear models. Such software typically maximizes a scalar likelihood function rather than solving a potentially large set of score equations, and is usually highly optimized. Thus, this estimation method is expected to be quick and stable even for large data sets and high dimensional parameter vectors ψ.
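Continuing the same sketch, the bias-corrected outcomes can then be passed to standard GLM software to estimate ψ in the conditional model (10) or the marginal model (11); the quasibinomial family is used here only to suppress warnings about non-integer outcomes, and the reported default standard errors should be ignored for the reasons discussed next.

```r
## Continues the augmentation sketch above (object aug with columns y.corr, v, z1, z2).
## Conditional causal model (10): eta[E{Y(x) | Z}] = psi_0 + psi_1 x + psi_2 z1 + psi_3 z2
psi.fit <- glm(y.corr ~ v + z1 + z2, family = quasibinomial, data = aug)
coef(psi.fit)

## Marginal causal model (11): replace g(V_ik, Z_i) by g(V_ik), i.e. drop the confounders
psi.marg <- glm(y.corr ~ v, family = quasibinomial, data = aug)
coef(psi.marg)
```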

The default standard errors obtained from the software are generally not correct, since they ignore the uncertainty in β̂ and the fact that the K score contributions from each subject are correlated. This problem can be solved by using bootstrap standard errors, provided that the outcome nuisance model in (12) is re-fitted and the bias-corrected outcomes are re-computed for each bootstrap sample. Alternatively, analytic standard errors can be obtained with theory for M-estimators. Specifically, define θ = (β^T, ψ^T)^T and the stacked estimating function

U_i(θ) = {S_i(β)^T, H_i(α, β, ψ)^T}^T,

where S_i(β) is the maximum likelihood score contribution from the outcome nuisance model for subject i. It follows from standard theory for M-estimators [29] that the asymptotic variance of θ̂ is given by

(17) n^{-1} E{dU(θ)/dθ^T}^{-1} var{U(θ)} [E{dU(θ)/dθ^T}^{-1}]^T.

An estimate of the asymptotic variance is obtained by replacing the true value of θ in (17) with the estimate θ̂, and the population moments with their sample counterparts.
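A bootstrap implementation of the standard error, continuing the same sketch, could look as follows; note that the nuisance model is re-fitted and the bias-corrected outcomes are re-computed within each resample, as required above.

```r
## Continues the sketch above (objects dat, v, K, alpha, cfun).
boot.psi <- replicate(200, {
  idx <- sample(nrow(dat), replace = TRUE)
  db  <- dat[idx, ]
  nf  <- glm(y ~ x * (z1 + z2), family = binomial, data = db)   # re-fit nuisance model
  ab  <- db[rep(seq_len(nrow(db)), each = K), ]
  ab$v <- rep(v, times = nrow(db))
  nd  <- ab; nd$x <- ab$v
  ab$y.corr <- plogis(predict(nf, newdata = nd) - cfun(ab$v, ab$x, alpha))
  coef(glm(y.corr ~ v + z1 + z2, family = quasibinomial, data = ab))["v"]
})
sd(boot.psi)   # bootstrap standard error of the exposure coefficient
```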

Consistency of ψ̂ does not require any particular auxiliary variable distribution p(V | Z); in particular, it does not require that p(V_{ik} = x | Z) = p(X = x | Z). However, the asymptotic variance of ψ̂ may depend on the auxiliary variable distribution, with some distributions giving more efficient estimators. For instance, in analogy with ordinary linear regression, we would expect the estimated "regression slope" ψ̂ to be less stable if the "regressors" V_{i1}, …, V_{iK} are close to each other than if they are spread out. At the extreme end, the "regressors" are all equal, V_{i1} = ⋯ = V_{iK} = x, say, in which case it is clearly not possible to estimate E{Y(x′) | Z} for any value x′ other than x, so that the estimated "regression slope" ψ̂ has infinite variance. When X is categorical with levels x_1, …, x_J, we conjecture that a reasonable efficiency may be obtained by setting K = J and V_{ik} = x_k for k = 1, …, J. When X is continuous with quartiles q_1, q_2, q_3, we may set K = 3 and V_{ik} = q_k for k = 1, …, 3.

6 Calibration of the sensitivity parameter α

The technical definition of conditional exchangeability in (3) is quite abstract, and it may be hard, even for a subject matter expert, to determine how the impact of unmeasured confounders translates into particular values of the sensitivity parameter α, for a chosen parametric form of the confounding function (8). To aid in this process, we propose a method for calibration of α, which uses information about the measured confounders Z, and which is applicable when the confounding function c(x, x′, Z; α) of interest is assumed not to depend on Z, e.g., c(x, x′, Z; α) = α(x - x′).

To this end, for any disjoint sets of confounders W_1 and W_2, we first define the "extended" confounding function for W_2, given W_1, as follows:

(18) d_{W_2}(x, x′, W_1) = η{E(Y | X = x, W_1)} - η[E{E(Y | X = x, W_1, W_2) | X = x′, W_1}],

where the outer expectation in the second term on the right-hand side is taken over the conditional distribution of W_2, given (X = x′, W_1). The purpose of this function is to quantify the potential for W_2 to further confound the association between X and Y, once W_1 has been controlled for. If W_2 is either conditionally independent of X, given W_1, or conditionally independent of Y, given (X, W_1), then we would expect W_2 to have no confounding potential, given W_1. Indeed, it can be shown (see Appendix C) that

(19) W_2 ⫫ X | W_1 ⇒ d_{W_2}(x, x′, W_1) = 0

and

(20) W_2 ⫫ Y | (X, W_1) ⇒ d_{W_2}(x, x′, W_1) = 0.

Conversely, the stronger the deviations from the conditional independencies in (19) and (20), the larger the magnitude of d_{W_2}(x, x′, W_1) we would generally expect. The extended confounding function has a simple relation to the ordinary confounding function in (8). If W_1 and W_2 are two disjoint sets of confounders requiring control to achieve exchangeability, then it can be shown (see Appendix C) that the extended confounding function for W_2, given W_1, is identical to the ordinary confounding function given W_1:

Y(x) ⫫ X | (W_1, W_2) ⇒ d_{W_2}(x, x′, W_1) = c(x, x′, W_1).

These relations motivate the following calibration strategy. Let U be an additional set of confounders requiring control to achieve exchangeability, given Z, so that d_U(x, x′, Z) = c(x, x′, Z). We first select a subset Z′ of Z for which the need for control is fairly certain; this subset can be the full set of measured confounders, i.e., Z′ = Z. We then consider the extended confounding function d_{Z′}(x, x′, ∅) for Z′ given the empty set; this function measures the "marginal" confounding potential of Z′. We further assume that this function has the same parametric form as the ordinary confounding function c(x, x′, Z; α) for U, given Z, that we are ultimately interested in, and that c(x, x′, Z; α) does not depend on Z. For example, for a confounding function given by c(x, x′, Z; α) = α(x - x′), we would consider the extended confounding function d_{Z′}(x, x′, ∅; α′) = α′(x - x′). Since Z′ is in fact measured, we can estimate α′, thereby obtaining a concrete reference point for the sensitivity parameter α of main interest. For instance, if we believe that the confounding potential of U, given Z, is no larger in magnitude than the marginal confounding potential of Z′, then we would consider values of α in the range (-α′, α′) as plausible.

To estimate α′, we use the fact that

(21) E(η^{-1}[η{E(Y | X = x)} - d_{Z′}(x, X, ∅; α′)]) = E[E{E(Y | X = x, Z′) | X}] = E{E(Y | X = x, Z′)},

where the first equality follows from (18) and the second from the law of total probability. This suggests that α′ can be estimated as the value that minimizes the discrepancy between the left-hand side and the right-hand side of (21). Specifically, we propose the following five-step estimation strategy:

  1. Fit regression models E(Y | X, Z′; ξ) and E(Y | X; δ); these models should ideally be as flexible as possible, e.g., saturated for binary X and Z′.

  2. Let 𝒳 be the set of possible values for X. If X is continuous, we suggest provisionally categorizing X to obtain a finite set 𝒳. For each x in 𝒳, use the regression model E(Y | X, Z′; ξ) to create an estimate of E{E(Y | X = x, Z′)}, which we denote by μ̂_1(x; ξ̂). This is done with regression standardization [30], by using the model to create a prediction E(Y | X = x, Z′_i; ξ̂) for each subject i and then averaging these predictions:

    μ̂_1(x; ξ̂) = Σ_{i=1}^n E(Y | X = x, Z′_i; ξ̂)/n.

  3. For each x in 𝒳 and a fixed value of α′, use the regression model E(Y | X; δ) to create another estimate of E{E(Y | X = x, Z′)}, which we denote by μ̂_2(x; δ̂, α′). This is done by using the model to create an estimate E(Y | X = x; δ̂), then combining this with the extended confounding function d_{Z′}(x, X, ∅; α′) to create a prediction E{E(Y | X = x, Z′) | X_i; δ̂, α′} = η^{-1}[η{E(Y | X = x; δ̂)} - d_{Z′}(x, X_i, ∅; α′)] for each subject i, and finally averaging these predictions:

    μ̂_2(x; δ̂, α′) = Σ_{i=1}^n E{E(Y | X = x, Z′) | X_i; δ̂, α′}/n.

    Note that μ̂_2(x; δ̂, α′) depends on the value of α′ through the extended confounding function d_{Z′}(x, X_i, ∅; α′).

  4. Define the estimated squared distance D̂(x; ξ̂, δ̂, α′) = {μ̂_1(x; ξ̂) - μ̂_2(x; δ̂, α′)}^2 and compute the estimated sum of squared distances:

    D̂(ξ̂, δ̂, α′) = Σ_{x ∈ 𝒳} D̂(x; ξ̂, δ̂, α′).

  5. Estimate α′ as the value that minimizes the estimated sum of squared distances:

    α̂′ = argmin_{α′} D̂(ξ̂, δ̂, α′).

We show in Appendix D that the obtained estimate α̂′ is consistent, provided that the models for d_{Z′}(x, x′, ∅; α′), E(Y | X, Z′) and E(Y | X) are correct.
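The five-step strategy above is straightforward to implement; the R sketch below does so for a binary exposure x and a single binary confounder zprime, with saturated models in step 1, a logit link, and the extended confounding function d(x, x′, ∅; α′) = α′(x - x′). The simulated data are an illustrative assumption, not the article's.

```r
## Minimal sketch of the five-step calibration of alpha (logit link).
set.seed(3)
n   <- 5000
dat <- data.frame(zprime = rbinom(n, 1, 0.5))
dat$x <- rbinom(n, 1, plogis(-0.5 + dat$zprime))
dat$y <- rbinom(n, 1, plogis(-1 + 0.4 * dat$x + 0.6 * dat$zprime))

## Step 1: flexible (here saturated) models for E(Y | X, Z') and E(Y | X)
fit.xz <- glm(y ~ x * zprime, family = binomial, data = dat)
fit.x  <- glm(y ~ x,          family = binomial, data = dat)

xvals <- c(0, 1)                               # the set of possible exposure values

## Step 2: mu1(x) = E{E(Y | X = x, Z')} by regression standardization
mu1 <- sapply(xvals, function(x0)
  mean(predict(fit.xz, newdata = transform(dat, x = x0), type = "response")))

## Step 3: mu2(x; alpha), combining E(Y | X = x) with d(x, X_i, ; alpha) = alpha * (x - X_i)
mu2 <- function(alpha)
  sapply(xvals, function(x0) {
    eta.x <- predict(fit.x, newdata = data.frame(x = x0))   # logit scale
    mean(plogis(eta.x - alpha * (x0 - dat$x)))
  })

## Steps 4 and 5: minimize the sum of squared distances over alpha
D <- function(alpha) sum((mu1 - mu2(alpha))^2)
optimize(D, interval = c(-2, 2))$minimum       # the calibrated estimate
```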

We note that, in contrast to the ordinary confounding function c(x, x′, Z), the extended confounding function d_{Z′}(x, x′, ∅) is nonparametrically identifiable from the observed data. However, it is only nonparametrically estimable if Z′ is categorical. Thus, if Z′ is categorical, we may informally verify the parametric model d_{Z′}(x, x′, ∅; α′) by comparing it with a fully saturated model for d_{Z′}(x, x′, ∅); a large agreement between these indicates that the parametric model fits the data well.

7 Simulation

We carry out a small simulation study to confirm the theoretical properties of our estimation and calibration methods, in the ideal scenario when all model assumptions are correct. We generate data from the model

(22) U ~ Ber(0.5)
     Z | U ~ Ber(0.5)
     X | U, Z ~ Ber{expit(ν_0 + ν_1 U + ν_2 Z)}
     Y | X, U, Z ~ Ber{expit(τ_0 + τ_1 U + τ_2 Z + ψX)},

where U and Z represent an unmeasured and measured confounder, respectively. We assume conditional exchangeability given ( U , Z ) , so that the last row of (22) can be interpreted as the conditional causal model

(23) logit[p{Y(x) = 1 | U, Z}] = τ_0 + τ_1 U + τ_2 Z + ψx.

Thus, ψ is the conditional causal log odds ratio, given (U, Z), and the extended confounding function d_U(x, x′, Z) is equal to the ordinary confounding function c(x, x′, Z). We set ψ = 0, thus simulating under the causal null. For a fixed value of α, we generate data so that d_Z(x, x′, ∅; α′) = α′(x - x′) and d_U(x, x′, Z) = c(x, x′, Z; α) = α(x - x′) for Z ∈ {0, 1}, with α′ = α, so that the confounding potential of U, given Z, is equal to the marginal confounding potential of Z. These relations imply the following six constraints on the observed data distribution:

(24) d_Z(1, 0, ∅) = α,  d_Z(0, 1, ∅) = -α,  d_U(1, 0, 0) = α,  d_U(1, 0, 1) = α,  d_U(0, 1, 0) = -α,  d_U(0, 1, 1) = -α.

For a fixed value of α, we use these constraints to determine the six parameters (ν_0, ν_1, ν_2, τ_0, τ_1, τ_2); we provide details in Appendix E. We note that the obtained parameter values may not be unique; there could possibly exist different sets of parameter values that satisfy the constraints for a given value of α.

For each α ∈ {0.05, 0.1, 0.25}, we generate 300 samples of size n = 1,000 from the model mentioned earlier. For each sample, we fit the conditional causal model logit[p{Y(x) = 1 | Z}] = ψ_0 + ψ_1 x + ψ_2 Z with the estimation methods outlined in Section 5. This model is correctly specified and saturated under our data-generating mechanism, with ψ_1 = ψ = 0. We fit the model twice: first using the correct value of α, then estimating α according to the calibration scheme outlined in Section 6 and using the estimated value of α in the estimation of ψ. In this calibration, we used Z′ = Z and saturated models for E(Y | X) and E(Y | X, Z), and correct models for the (extended) confounding functions. We estimate the standard error with both the sandwich formula and a nonparametric bootstrap, using 100 bootstrap samples for each original sample. In the bootstrap, we re-estimate α for each bootstrap sample.
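For concreteness, one draw from the data-generating model (22) can be sketched in R as follows. The parameter values below are illustrative placeholders only; they are not the values in Table 2, which are determined by the constraints in (24).

```r
## Sketch of one sample from model (22); expit is plogis in R.
set.seed(4)
n   <- 1000
nu  <- c(-0.5, 0.5, 0.5)   # nu_0, nu_1, nu_2 (hypothetical values)
tau <- c(-1.0, 0.7, 0.7)   # tau_0, tau_1, tau_2 (hypothetical values)
psi <- 0                   # causal null, as in the simulation

U <- rbinom(n, 1, 0.5)
Z <- rbinom(n, 1, 0.5)
X <- rbinom(n, 1, plogis(nu[1] + nu[2] * U + nu[3] * Z))
Y <- rbinom(n, 1, plogis(tau[1] + tau[2] * U + tau[3] * Z + psi * X))
dat.sim <- data.frame(Z, X, Y)   # U is discarded: it plays the role of the unmeasured confounder
```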

Table 2 shows the regression coefficients (ν_0, ν_1, ν_2, τ_0, τ_1, τ_2) for each value of α. Table 3 shows the mean (over the 300 samples) estimate ψ̂ together with its mean sandwich standard error, mean bootstrap standard error and its empirical (over the 300 samples) standard deviation, when using the true value of α in the estimation. Table 4 shows the same statistics when using the estimated value of α; this table also shows the mean estimate α̂.

Table 2

Regression coefficients in the data generating model (22)

α   ν_0   ν_1   ν_2   τ_0   τ_1   τ_2
0.05 0.76 0.46 0.49 0.87 0.44 0.42
0.1 0.55 0.59 0.49 0.94 0.69 0.92
0.5 0.92 1.15 1.34 1.30 0.93 0.92
Table 3

Simulation results when using the true value of α in the estimation of ψ. Mean estimate ψ̂ together with its mean sandwich standard error, mean bootstrap standard error, and its empirical standard deviation

α   ψ̂   se.sandwich(ψ̂)   se.boot(ψ̂)   sd(ψ̂)
0.05 0.01 0.13 0.13 0.13
0.1 0.01 0.13 0.13 0.13
0.5 0.00 0.14 0.14 0.14
Table 4

Simulation results when using the estimated value of α in the estimation of ψ. Mean estimate α̂, and mean estimate ψ̂ together with its mean sandwich standard error, mean bootstrap standard error, and its empirical standard deviation

α   α̂   ψ̂   se.sandwich(ψ̂)   se.boot(ψ̂)   sd(ψ̂)
0.05 0.05 0.01 0.13 0.14 0.14
0.1 0.10 0.01 0.13 0.14 0.14
0.25 0.25 0.01 0.14 0.15 0.15

We observe that ψ̂ is virtually unbiased (≈ 0) for all values of α, both when using the true value and when using the estimated value of α in the estimation of ψ. We further observe that α̂ is virtually unbiased as well. The sandwich and bootstrap standard errors of ψ̂ agree well with the empirical standard deviation of ψ̂ when using the true value of α. When using the estimated value of α, the sandwich standard error is slightly too small; this is because it does not account for the sampling variability in α̂.

8 Illustration

Chiba [11] analyzed data from the Western Collaborative Group Study (WCGS), a cohort study that aimed to assess risk factors for coronary heart disease (CHD). In this section, we illustrate our sensitivity analysis by applying it to the WCGS data. As a comparison, we provide a sensitivity analysis conducted using the method presented by Franks et al. [14] in Appendix F. The R code for the analysis is provided in Appendix G. The WCGS data are publicly available in the R package faraway, which enables the reader to replicate and elaborate on our analyses.

The cohort consists of 3,154 middle-aged men employed at 10 California companies in 1960. Several covariates were measured at baseline, including behavior pattern classified as type A (aggressive and ambitious) or type B (calm and relaxed), age, body mass index, systolic and diastolic blood pressure, and smoking (yes/no). The subjects were followed until death or end of 1969, whichever came first. During follow-up, all diagnoses of CHD were recorded. In line with Chiba [11], we ignore complications due to censoring by death and consider CHD at end of follow-up as a binary outcome: Y = 1 for CHD and Y = 0 for no CHD.

8.1 Analysis with a binary exposure

Following Chiba [11], we start by considering the behavior pattern as the exposure ( X = 1 for type A and X = 0 for type B) and the other measured covariates as potential confounders Z . Chiba [11] aimed to estimate the marginal causal risk ratio for behavior pattern on CHD, whereas we focus on the marginal causal odds ratio. Specifically, we consider the marginal causal logistic model

(25) logit[E{Y(x)}] = ψ_0 + ψ_1 x,

where the causal log odds ratio ψ_1 is of main interest. Since X is binary, this model is saturated.

To estimate ψ while assessing its sensitivity to unmeasured confounding, we use the confounding function c(x, x′, Z; α) = α(x - x′), with η being the logit link. Chiba [11] used the same confounding function, but with a log link. We use the outcome nuisance model

(26) logit{E(Y | X, Z)} = β_0 + β_1 X + β_2^T Z + β_3^T XZ,

which includes main effects of X and of each element of Z, as well as interaction terms between X and each element of Z.
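For readers who wish to reproduce the gist of this analysis before turning to the full code in Appendix G, the sketch below runs the binary-exposure pipeline on the WCGS data for one fixed value of α. It is our own illustration, not the article's Appendix G code; the column names and codings (chd, behave, age, sdp, dbp, cigs) are taken from the faraway documentation as we recall it and should be checked against names(wcgs) before use, and the confounder set is deliberately kept smaller than in (26).

```r
## Sketch of the binary-exposure analysis on the WCGS data at a fixed alpha.
## Variable names/codings are assumptions; verify against the faraway documentation.
library(faraway)
data(wcgs)

wcgs$y  <- as.numeric(wcgs$chd == "yes")                 # CHD indicator
wcgs$xA <- as.numeric(wcgs$behave %in% c("A1", "A2"))    # type A behavior pattern

## Outcome nuisance model in the spirit of (26): main effects plus interactions
nuis <- glm(y ~ xA * (age + sdp + dbp + cigs), family = binomial, data = wcgs)

alpha <- 0.03                                            # e.g., a calibrated value
cfun  <- function(x, xprime, alpha) alpha * (x - xprime)

## Augment with V in {0, 1} and compute bias-corrected outcomes (13), logit link
aug   <- wcgs[rep(seq_len(nrow(wcgs)), each = 2), ]
aug$v <- rep(c(0, 1), times = nrow(wcgs))
nd    <- aug
nd$xA <- aug$v
aug$y.corr <- plogis(predict(nuis, newdata = nd) - cfun(aug$v, aug$xA, alpha))

## Marginal causal model (25): the coefficient of v estimates psi_1
marg <- glm(y.corr ~ v, family = quasibinomial, data = aug)
coef(marg)["v"]
```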

To appreciate what values of α can be considered plausible, Chiba [11] computed the crude log risk ratio between smoking and CHD, which was found to be equal to 0.58. He then considered values of α in the range (-0.58, 0.58) as plausible, thus presumably thinking of the unmeasured confounders as being about "equally strong" as smoking. However, while the sensitivity parameter α depends on the smoking–CHD risk (or odds) ratio, it does so in a complex fashion. Thus, there is no reason why the parameters should be similar, even if smoking were sufficient for confounding control. In particular, α also depends on the association between smoking and behavior pattern, and equals 0 if these are independent, even if smoking and CHD are strongly associated.

Instead, using our proposed method to calibrate α, with Z′ being smoking, gives a logically consistent range for α. Since both behavior pattern and smoking are binary, we use saturated logistic models for E(Y | X, Z′) and E(Y | X) in the first step of the estimation, obtaining α̂′ = 0.03. This indicates that, if we believe that the confounding potential of the unmeasured confounders, given the measured confounders, is no larger than the marginal confounding potential of smoking, then we may consider values of α in the range (-0.03, 0.03), which is an order of magnitude narrower than the range (-0.58, 0.58) considered by Chiba [11]. As an informal model verification, we fit the saturated model d_{Z′}(x, x′, ∅) = α_1 I(x = 1, x′ = 0) - α_0 I(x = 0, x′ = 1) for the extended confounding function, obtaining α̂_1 = 0.028 and α̂_0 = 0.051. Neither of these values is very far from the estimate α̂′ = 0.03 obtained from the nonsaturated model d_{Z′}(x, x′, ∅; α′) = α′(x - x′), which gives some credibility to this model.

We next repeat the calibration, now letting Z′ be the set of all measured confounders, Z′ = Z. For this, we use the logistic model logit{E(Y | X)} = δ_0 + δ_1 X and the logistic model for E(Y | X, Z′) = E(Y | X, Z) in (26). The resulting estimate of α′ is 0.19, which indicates that, if we believe that the confounding potential of the unmeasured confounders, given the measured confounders, is no larger than the marginal confounding potential of the whole set of measured confounders, then we may consider values of α in the range (-0.19, 0.19).

The left panel of Figure 1 shows the estimate of ψ_1 (solid line) together with pointwise 95% confidence limits (dashed lines), as a function of α in the range (-0.19, 0.19). At α = 0 (no deviation from conditional exchangeability, given the measured confounders), ψ̂_1 = 0.67 with 95% confidence interval (0.39, 0.94). As α goes from -0.19 to 0.19, ψ̂_1 goes almost linearly from 0.85 to 0.48. Everywhere in this range, the lower confidence limit for ψ_1 is above 0. Thus, if we do not believe that the unmeasured confounders have a stronger confounding potential than the measured confounders, then we may declare behavior pattern to have a causal effect on CHD. The dotted line shows the estimated marginal log risk ratio, as obtained with the method given by Chiba [11]. In the range (-0.19, 0.19) for α, this is very similar to the marginal log odds ratio.

Figure 1

The estimated marginal causal log odds ratio (solid lines) and log risk ratio (dotted lines) for behavior pattern and CHD, together with pointwise 95% confidence limits (dashed lines) for the marginal causal log odds ratio, as functions of the sensitivity parameter α.

The right panel of Figure 1 shows the same quantities as the left panel, but for a much wider range of α. At α = 0.70 and α = 0.41, the estimate of ψ_1 and its lower confidence limit, respectively, reach 0. Thus, taking statistical uncertainty into account, the unmeasured confounders would have to have more than twice as strong a confounding potential as the measured confounders, on the scale defined by α, to fully explain away the observed association between behavior pattern and CHD. This type of reasoning, where one derives a condition under which the causal effect is nullified and then evaluates the plausibility of that condition, is analogous to the reasoning that underlies the popular "E-value" [7], but expressed in terms of our sensitivity parameter. As α becomes large in magnitude, the estimated causal log odds ratio deviates dramatically from the estimated causal log risk ratio. While the former seems to converge to 3.87 and -2.91 as α goes to -∞ and ∞, respectively, the latter appears to approach ∞ and -∞ almost linearly. However, the causal log risk ratio and odds ratio cannot be arbitrarily large; [31] showed that the observed data distribution implies restrictions on causal effects, the so-called bounds. For the WCGS data, these bounds are given by (-2.24, 3.09) and (-2.93, 3.87) for the causal log risk ratio and odds ratio, respectively, ignoring sampling variability. The sensitivity analysis with a logit link respects these bounds, whereas the sensitivity analysis with a log link does not. The reason for this is that the confounding function with a log link is not variation independent of the observed data distribution, as discussed in Section 4.

One may argue that the simple confounding function c(x, x′, Z; α) = α(x - x′) is not entirely realistic in the current example. For instance, CHD is a largely age-dependent disease, which is quite rare at young ages. Thus, one may suspect that the factors and mechanisms that cause CHD are very different for young people than for old people, in which case the potential for unmeasured confounding bias may be different for young and old people as well. To reflect this, we consider the confounding function c(x, x′, Z; α) = α_0(x - x′) + α_1(x - x′)(age - k), where the magnitude of α_1 determines the degree of such an interaction mechanism between the unmeasured confounders and age. To gain intuition for this confounding function, consider two ages, age_1 and age_2, such that age_1 > k > age_2 and Δ = age_1 - k = k - age_2, and define r = α_1/α_0. The ratio of confounding functions at age_1 and age_2 is then given by

R = {α_0(x - x′) + α_1(x - x′)(age_1 - k)}/{α_0(x - x′) + α_1(x - x′)(age_2 - k)} = (1 + Δr)/(1 - Δr).

It is easy to show that R is increasing in Δr, and that R > 0 if -1 < Δr < 1. The age range in the WCGS data is 39 to 59 years. We set k = 49 and consider values of r in the range (-1/30, 1/30). It follows that -1/3 < Δr < 1/3, which implies that 0.5 < R < 2. Thus, for this confounding function and choice of parameter values, the deviation from exchangeability may be up to twice as large at any age in the interval (39, 59), as compared to any other age in the interval.

The contour plots in Figure 2 show the estimate of ψ_1 (left panel) and the lower limit of its 95% confidence interval (right panel) as functions of α_0 in the range (-0.19, 0.19) and α_1 in the range (-∣α_0∣/30, ∣α_0∣/30). We observe that the ranges of ψ̂_1 and its lower confidence limit in Figure 2, (0.47, 0.87) and (0.19, 0.59), respectively, are approximately the same as the corresponding ranges in Figure 1. Thus, the inclusion of the interaction term α_1(x - x′)(age - k) in the confounding function does not give qualitatively different conclusions for this example.

Figure 2

The estimated marginal causal log odds ratio for behavior pattern and CHD (left panel) and its lower pointwise 95% confidence limit (right panel), as functions of the sensitivity parameters α_0 and α_1. The gray areas correspond to values of α_1 outside the range (-∣α_0∣/30, ∣α_0∣/30).

8.2 Analysis with a continuous exposure

To illustrate our sensitivity analysis with a continuous exposure, we next consider systolic blood pressure (mm Hg) as the exposure and the other measured covariates as potential confounders Z. Again, we use the confounding function c(x, x′, Z; α) = α(x - x′) and the outcome nuisance model (26) to estimate ψ_1 in the marginal causal model (25). Since X is now continuous, the latter model is no longer saturated. We calibrated α with Z′ = Z, using logit{E(Y | X)} = δ_0 + δ_1 X and the logistic model for E(Y | X, Z′) = E(Y | X, Z) in (26), as mentioned earlier. The resulting estimate of α′ is 0.006.

The left panel of Figure 3 shows the estimate of ψ_1 (solid line) with pointwise 95% confidence limits (dashed lines), as functions of α in the range (-0.006, 0.006). At α = 0 (no deviation from conditional exchangeability, given the measured confounders), ψ̂_1 = 0.018 with 95% confidence interval (0.006, 0.031). At α = 0.006, the lower confidence limit for ψ_1 is just below 0. Thus, if we believe that the conditional confounding potential of the unmeasured confounders, given the measured confounders, may be as large as the marginal confounding potential of the measured confounders, then we cannot confidently declare systolic blood pressure to have a causal effect on CHD.

Figure 3

The estimated marginal causal log odds ratio (solid lines) for systolic blood pressure and CHD together with pointwise 95% confidence limits (dashed lines), as functions of the sensitivity parameter α.

The right panel of Figure 3 shows the estimate of ψ_1 (solid line) with pointwise 95% confidence limits (dashed lines) for a much wider range of α. We observe that ψ̂_1 seems confined to the range (-0.1, 0.15) as α goes from -∞ to ∞. Thus, even if we do not make any assumptions about the strength of the unmeasured confounders, we may still be confident that ψ_1 lies in this range, provided that we have faith in the other model assumptions that we have made.

The causal model (25) assumes that the effect of systolic blood pressure on CHD increases linearly on the logistic scale. To relax this assumption, we add a second-order term to the marginal causal model and the outcome nuisance model, so that

logit{E(Y(x))} = ψ_0 + ψ_1 x + ψ_2 x^2

and

logit{E(Y | X, Z)} = β_0 + β_1 X + β_2 X^2 + β_3 Z,

and use the confounding function c(x, x′, Z; α) = α(x − x′) as mentioned earlier. Figure 4 shows the estimate of the marginal causal log odds ratio logit[p{Y(x) = 1; ψ}] − logit[p{Y(106) = 1; ψ}] (left panel) and its lower 95% confidence limit (right panel) for selected values of α in the range (−0.006, 0.006), as functions of blood pressure x in the range (106, 166), which corresponds to the interval between the 0.025 and 0.975 quantiles in the sample distribution of systolic blood pressure. We observe that the log odds ratio does indeed look nonlinear, with a less steep increase in the causal effect for larger blood pressures.
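
The quadratic analysis can be sketched along the same lines in R (again assuming the Appendix G utilities and data preparation; the object names fitY2, logOR, V and lower are illustrative, and the contrast against the reference level 106 mmHg follows the Appendix G analysis code):

fitY2 <- glm(chd~sdp+I(sdp^2)+age+dbp+bmi+smoke, family="quasibinomial", data=wcgs)
a <- 0.006 #one value of the sensitivity parameter alpha
out <- glm.sens(formula=chd~sdp+I(sdp^2), fit=fitY2, X="sdp",
  cfun=function(data) a*(data$x-data$xprim))
x <- 106:166
z <- cbind(x-106, x^2-106^2) #contrasts against the reference level 106 mmHg
logOR <- as.vector(z%*%out$coef[c("sdp", "I(sdp^2)")])
V <- out$vcov[c("sdp", "I(sdp^2)"), c("sdp", "I(sdp^2)")]
lower <- logOR-1.96*sqrt(rowSums((z%*%V)*z)) #pointwise lower 95% confidence limits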

Figure 4

The estimated marginal causal log odds ratio (left panel) for systolic blood pressure and CHD and its lower pointwise 95% confidence limit (right panel), as functions of systolic blood pressure x , for selected values of the sensitivity parameter α . The log odds ratio uses systolic blood pressure 106 mmHg as reference level, and is estimated under a quadratic causal model.

9 Discussion

In this article, we have developed a sensitivity analysis for residual confounding. The proposed analysis is very general in that it can handle arbitrary types of exposures and outcomes, and a wide range of causal effect measures. We have shown how the analysis can easily be implemented with standard software, and we have demonstrated the analysis in an illustrative example with real data. To guide researchers on how to choose reasonable values of α, we have proposed a method of calibration against measured confounders, using “extended confounding functions.”

Our work generalizes previous sensitivity analyses by Brumback et al. [10] and Chiba [11]. Recently, we proposed a related sensitivity analysis [13], which uses a parsimonious set of sensitivity parameters defined marginally over the measured confounders, instead of conditionally as shown in the studies by Brumback et al. [10] and Chiba [11] and in the current proposal. This sensitivity analysis does not require any parametric model for the deviation from exchangeability (e.g., the confounding function), but in contrast to the current proposal, it is limited to settings with low-level categorical (e.g., binary) exposures and marginal causal effects, and cannot be carried out with standard software.

As outlined in Section 1, Franks et al. [14] developed another related sensitivity analysis, using a “tilting function” instead of a confounding function. Scharfstein et al. [15] developed a semiparametrically efficient estimator of the target causal effect using this tilting function. Our sensitivity analysis has interesting similarities with that by Franks et al. [14], but also substantial differences. In Appendix F, we provide a detailed comparison, and we re-analyze the WCGS data with the sensitivity analysis by Franks et al. [14]. In short, the two sensitivity analyses are nonparametrically equivalent, i.e., they give the same results if no modeling assumptions are made. When using parametric models, we identify four advantages of our analysis: it can handle arbitrary (e.g., nonbinary) exposures, it requires weaker modeling assumptions, it allows for estimation of both conditional and marginal causal effects, and it can be carried out easily with standard software. However, the additional modeling by Franks et al. [14] allows for estimation of a wider range of causal effects (e.g., quantile effects) and presumably also results in higher statistical efficiency.

We have conjectured that most analysts would be content with the simple confounding function c(x, x′, Z; α) = α(x − x′), but we recommend that the sensitivity to this choice also be assessed by comparison with more elaborate functions. In our illustrative example, a more elaborate confounding function was considered due to the known dependence of CHD on age. This motivated our selection of the functional form, which included an interaction term with age. In this particular example, we observed that the more elaborate confounding function did not qualitatively change the conclusions, thus supporting the use of the simpler analysis. However, in similar settings where a more elaborate confounding function is suggested by the data or expert knowledge, this may not be true. We suggest that analysts use the most conservative results obtained from the comparison of justifiable confounding functions.
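
As an illustration of such a comparison, the R sketch below contrasts the simple confounding function with one whose deviation from exchangeability varies with age. It uses the glm.sens function and the outcome nuisance model fitY for behavior pattern fitted in Appendix G; the values of a0 and a1 are arbitrary, and the centering age of 49 follows the Appendix G code.

a0 <- 0.2 #illustrative values of the two sensitivity parameters
a1 <- a0/30
cfun.simple <- function(data) a0*(data$x-data$xprim)
cfun.age <- function(data) a0*(data$x-data$xprim)+a1*(data$x-data$xprim)*(data$age-49)
out.simple <- glm.sens(formula=chd~behave, fit=fitY, X="behave", cfun=cfun.simple)
out.age <- glm.sens(formula=chd~behave, fit=fitY, X="behave", cfun=cfun.age)
c(simple=unname(out.simple$coef["behave"]), age=unname(out.age$coef["behave"]))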

We recognize several important extensions for future research. First, as suggested by one of the reviewers, instead of specifying a fully parametric model for the confounding function, one may specify a threshold for it and derive the implied bound on the target causal effect. This reduces the risk of bias due to model misspecification and makes the analysis more in line with those listed in the bottom-right quadrant of Table 1. Second, although our estimation method outlined in Section 5 is consistent, it is somewhat ad hoc and potentially inefficient. Thus, it would be important to derive the semiparametrically efficient estimator of the target causal effect, under the joint model defined by the specified confounding function and the target causal model. Third, we have focused on causal parameters that can be expressed in terms of generalized linear models. Although this model class covers a wide range of parameters, real studies often collect data with right censored time-to-event outcomes, which are usually analyzed with survival (e.g., Cox proportional hazards) models. Thus, it would be important to extend our sensitivity analysis to allow for survival models as well.



  1. Conflict of interest: Authors state no conflict of interest.

Appendix A Considerations of variation independence for different choices of link functions η

Let Y be binary and let Y(⋅) be the entire potential outcome function {Y(x): x ∈ 𝒳}, where 𝒳 is the set of all possible values X can take. To show that the confounding function in (8) with logit link η is variation independent of the observed data distribution, we must show that, for any given function c(x, x′, Z; α) such that c(x, x, Z; α) = 0 and any given proper distribution p*(Z, X, Y), we are able to construct a proper distribution p{Z, X, Y, Y(⋅)} for which consistency (1) and positivity (2) hold, which marginalizes to p*(Z, X, Y), and for which the true confounding function is equal to c(x, x′, Z; α). We do this in the following steps:

  1. Set p(Z, X, Y) = p*(Z, X, Y).

  2. Given (Z, X, Y), set Y(X) = Y.

  3. For x ≠ X, set p{Y(x) = 1 | Z, X, Y, Y(X)} = p{Y(x) = 1 | Z, X}.

  4. Set

    p{Y(x) = 1 | Z, X} = η^{−1}(η[p{Y(X) = 1 | Z, X}] − c(X, x, Z; α)) = η^{−1}[η{p(Y = 1 | Z, X)} − c(X, x, Z; α)].

    When η is the logit link, the right-hand side always takes values between 0 and 1, and is thus a proper probability.

Let Y_{X,x}(⋅) denote the function Y(⋅), excluding the points Y(X) and Y(x). Steps 1 through 4 define p{Z, X, Y, Y(X), Y(x)} for all X, x. We finally let p{Y_{X,x}(⋅) | Z, X, Y, Y(X), Y(x)} be an arbitrary proper distribution for all X, x. The distribution p{Z, X, Y, Y(⋅)} marginalizes to p*(Z, X, Y) by step 1, has confounding function equal to c(X, x, Z; α) by step 4, and obeys consistency by step 2.

Consider now the identity and log links. With binary Y, we have from (1) and (8) that

c(x, x′, Z) = η{p(Y = 1 | X = x, Z)} − η[p{Y(x) = 1 | X = x′, Z}].

By insisting on p{Y(x) = 1 | X = x′, Z} being between 0 and 1, we obtain the following restrictions on c(x, x′, Z; α) for the identity link and log link, respectively:

p(Y = 1 | Z, X = x) − 1 ≤ c(x, x′, Z; α) ≤ p(Y = 1 | Z, X = x)

and

log{p(Y = 1 | Z, X = x)} ≤ c(x, x′, Z; α).

Under the model assumption c(x, x′, Z; α) = α(x − x′), we further have that c(x, x′, Z; α) = −c(x′, x, Z; α) for all Z, so that these restrictions imply that

max_Z [max{p(Y = 1 | Z, X = x) − 1, −p(Y = 1 | Z, X = x′)}] ≤ c(x, x′, Z; α) ≤ min_Z [min{p(Y = 1 | Z, X = x), 1 − p(Y = 1 | Z, X = x′)}]

and

max_Z [log{p(Y = 1 | Z, X = x)}] ≤ c(x, x′, Z; α) ≤ min_Z [−log{p(Y = 1 | Z, X = x′)}].
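
The practical content of these considerations can be illustrated numerically: with the logit link the correction η^{−1}[η(p) − c] stays within (0, 1) for any value of c, whereas with the identity or log link it can leave the unit interval. A minimal R check, with arbitrary illustrative values of p and c:

p <- 0.15 #an observed outcome probability p(Y=1|Z,X=x)
c <- seq(-3, 3, by=1.5) #candidate values of the confounding function
rbind(logit=plogis(qlogis(p)-c), #always within (0,1)
  identity=p-c, #can fall outside [0,1]
  log=exp(log(p)-c)) #can exceed 1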

B Unbiasedness of H_i(α, β, ψ)

We show that E{H_i(α, β, ψ)} = 0 for the conditional causal model in (10); the proof is analogous for the marginal causal model in (11). We first have that

(A1) E(Y_{ik,α,β} | V_{ik}, Z_i) = E[η^{−1}{β^T h(V_{ik}, Z_i) − c(V_{ik}, X_i, Z_i; α)} | V_{ik}, Z_i]
= E(η^{−1}[η{E(Y | X = V_{ik}, Z_i)} − c(V_{ik}, X_i, Z_i; α)] | V_{ik}, Z_i)
= E{η^{−1}(η[E{Y(V_{ik}) | X = V_{ik}, Z_i}] − c(V_{ik}, X_i, Z_i; α)) | V_{ik}, Z_i}
= E[E{Y(V_{ik}) | X_i, Z_i} | V_{ik}, Z_i]
= E{Y(V_{ik}) | Z_i},

where the first equality follows from (13), the second from (12), the third from (1), the fourth from (8), and the fifth from the law of total probability and the fact that X_i ⫫ V_{ik} | Z_i. We now have that

E{H_i(α, β, ψ) | Z_i} = K E{H_{ik}(α, β, ψ) | Z_i}
= K E(g(V_{ik}, Z_i)[Y_{ik,α,β} − η^{−1}{ψ^T g(V_{ik}, Z_i)}] | Z_i)
= K E(g(V_{ik}, Z_i)[E(Y_{ik,α,β} | V_{ik}, Z_i) − η^{−1}{ψ^T g(V_{ik}, Z_i)}] | Z_i)
= K E(g(V_{ik}, Z_i)[E{Y(V_{ik}) | Z_i} − η^{−1}{ψ^T g(V_{ik}, Z_i)}] | Z_i)
= 0,

where the first equality follows from H_{i1}(α, β, ψ), …, H_{iK}(α, β, ψ) being identically distributed, the second from (15), the third from the law of total probability, the fourth from (A1), and the fifth from (10). It follows that E{H_i(α, β, ψ)} = E[E{H_i(α, β, ψ) | Z_i}] = 0.

C Results for the extended confounding function

Suppose that

(A2) W_2 ⫫ X | W_1.

We then have that

η{E(Y | X = x, W_1)} − η[E{E(Y | X = x, W_1, W_2) | X = x′, W_1}]
= η{E(Y | X = x, W_1)} − η[E{E(Y | X = x, W_1, W_2) | X = x, W_1}]
= η{E(Y | X = x, W_1)} − η{E(Y | X = x, W_1)} = 0,

where the first equality follows from (A2) and the second follows from the law of total probability. Now, suppose that

(A3) W_2 ⫫ Y | (X, W_1).

We then have that

η{E(Y | X = x, W_1)} − η[E{E(Y | X = x, W_1, W_2) | X = x′, W_1}]
= η{E(Y | X = x, W_1)} − η[E{E(Y | X = x, W_1) | X = x′, W_1}]
= η{E(Y | X = x, W_1)} − η{E(Y | X = x, W_1)} = 0,

where the first equality follows from (A3) and the second from the fact that E(Y | X = x, W_1) is constant, given (X = x′, W_1). Finally, suppose that

(A4) Y(x) ⫫ X | (W_1, W_2).

We then have that

η{E(Y | X = x, W_1)} − η[E{E(Y | X = x, W_1, W_2) | X = x′, W_1}]
= η[E{Y(x) | X = x, W_1}] − η(E[E{Y(x) | X = x, W_1, W_2} | X = x′, W_1])
= η[E{Y(x) | X = x, W_1}] − η(E[E{Y(x) | X = x′, W_1, W_2} | X = x′, W_1])
= η[E{Y(x) | X = x, W_1}] − η[E{Y(x) | X = x′, W_1}]
= c(x, x′, W_1),

where the first equality follows from (1), the second from (A4), the third from the law of total probability, and the fourth from (8).

D Consistency of α̂

Let x_1, …, x_p be the elements of 𝒳. Define μ_1(x) = E{E(Y | X = x, Z)}, μ_2(x; α) = E(η^{−1}[η{E(Y | X = x)} − d_Z(x, X, ⋅; α)]), D(x; α) = {μ_1(x) − μ_2(x; α)}^2, D(α) = Σ_{x ∈ 𝒳} D(x; α) and

θ = {ξ, δ, μ_1(x_1), …, μ_1(x_p), μ_2(x_1; α), …, μ_2(x_p; α), α}^T.

Let S_{ξi}(ξ) and S_{δi}(δ) be the contributions from subject i to the estimating (e.g., maximum likelihood score) functions for ξ and δ, respectively, and define

S_{μi}(μ) = ( E(Y | X = x_1, Z_i; ξ) − μ_1(x_1),
…,
E(Y | X = x_p, Z_i; ξ) − μ_1(x_p),
η^{−1}[η{E(Y | X = x_1)} − c(x_1, X_i; α)] − μ_2(x_1; α),
…,
η^{−1}[η{E(Y | X = x_p)} − c(x_p, X_i; α)] − μ_2(x_p; α) )^T,

and

S_{θi}(θ) = ( S_{ξi}(ξ), S_{δi}(δ), S_{μi}(μ), dD(α)/dα )^T.

The estimator θ̂ defined by steps 1–5 in Section 6 solves the estimating equation system:

Σ_{i=1}^n S_{θi}(θ) = 0.

If the models E(Y | X, Z; ξ) and E(Y | X; δ) are correct, then E{S_{ξi}(ξ)} = E{S_{δi}(δ)} = E{S_{μi}(μ)} = 0 at the true value of (ξ, δ, μ). If d_Z(x, X_i, ⋅; α) is correctly specified, then μ_2(x) = μ_1(x) at the true value of α, so that D(α) = 0. Since 0 is the minimal value of D(α), it follows that dD(α)/dα = 0 at the true value of α, so that E{S_{θi}(θ)} = 0 at the true value of θ. It now follows from the standard theory for M-estimators [29] that θ̂ is consistent.
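
In practice, α̂ is obtained by plugging the two fitted outcome models into D(α) and minimizing it numerically, which is what the cfit function in Appendix G does. A stripped-down sketch of that objective, assuming a binary exposure, a logit link and the simple confounding function α(x − x′) (the function name Dfun is ours, for illustration):

Dfun <- function(alpha, fitY.X, fitY.XZ, data, X) {
  D <- 0
  for(x in c(0, 1)) {
    newdata <- data
    newdata[, X] <- x
    mu1 <- mean(predict(fitY.XZ, newdata=newdata, type="response")) #estimate of E{E(Y|X=x,Z)}
    eta <- predict(fitY.X, newdata=setNames(data.frame(x), X)) #eta{E(Y|X=x)}
    mu2 <- mean(plogis(eta-alpha*(x-data[, X]))) #estimate of E(eta^{-1}[eta{E(Y|X=x)}-alpha(x-X)])
    D <- D+(mu1-mu2)^2
  }
  D
}
#alpha.hat <- optimize(Dfun, interval=c(-1, 1), fitY.X=fitY.X, fitY.XZ=fitY.XZ,
#  data=wcgs, X="behave")$minimum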

E Simulation details

From the definition of the extended confounding function in (18), we have that

d_Z(1, 0, ⋅) = logit{p(Y = 1 | X = 1)} − logit[E{p(Y = 1 | X = 1, Z) | X = 0}]
= logit[E{p(Y = 1 | X = 1, Z, U) | X = 1}] − logit(E[E{p(Y = 1 | X = 1, Z, U) | X = 1, Z} | X = 0]),

d_Z(0, 1, ⋅) = logit{p(Y = 1 | X = 0)} − logit[E{p(Y = 1 | X = 0, Z) | X = 1}]
= logit[E{p(Y = 1 | X = 0, Z, U) | X = 0}] − logit(E[E{p(Y = 1 | X = 0, Z, U) | X = 0, Z} | X = 1]),

d_U(1, 0, 0) = logit{p(Y = 1 | X = 1, Z = 0)} − logit[E{p(Y = 1 | X = 1, Z = 0, U) | X = 0, Z = 0}]
= logit[E{p(Y = 1 | X = 1, Z = 0, U) | X = 1, Z = 0}] − logit[E{p(Y = 1 | X = 1, Z = 0, U) | X = 0, Z = 0}],

d_U(1, 0, 1) = logit{p(Y = 1 | X = 1, Z = 1)} − logit[E{p(Y = 1 | X = 1, Z = 1, U) | X = 0, Z = 1}]
= logit[E{p(Y = 1 | X = 1, Z = 1, U) | X = 1, Z = 1}] − logit[E{p(Y = 1 | X = 1, Z = 1, U) | X = 0, Z = 1}],

d_U(0, 1, 0) = logit{p(Y = 1 | X = 0, Z = 0)} − logit[E{p(Y = 1 | X = 0, Z = 0, U) | X = 1, Z = 0}]
= logit[E{p(Y = 1 | X = 0, Z = 0, U) | X = 0, Z = 0}] − logit[E{p(Y = 1 | X = 0, Z = 0, U) | X = 1, Z = 0}],

d_U(0, 1, 1) = logit{p(Y = 1 | X = 0, Z = 1)} − logit[E{p(Y = 1 | X = 0, Z = 1, U) | X = 1, Z = 1}]
= logit[E{p(Y = 1 | X = 0, Z = 1, U) | X = 0, Z = 1}] − logit[E{p(Y = 1 | X = 0, Z = 1, U) | X = 1, Z = 1}].

To compute these expressions, we need p(Y | X, Z, U), p(Z, U | X), p(U | X, Z), and p(Z | X). From the model in (22), we have that

p(Y = 1 | X = x, Z = z, U = u) = expit(τ_0 + τ_1 u + τ_2 z + ψ x),

and, since p(Z = z, U = u) = 1/2 × 1/2 = 1/4 for (z, u) ∈ {0, 1}^2,

p(Z = z, U = u | X = x) = p(X = x | Z = z, U = u) p(Z = z, U = u) / Σ_{(z′, u′) ∈ {0,1}^2} p(X = x | Z = z′, U = u′) p(Z = z′, U = u′)
= {expit(ν_0 + ν_1 u + ν_2 z)}^x {1 − expit(ν_0 + ν_1 u + ν_2 z)}^{1−x} / Σ_{(z′, u′) ∈ {0,1}^2} {expit(ν_0 + ν_1 u′ + ν_2 z′)}^x {1 − expit(ν_0 + ν_1 u′ + ν_2 z′)}^{1−x}.

From the latter, we obtain p(U | X, Z) and p(Z | X) as follows:

p(U = u | X = x, Z = z) = p(Z = z, U = u | X = x) / Σ_{u′ ∈ {0,1}} p(Z = z, U = u′ | X = x)

and

p(Z = z | X = x) = Σ_{u ∈ {0,1}} p(Z = z, U = u | X = x).

Imposing the constraints in (24) with ψ = 0 now gives a system of six equations with six unknowns (ν_0, ν_1, ν_2, τ_0, τ_1, τ_2).
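
Numerically, this system is solved by minimizing the sum of squared deviations between each d-function and its target value (±α), as in the optfun and parfun functions of Appendix G. In outline, assuming the dfun function from Appendix G has been sourced (the names target and par.hat are ours, for illustration):

target <- function(par, alpha, pU1=0.5, pZ1=0.5) {
  d <- dfun(c(par, pU1, pZ1)) #dZ and dU functions implied by the regression parameters
  sum((d$dZ-c(alpha, -alpha))^2)+sum((d$dU-c(alpha, alpha, -alpha, -alpha))^2)
}
#par.hat <- optim(par=c(-1, 0.5, 0.5, -1, 0.5, 0.5), fn=target, alpha=0.1)$par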

F Comparison with Franks et al. [14]

Focusing on a binary exposure X and marginal causal effects, Franks et al. [14] used the decomposition

(A5) p{Y(x) = y} = E[p{Y(x) = y | X = x, Z} | X = x] p(X = x) + E[p{Y(x) = y | X = 1 − x, Z} | X = 1 − x] p(X = 1 − x)
= E{p(Y = y | X = x, Z) | X = x} p(X = x) + E[p{Y(x) = y | X = 1 − x, Z} | X = 1 − x] p(X = 1 − x),

where the first equality follows from the law of total probability, and the second from (1). They expressed the nonidentifiable distribution p{Y(x) = y | X = 1 − x, Z} as a “tilt” of the identifiable distribution p(Y = y | X = x, Z):

(A6) p{Y(x) = y | X = 1 − x, Z} = W^{−1}(x, Z) t(x, y, Z) p(Y = y | X = x, Z),

with the tilting function

(A7) t(x, y, Z) = p{X = 1 − x | Y(x) = y, Z} / p{X = x | Y(x) = y, Z}

and the normalizing constant

W(x, Z) = E{t(X, Y, Z) | X = x, Z}.

They further proposed to use parametric models t(x, y, Z; γ) and p(Y = y | X = x, Z; β). Although the model p(Y = y | X = x, Z; β) can be directly fit to the observed data, the tilting function t(x, y, Z) is nonidentifiable. They proposed to use the parameter (vector) γ indexing the model for the tilting function as a sensitivity parameter, and to specify the value of this parameter with external information (e.g., subject matter knowledge). Noting that the models t(x, y, Z; γ) and p(Y = y | X = x, Z; β) translate into a model p{Y(x) = y | X = 1 − x, Z; β, γ} through (A6), Franks et al. [14] used the relation in (A5) to estimate the counterfactual distribution p{Y(x) = y} for each fixed value of γ as follows:

(A8) p̂{Y(x) = 1} = n_x^{−1} Σ_{i: X_i = x} p(Y = 1 | X = x, Z_i; β̂) p̂(X = x) + n_{1−x}^{−1} Σ_{i: X_i = 1−x} p{Y(x) = 1 | X = 1 − x, Z_i; β̂, γ} p̂(X = 1 − x),

where n_x and n_{1−x} are the number of subjects in the sample with X = x and X = 1 − x, respectively, and p̂(X = x) and p̂(X = 1 − x) are the corresponding sample proportions.

It is interesting to compare our sensitivity analysis to that by Franks et al. [14] when both X and Y are binary. Suppose that we, in this special case, use models

(A9) t(x, y, Z; γ) = exp{a_x(Z) + γ_x y},

where {a_0(Z), a_1(Z)} are arbitrary functions of Z, and

(A10) c(x, x′; α) = α_1 I(x = 1, x′ = 0) + α_0 I(x = 0, x′ = 1)

with a logistic link function η, for the tilting function and confounding function, respectively. We note that the simple model α(x − x′) is a special case of (A10) with α = α_1 = −α_0. We then have that

γ_x = log([p{X = 1 − x | Y(x) = 1, Z} / p{X = x | Y(x) = 1, Z}] / [p{X = 1 − x | Y(x) = 0, Z} / p{X = x | Y(x) = 0, Z}])
= log([p{Y(x) = 1 | X = 1 − x, Z} / p{Y(x) = 0 | X = 1 − x, Z}] / [p{Y(x) = 1 | X = x, Z} / p{Y(x) = 0 | X = x, Z}])
= −α_x,

where the first equality follows from the definition of the tilting function in (A7) and model (A9), the second from Bayes’ rule and the third from the definition of the confounding function in (8) and model (A10). Thus, in this special case, the sensitivity parameter α_x is identical to the sensitivity parameter γ_x, up to the sign. If Z is low dimensional (e.g., binary) or absent, then both our sensitivity analysis and that by Franks et al. [14] can be carried out without any assumptions on the observed data distributions, in which case the analyses will also give identical results.
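
This identity is easy to verify numerically. The R sketch below picks an arbitrary (hypothetical) joint distribution of {X, Y(x)} for a fixed x, without covariates Z, and compares γ_x computed from the tilting function (A7) with α_x computed from the confounding function (8) under the logit link:

#hypothetical joint probabilities of (X, Y(x)); row 1: X=x, row 2: X=1-x; columns: Y(x)=0, Y(x)=1
p <- matrix(c(0.30, 0.10,
  0.40, 0.20), nrow=2, byrow=TRUE)
#gamma_x: log of t(x,1)/t(x,0), with t(x,y)=p(X=1-x|Y(x)=y)/p(X=x|Y(x)=y)
gamma.x <- log((p[2, 2]/p[1, 2])/(p[2, 1]/p[1, 1]))
#alpha_x: c(x,1-x)=logit p{Y(x)=1|X=x}-logit p{Y(x)=1|X=1-x}
alpha.x <- qlogis(p[1, 2]/sum(p[1, ]))-qlogis(p[2, 2]/sum(p[2, ]))
c(gamma.x=gamma.x, alpha.x=alpha.x) #equal in magnitude, opposite in sign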

In more complex situations, this equivalence does not generally hold. In particular, the analyses rely on different models for quantifying the degree of deviation from exchangeability, and a fully parametric model for the tilting function does not generally imply a fully parametric model for the confounding function or vice versa. Thus, neither analysis is necessarily more susceptible to model assumptions and misspecification than the other.

Our sensitivity analysis has certain advantages over that by Franks et al. [14]. First, our sensitivity analysis allows for arbitrary exposures, whereas that by Franks et al. [14] is restricted to binary exposures. Second, our sensitivity analysis only requires a model for the mean E(Y | X, Z), whereas that by Franks et al. [14] requires a fully parametric model for the entire outcome distribution p(Y = y | X = x, Z). However, we concede that this additional modeling allows for estimation of the entire counterfactual outcome distribution, and thus also any causal effect that can be expressed as a function thereof, e.g., quantile differences. Our sensitivity analysis only allows for estimation of (potentially transformed) mean differences.

Third, our sensitivity analysis allows for estimation of both conditional (given Z) and marginal (over Z) causal effects, whereas Franks et al. [14] focused on the latter. By doing so, they avoided modeling of the distribution p(X, Z), since the conditional expectations and the probabilities p(X = x) and p(X = 1 − x) in (A5) can be estimated with their sample counterparts, as in (A8). If one would instead want to estimate conditional causal effects, then the relevant decomposition would be

(A11) p{Y(x) = y | Z} = p{Y(x) = y | X = x, Z} p(X = x | Z) + p{Y(x) = y | X = 1 − x, Z} p(X = 1 − x | Z)
= p(Y = y | X = x, Z) p(X = x | Z) + p{Y(x) = y | X = 1 − x, Z} p(X = 1 − x | Z).

To use this decomposition for estimation of p{Y(x) = y | Z}, a model for p(X | Z) is required, unless Z is low dimensional (e.g., binary), in which case p(X | Z) can be estimated nonparametrically.

Fourth, our sensitivity analysis can be carried out easily with standard GLM software, whereas that by Franks et al. [14] is generally more difficult to implement and computationally intensive. This is because the tilting function is not variation independent of the observed data distribution; the parametric model for the tilting function and the value of γ must be chosen to satisfy certain integral constraints (equation (4) in Franks et al. [14]). Also, for any given model for the tilting function and value of γ, the value of the normalizing constant W(x, Z) must generally be computed numerically for each value of (x, Z). Franks et al. [14] showed that these obstacles can be mitigated by assuming an exponential family model for the outcome distribution p(Y = y | X = x, Z), and a parametric model of the form

(A12) t(x, y, Z; γ) = exp{a_x(Z) + γ_x s_x(y)}

for the tilting function, where s_x(y) is a pre-specified (possibly vector valued) function of y, e.g., s_x(y) = y as in (A9). The arbitrary functions {a_0(Z), a_1(Z)} are required to complete the model, but cancel out from the analysis since they appear in both the numerator and denominator on the right-hand side of (A6). For these models, γ = (γ_0, γ_1) can be chosen independently of the observed data distribution, and the normalizing constant W(x, Z) has a simple analytic expression. However, we note that model (A12) assumes that the odds ratio t(x, 1, Z)/t(x, 0, Z) is constant across levels of Z, which may be unrealistic in some applications. This is analogous to assuming that the confounding function c(x, x′, Z) does not depend on Z, as when using the simple model c(x, x′, Z) = α(x − x′). However, in our sensitivity analysis, this assumption can be easily relaxed, as we demonstrate in the real data analysis in Section 8.

The greater flexibility and computational efficiency of our sensitivity analysis come at a cost. The estimator by Franks et al. [14] in (A8) is a (semiparametric, since it does not use a model for p(Z, X)) maximum likelihood estimator, whereas our proposed estimator in Section 5 is somewhat ad hoc and is not derived by considering statistical efficiency. Thus, we expect that the estimator in (A8) generally has a higher statistical efficiency than our estimator.

As a practical comparison, we re-analyze the WCGS data with the sensitivity analysis by Franks et al. [14]. Since their sensitivity analysis is restricted to binary exposures, we only consider behavior pattern as the exposure in this analysis. We use the same marginal causal logistic model (25) and outcome nuisance model (26) as in the main analysis in Section 8. To mimic our choice of confounding function in the main analysis, we use model (A9) with γ_1 = −γ_0 = γ. It follows from Proposition 1 in Franks et al. [14] that model (A9) for the tilting function, together with the outcome nuisance model (26), implies the logistic model

logit[p{Y(x) = 1 | X = 1 − x, Z; β, γ}] = logit{p(Y = 1 | X = x, Z; β)} + γ(2x − 1)

for the nonidentifiable distribution p{Y(x) = 1 | X = 1 − x, Z}. For estimated regression parameters β̂ = (β̂_0, β̂_1, β̂_2, β̂_3) and fixed sensitivity parameter γ, we use relation (A8) to estimate the marginal counterfactual probability p{Y(x) = 1} for x ∈ {0, 1}.
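
This calculation is essentially what the franksfun function in Appendix G implements. In outline, assuming the prepared wcgs data from Appendix G and an outcome model fitY.XZ fitted there with all pairwise covariate interactions, and using the sign convention of the displayed logistic model above (the value 0.2 for γ is arbitrary and for illustration only):

gamma <- 0.2 #a fixed value of the sensitivity parameter
x1 <- transform(wcgs, behave=1) #data with exposure set to 1
x0 <- transform(wcgs, behave=0) #data with exposure set to 0
lp1 <- predict(fitY.XZ, newdata=x1) #logit p(Y=1|X=1,Z;beta.hat)
lp0 <- predict(fitY.XZ, newdata=x0) #logit p(Y=1|X=0,Z;beta.hat)
pX1 <- mean(wcgs$behave)
#relation (A8): average identifiable and tilted predictions within exposure groups
pY1.1 <- mean(plogis(lp1)[wcgs$behave==1])*pX1+
  mean(plogis(lp1+gamma)[wcgs$behave==0])*(1-pX1)
pY1.0 <- mean(plogis(lp0)[wcgs$behave==0])*(1-pX1)+
  mean(plogis(lp0-gamma)[wcgs$behave==1])*pX1
qlogis(pY1.1)-qlogis(pY1.0) #estimated marginal causal log odds ratio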

Franks et al. [14] developed a calibration method for γ based on partial correlations. We view the choice between extended confounding functions and partial correlations as a matter of taste, since these are not end-products of the analyses, just intermediate quantities to enable a comparison between unmeasured and measured confounders. However, we note that partial correlations may be somewhat inconvenient for nonlinear models. In particular, although the sensitivity analysis by Franks et al. [14] allows for nonlinear models, their calibration does not, since it implicitly assumes collapsibility of effect measures (Franks, personal communication), which does not generally hold in nonlinear models [32]. To solve this problem, we note that the simple model α(x − x′) for the confounding function that we used in the main analysis is a special case of model (A10) with α = α_1 = −α_0. It thus follows from the aforementioned arguments that our sensitivity parameter α is identical to the negative of the sensitivity parameter γ by Franks et al. [14]. We thus consider the same range for γ as we did for α in the main analysis.

Figure A1 shows the results. Comparing this with Figure 1 shows that the point estimate of ψ_1 obtained from the sensitivity analysis by Franks et al. [14], as a function of γ, is similar to the point estimate obtained from our sensitivity analysis, as a function of α. However, the pointwise confidence intervals are substantially narrower in the analysis by Franks et al. [14], as expected from the discussion of computational and statistical efficiency earlier.

Figure A1

The estimated marginal causal log odds ratio (solid lines) for behavior pattern and CHD together with pointwise 95% confidence limits (dashed lines) for the marginal causal log odds ratio, as functions of the negative of the sensitivity parameter γ .

G R-code

#—UTILITY FUNCTIONS—

library(data.table)

aggr <- function(x, clusters) {
colnames(x) <- NULL
temp <- data.table(x)
temp <- as.matrix(temp[, j=lapply(.SD, sum), by=clusters])[, -1] }
boundsfun <- function(data, X, Y) {
pX1 <- mean(data[, X]==1)
pX0 <- 1-pX1
pY1.X1 <- mean(data[, Y]==1 & data[, X]==1)/pX1
pY1.X0 <- mean(data[, Y]==1 & data[, X]==0)/pX0
l0 <- pY1.X0*pX0
u0 <- l0+pX1
l1 <- pY1.X1*pX1
u1 <- l1+pX0
b <- c(l0, u0, l1, u1)
names(b) <- c("l0", "u0", "l1", "u1")
b
}
cfit <- function(cfun, nalpha, fitY.X, fitY.XZ, X) {
data <- fitY.X$data
n <- nrow(data)
etainvfun <- family(fitY.X)$linkinv
uniqueX <- sort(unique(data[, X]))
nX <- length(uniqueX)
dataX <- data.frame(uniqueX)
colnames(dataX) <- X
pred <- predict(object=fitY.X, newdata=dataX)
fn <- function(par) {
mu1 <- vector(length=nX)
mu2 <- vector(length=nX)
for(i in 1:nX) {
x <- uniqueX[i]
newdata <- data
newdata[, X] <- x
mu1[i] <- mean(predict(object=fitY.XZ, newdata=newdata, type="response"))
mu2[i] <- mean(etainvfun(pred[i]-cfun(x=x, xprim=data[, X], alpha=par))) }
sum((mu1-mu2)^2)
}
if(nalpha==1)
alpha <-
optim(par=rep(0, nalpha), fn=fn, method="Brent", lower=-1, upper=1)$par
else
alpha <- optim(par=rep(0, nalpha), fn=fn)$par
}
Eval <- function(formula, fit, X) {
f <- function(alpha) {
cfun <- function(data) alpha*(data$x-data$xprim)
out <- glm.sens(formula=formula, fit=fit, X=X, cfun=cfun)
est <- out$coef[X]
est
}
E.est <- uniroot(f=f, lower=-5, upper=5)$root
f <- function(alpha) {
cfun <- function(data) alpha*(data$x-data$xprim)
out <- glm.sens(formula=formula, fit=fit, X=X, cfun=cfun)
est <- glm.sens(formula=formula, fit=fit, X=X, cfun=cfun)$coef[X]
se <- sqrt(out$vcov[X, X])
est-1.96*se
}
E.ci <- uniroot(f=f, lower=-5, upper=5)$root
c(E.est, E.ci)
}
expand <- function(data, X) {
n <- nrow(data)
newdata <- NULL
if(is.factor(data[, X]) | is.binary(data[, X]))
Vlev <- unique(data[, X])
if(is.numeric(data[, X]) & !is.binary(data[, X]))
Vlev <- quantile(data[, X], probs=seq(0.25, 0.75, 0.25))
nVlev <- length(Vlev)
for(i in 1:nVlev) {
tmp <- data
tmp$id <- 1:n
tmp$x <- Vlev[i]
newdata <- rbind(newdata, tmp)
}
newdata <- newdata[order(newdata$id), ]
newdata$xprim <- newdata[, X]
newdata
}
glm.sens <- function(formula, fit, X, cfun) {
call <- match.call()
data <- fit$data
n <- nrow(data)
nbeta <- length(fit$coef)
Y <- as.character(formula[[2]])
family <- family(fit)$family
etafun <- family(fit)$linkfun
etainvfun <- family(fit)$linkinv
dmu.detafun <- family(fit)$mu.eta
newdata <- expand(data, X)
newdata$c <- cfun(newdata)
newdata[, X] <- newdata$x
neweta <- predict(object=fit, newdata=newdata)-newdata$c
newdata[, Y] <- etainvfun(neweta)
fit.psi <- glm(formula=formula, family=family, data=newdata)
coef <- fit.psi$coef
npsi <- length(coef)
design.beta <- model.matrix(fit)
res.beta <- residuals(fit, type="response")
S <- design.beta*res.beta
dS.dbeta <- -solve(summary(fit)$cov.unscaled)/n
dS.dpsi <- matrix(0, nrow=nbeta, ncol=npsi)
design.psi <- model.matrix(fit.psi)
res.psi <- residuals(fit.psi, type="response")
H <- aggr(x=design.psi*res.psi, clusters=newdata$id)
dH.dpsi <- -solve(summary(fit.psi)$cov.unscaled)/n
dnewmu.dneweta <- dmu.detafun(neweta)
dneweta.dbeta <- model.matrix(object=fit, data=newdata)
dnewmu.dbeta <- dnewmu.dneweta*dneweta.dbeta
dH.dbeta <- t(design.psi)%*%dnewmu.dbeta/n #derivative of H with respect to beta
U <- cbind(H, S)
I <- rbind(cbind(dH.dpsi, dH.dbeta), cbind(dS.dpsi, dS.dbeta))
J <- var(U)
vcov <- (solve(I)%*%J%*%t(solve(I)))/n #sandwich variance estimator
out <- list(call=call, coefficients=coef, vcov=vcov, fit.psi=fit.psi)
class(out) <- "glm.sens"
return(out)
}
is.binary <- function(x)
is.numeric(x) & all(x==0 | x==1)
logit <- function(x) log(x/(1-x))
print.glm.sens <- function(x, digits=max(3L, getOption("digits")-3L), ...) {
if(length(x$coef)) {
cat("\nCoefficients:\n")
print.default(format(x$coef, digits=digits), print.gap=2, quote=FALSE)
cat("\n")
}
else {
cat("No coefficients\n\n")
}
}
print.summary.glm.sens <- function(x, digits=max(3L, getOption("digits")-3L),
signif.stars=getOption("show.signif.stars"), ...) {
cat("\nCall: ", "\n", paste(deparse(x$call), sep="\n", collapse="\n"), "\n",
sep="")
cat("\nCoefficients:", "\n")
printCoefmat(x$coefficients, digits=digits, signif.stars=signif.stars,
na.print="NA", ...)
cat("\n")
}
summary.glm.sens <- function(object, ...) {
s.err <- sqrt(diag(as.matrix(object$vcov)))
zvalue <- object$coef/s.err
pvalue <- 2*pnorm(-abs(zvalue))
coef.table <- as.matrix(cbind(object$coef, s.err, zvalue, pvalue))
dimnames(coef.table) <- list(names(object$coef), c("Estimate", "Std. Error",
"z value", "Pr(>|z|)"))
ans <- list(call=object$call, coefficients=coef.table)
class(ans) <- "summary.glm.sens"
return(ans)
}
if(0) {
#—SIMULATION—
set.seed(1)
#function for generating data, given regression parameters
datafun <- function(n, par) {
b0 <- par[1]
b1 <- par[2]
b2 <- par[3]
a0 <- par[4]
a1 <- par[5]
a2 <- par[6]
pU1 <- par[7]
pZ1 <- par[8]
U <- rbinom(n, 1, pU1)
Z <- rbinom(n, 1, pZ1)
X <- rbinom(n, 1, plogis(a0+a1*U+a2*Z))
Y <- rbinom(n, 1, plogis(b0+b1*U+b2*Z))
data.frame(U, Z, X, Y)
}
#function for computing probabilities, given regression parameters
probfun <- function(par) {
b0 <- par[1]
b1 <- par[2]
b2 <- par[3]
a0 <- par[4]
a1 <- par[5]
a2 <- par[6]
pU1 <- par[7]
pZ1 <- par[8]
pU0 <- 1-pU1
pZ0 <- 1-pZ1
pX1.U0Z0 <- plogis(a0)
pX1.U0Z1 <- plogis(a0+a2)
pX1.U1Z0 <- plogis(a0+a1)
pX1.U1Z1 <- plogis(a0+a1+a2)
pX0.U0Z0 <- 1-pX1.U0Z0
pX0.U0Z1 <- 1-pX1.U0Z1
pX0.U1Z0 <- 1-pX1.U1Z0
pX0.U1Z1 <- 1-pX1.U1Z1
pY1.U0Z0 <- plogis(b0)
pY1.U0Z1 <- plogis(b0+b2)
pY1.U1Z0 <- plogis(b0+b1)
pY1.U1Z1 <- plogis(b0+b1+b2)
pY0.U0Z0 <- 1-pY1.U0Z0
pY0.U0Z1 <- 1-pY1.U0Z1
pY0.U1Z0 <- 1-pY1.U1Z0
pY0.U1Z1 <- 1-pY1.U1Z1
pX0U0Z0 <- pX0.U0Z0*pU0*pZ0
pX0U0Z1 <- pX0.U0Z1*pU0*pZ1
pX0U1Z0 <- pX0.U1Z0*pU1*pZ0
pX0U1Z1 <- pX0.U1Z1*pU1*pZ1
pX1U0Z0 <- pX1.U0Z0*pU0*pZ0
pX1U0Z1 <- pX1.U0Z1*pU0*pZ1
pX1U1Z0 <- pX1.U1Z0*pU1*pZ0
pX1U1Z1 <- pX1.U1Z1*pU1*pZ1
pX1 <- pX1U0Z0+pX1U0Z1+pX1U1Z0+pX1U1Z1
pX0 <- 1-pX1
pU0Z0.X0 <- pX0U0Z0/pX0
pU0Z1.X0 <- pX0U0Z1/pX0
pU1Z0.X0 <- pX0U1Z0/pX0
pU1Z1.X0 <- pX0U1Z1/pX0
pU0Z0.X1 <- pX1U0Z0/pX1
pU0Z1.X1 <- pX1U0Z1/pX1
pU1Z0.X1 <- pX1U1Z0/pX1
pU1Z1.X1 <- pX1U1Z1/pX1
pX0Z0 <- pX0U0Z0+pX0U1Z0
pX0Z1 <- pX0U0Z1+pX0U1Z1
pX1Z0 <- pX1U0Z0+pX1U1Z0
pX1Z1 <- pX1U0Z1+pX1U1Z1
pU1.X0Z0 <- pX0U1Z0/pX0Z0
pU1.X0Z1 <- pX0U1Z1/pX0Z1
pU1.X1Z0 <- pX1U1Z0/pX1Z0
pU1.X1Z1 <- pX1U1Z1/pX1Z1
pU0.X0Z0 <- 1-pU1.X0Z0
pU0.X0Z1 <- 1-pU1.X0Z1
pU0.X1Z0 <- 1-pU1.X1Z0
pU0.X1Z1 <- 1-pU1.X1Z1
pY1.X0 <- pY1.U0Z0*pU0Z0.X0+pY1.U0Z1*pU0Z1.X0+
pY1.U1Z0*pU1Z0.X0+pY1.U1Z1*pU1Z1.X0
pY1.X1 <- pY1.U0Z0*pU0Z0.X1+pY1.U0Z1*pU0Z1.X1+
pY1.U1Z0*pU1Z0.X1+pY1.U1Z1*pU1Z1.X1
pY1.X0Z0 <- pY1.U0Z0*pU0.X0Z0+pY1.U1Z0*pU1.X0Z0
pY1.X0Z1 <- pY1.U0Z1*pU0.X0Z1+pY1.U1Z1*pU1.X0Z1
pY1.X1Z0 <- pY1.U0Z0*pU0.X1Z0+pY1.U1Z0*pU1.X1Z0
pY1.X1Z1 <- pY1.U0Z1*pU0.X1Z1+pY1.U1Z1*pU1.X1Z1
pZ1.X0 <- pX0Z1/pX0
pZ1.X1 <- pX1Z1/pX1
pZ0.X0 <- 1-pZ1.X0
pZ0.X1 <- 1-pZ1.X1
pY1.UZ <- c(pY1.U0Z0, pY1.U0Z1, pY1.U1Z0, pY1.U1Z1)
pX1.UZ <- c(pX1.U0Z0, pX1.U0Z1, pX1.U1Z0, pX1.U1Z1)
pXUZ <- c(pX0U0Z0, pX0U0Z1, pX0U1Z0, pX0U1Z1,
pX1U0Z0, pX1U0Z1, pX1U1Z0, pX1U1Z1)
pUZ.X0 <- c(pU0Z0.X0, pU0Z1.X0, pU1Z0.X0, pU1Z1.X0)
pUZ.X1 <- c(pU0Z0.X1, pU0Z1.X1, pU1Z0.X1, pU1Z1.X1)
pXZ <- c(pX0Z0, pX0Z1, pX1Z0, pX1Z1)
pU1.XZ <- c(pU1.X0Z0, pU1.X0Z1, pU1.X1Z0, pU1.X1Z1)
pY1.X <- c(pY1.X0, pY1.X1)
pY1.XZ <- c(pY1.X0Z0, pY1.X0Z1, pY1.X1Z0, pY1.X1Z1)
pZ1.X <- c(pZ1.X0, pZ1.X1)
list(pY1.UZ=pY1.UZ, pX1.UZ=pX1.UZ,
pX1=pX1,
pXUZ=pXUZ,
pUZ.X0=pUZ.X0, pUZ.X1=pUZ.X1,
pXZ=pXZ,
pU1.XZ=pU1.XZ,
pY1.X=pY1.X,
pY1.XZ=pY1.XZ,
pZ1.X=pZ1.X)
}
#function for computing d-functions, given regression parameters
dfun <- function(par) {
prob <- probfun(par)
pY1.X0 <- prob$pY1.X[1]
pY1.X1 <- prob$pY1.X[2]
pY1.X0Z0 <- prob$pY1.XZ[1]
pY1.X0Z1 <- prob$pY1.XZ[2]
pY1.X1Z0 <- prob$pY1.XZ[3]
pY1.X1Z1 <- prob$pY1.XZ[4]
pY1.U0Z0 <- prob$pY1.UZ[1]
pY1.U0Z1 <- prob$pY1.UZ[2]
pY1.U1Z0 <- prob$pY1.UZ[3]
pY1.U1Z1 <- prob$pY1.UZ[4]
pZ1.X0 <- prob$pZ1.X[1]
pZ1.X1 <- prob$pZ1.X[2]
pZ0.X0 <- 1-pZ1.X0
pZ0.X1 <- 1-pZ1.X1
pU1.X0Z0 <- prob$pU1.XZ[1]
pU1.X0Z1 <- prob$pU1.XZ[2]
pU1.X1Z0 <- prob$pU1.XZ[3]
pU1.X1Z1 <- prob$pU1.XZ[4]
pU0.X0Z0 <- 1-pU1.X0Z0
pU0.X0Z1 <- 1-pU1.X0Z1
pU0.X1Z0 <- 1-pU1.X1Z0
pU0.X1Z1 <- 1-pU1.X1Z1
dZ.10 <- logit(pY1.X1)-logit(pY1.X1Z0*pZ0.X0+pY1.X1Z1*pZ1.X0)
dZ.01 <- logit(pY1.X0)-logit(pY1.X0Z0*pZ0.X1+pY1.X0Z1*pZ1.X1)
dU.100 <- logit(pY1.X1Z0)-logit(pY1.U0Z0*pU0.X0Z0+pY1.U1Z0*pU1.X0Z0)
dU.101 <- logit(pY1.X1Z1)-logit(pY1.U0Z1*pU0.X0Z1+pY1.U1Z1*pU1.X0Z1)
dU.010 <- logit(pY1.X0Z0)-logit(pY1.U0Z0*pU0.X1Z0+pY1.U1Z0*pU1.X1Z0)
dU.011 <- logit(pY1.X0Z1)-logit(pY1.U0Z1*pU0.X1Z1+pY1.U1Z1*pU1.X1Z1)
dZ <- c(dZ.10, dZ.01)
dU <- c(dU.100, dU.101, dU.010, dU.011)
list(dZ=dZ, dU=dU)
}
#function for computing the function that parfun minimizes
optfun <- function(par, alpha, pU1, pZ1) {
par <- c(par, pU1, pZ1)
d <- dfun(par)
dZ.10 <- d$dZ[1]
dZ.01 <- d$dZ[2]
dU.100 <- d$dU[1]
dU.101 <- d$dU[2]
dU.010 <- d$dU[3]
dU.011 <- d$dU[4]
(dZ.10-alpha)^2+(dZ.01+alpha)^2+
(dU.100-alpha)^2+(dU.101-alpha)^2+(dU.010+alpha)^2+(dU.011+alpha)^2
}
#function for computing regression parameters, given alpha, pU1 and pZ1
parfun <- function(alpha, pU1, pZ1) {
parinit <- c(-1, 0.5, 0.5, -1, 0.5, 0.5)
optim(par=parinit, fn=optfun, alpha=alpha, pU1=pU1, pZ1=pZ1)$par
}
#function for estimating alpha and psi, given data
estfun <- function(data, alpha) {
fitY.XZ <- glm(formula=Y~X+Z+X*Z, family="quasibinomial", data=data)
if(missing(alpha)) {
fitY.X <- glm(formula=Y~X, family="quasibinomial", data=data)
cfun <- function(x, xprim, alpha)
alpha*(x-xprim)
alphaest <- cfit(cfun=cfun, nalpha=1, fitY.X=fitY.X, fitY.XZ=fitY.XZ, X="X")
alpha <- alphaest
}
cfun <- function(data)
alpha*(data$x-data$xprim)
out <- glm.sens(formula=Y~X+Z, fit=fitY.XZ, X="X", cfun=cfun)
psi <- out$coef["X"]
se <- sqrt(out$vcov["X", "X"])
list(alpha=alpha, psi=psi, se=se)
}
#simulate
alpha.all <- c(0.05, 0.1, 0.25)
pU1 <- 0.5
pZ1 <- 0.5
N <- 300
n <- 1000
B <- 100
par.all <- matrix(nrow=length(alpha.all), ncol=6)
results.true <- matrix(nrow=length(alpha.all), ncol=4)
results.est <- matrix(nrow=length(alpha.all), ncol=5)
for(k in 1:length(alpha.all)) {
alpha <- alpha.all[k]
par <- parfun(alpha, pU1, pZ1)
par.all[k, ] <- par
par <- c(par, pU1, pZ1)
results.true.all <- matrix(nrow=N, ncol=3)
results.est.all <- matrix(nrow=N, ncol=4)
for(i in 1:N) {
data <- datafun(n, par)
#using true alpha
out.true <- estfun(data, alpha)
estpsiboot <- vector(length=B)
for(j in 1:B) {
dataB <- data[sample(1:n, n, replace=TRUE), ]
out.boot <- estfun(dataB, alpha)
estpsiboot[j] <- out.boot$psi
}
results.true.all[i, ] <- c(out.true$psi, out.true$se, sd(estpsiboot))
sepsi.boot <- sd(estpsiboot)
#using estimated alpha
out.est <- estfun(data)
estpsiboot <- vector(length=B)
for(j in 1:B) {
dataB <- data[sample(1:n, n, replace=TRUE), ]
out.boot <- estfun(dataB)
estpsiboot[j] <- out.boot$psi
}
results.est.all[i, ] <- c(out.est$alpha, out.est$psi, out.est$se,
sd(estpsiboot))
}
results.true[k, ] <- c(colMeans(results.true.all), sd(results.true.all[, 1]))
results.est[k, ] <- c(colMeans(results.est.all), sd(results.est.all[, 2]))
}
print(round(par.all, 2))
print(round(results.true, 2))
print(round(results.est, 2))
}
#—ANALYSIS—
library(faraway)
data(wcgs)
wcgs$behave <- as.numeric(wcgs$behave=="A1" | wcgs$behave=="A2")
wcgs$height <- 0.0254*wcgs$height
wcgs$weight <- 0.45359237*wcgs$weight
wcgs$bmi <- wcgs$weight/wcgs$height^2
wcgs$smoke <- as.numeric(wcgs$cigs>0)
wcgs$chd <- as.numeric(wcgs$chd=="yes")
print("Table 1 from Chiba 2009")
print(xtabs(~behave+chd, data=wcgs))
fitX <- glm(formula=behave~(age+sdp+dbp+bmi+smoke)^2, family="quasibinomial",
data=wcgs)
fitY <- glm(formula=chd~(behave+age+sdp+dbp+bmi+smoke)^2,
family="quasibinomial", data=wcgs)
print("—Analysis with X=behave—")
bounds <- boundsfun(data=wcgs, X="behave", Y="chd")
l0 <- bounds["l0"]
u0 <- bounds["u0"]
l1 <- bounds["l1"]
u1 <- bounds["u1"]
bounds.logRR <- log(c(l1/u0, u1/l0))
bounds.logOR <- log(c((l1/(1-l1))/(u0/(1-u0)), (u1/(1-u1))/(l0/(1-l0))))
print("bounds logRR")
print(bounds.logRR)
print("bounds logOR")
print(bounds.logOR)
cfun <- function(x, xprim, alpha) (x-xprim)*alpha
fitY.X <- glm(formula=chd~behave, family="quasibinomial", data=wcgs)
fitY.XZ <- glm(formula=chd~behave*smoke, family="quasibinomial", data=wcgs)
alpha <- cfit(cfun=cfun, nalpha=1, fitY.X=fitY.X, fitY.XZ=fitY.XZ, X="behave")
print("alpha* with Z*=smoke")
print(alpha)
alpha <- cfit(cfun=cfun, nalpha=1, fitY.X=fitY.X, fitY.XZ=fitY, X="behave")
print("alpha* with Z*=Z")
print(alpha)
cfun <- function(x, xprim, alpha) (x>xprim)*alpha[1]-(x<xprim)*alpha[2]
alpha <- cfit(cfun=cfun, nalpha=2, fitY.X=fitY.X, fitY.XZ=fitY.XZ, X="behave")
print("alpha* with Z*=smoke, estimated non-parametrically")
print(alpha)
p <- predict(object=fitX, type="response")
w <- 1/(wcgs$behave*p+(1-wcgs$behave)*(1-p))
alpha <- c(seq(-8, -1, 1), seq(-0.2, 0.2, 0.1), seq(1, 8, 1))
est <- vector(length=length(alpha))
est.chiba <- vector(length=length(alpha))
se <- vector(length=length(alpha))
for(i in 1:length(alpha)) {
cfun <- function(data) alpha[i]*(data$x-data$xprim)
out <- glm.sens(formula=chd~behave, fit=fitY, X="behave", cfun=cfun)
est[i] <- out$coef["behave"]
se[i] <- sqrt(out$vcov["behave", "behave"])
correction <- wcgs$behave*(p+(1-p)/exp(alpha[i]))+
(1-wcgs$behave)*(exp(alpha[i])*p+(1-p))
Y <- w*correction*wcgs$chd
est.chiba[i] <- log(mean(Y[wcgs$behave==1])/mean(Y[wcgs$behave==0]))
}
ci <- cbind(est-1.96*se, est+1.96*se)
windows(width=14, height=7)
op <- par(mfrow=c(1,2), mar=c(5.1, 4.3, 4.1, 2.1))
incl <- which(alpha>=-0.201 & alpha<=0.201)
matplot(alpha[incl], cbind(est, ci, est.chiba)[incl, ], type="l", col=1,
lty=c(1, 2, 2, 3), xlab=expression(alpha), ylab=expression(hat(psi)[1]))
matplot(alpha, cbind(est, ci, est.chiba), type="l", col=1, lty=c(1, 2, 2, 3),
xlab=expression(alpha), ylab=expression(hat(psi)[1]))
par(op)
print("Estimate and 95% CI")
print(est[alpha==0])
print(ci[alpha==0, ])
E <- Eval(formula=chd~behave, fit=fitY, X="behave")
print("E-value")
print(E)
alpha0 <- seq(-0.2, 0.2, 0.2/5)
alpha1 <- alpha0/30
k <- 49
est <- matrix(nrow=length(alpha0), ncol=length(alpha1))
rownames(est) <- alpha0
colnames(est) <- alpha1
se <- matrix(nrow=length(alpha0), ncol=length(alpha1))
for(i in 1:length(alpha0)) {
for(j in 1:length(alpha1)) {
cfun <- function(data)
alpha0[i]*(data$x-data$xprim)+alpha1[j]*(data$x-data$xprim)*(data$age-k)
out <- glm.sens(formula=chd~behave, fit=fitY, X="behave", cfun=cfun)
if(abs(alpha1[j])<=abs(alpha0[i])/30) {
est[i, j] <- out$coef["behave"]
se[i, j] <- sqrt(out$vcov["behave", "behave"])
}
}
}
lower <- est-1.96*se
windows(width=14, height=7)
op <- par(mfrow=c(1,2))
contour(x=alpha0, y=alpha1, z=est, xlab=expression(alpha[0]),
ylab=expression(alpha[1]), labcex=0.9)
polygon(x=c(alpha0[1], 0, alpha0[length(alpha0)], alpha0[1]),
y=c(alpha1[length(alpha1)], 0, alpha1[length(alpha1)],
alpha1[length(alpha1)]),col="gray94")
polygon(x=c(alpha0[1], 0, alpha0[length(alpha0)], alpha0[1]),
y=c(alpha1[1], 0, alpha1[1], alpha1[1]), col="gray94")
contour(x=alpha0, y=alpha1, z=lower, xlab=expression(alpha[0]),
ylab=expression(alpha[1]), labcex=0.9)
polygon(x=c(alpha0[1], 0, alpha0[length(alpha0)], alpha0[1]),
y=c(alpha1[length(alpha1)], 0, alpha1[length(alpha1)],
alpha1[length(alpha1)]), col="gray94")
polygon(x=c(alpha0[1], 0, alpha0[length(alpha0)], alpha0[1]),
y=c(alpha1[1], 0, alpha1[1], alpha1[1]), col="gray94")
par(op)
print("—Analysis with X=sdp—")
cfun <- function(x, xprim, alpha) (x-xprim)*alpha
fitY.X <- glm(formula=chd~sdp, family="quasibinomial", data=wcgs)
fitY.XZ <- glm(formula=chd~sdp*smoke, family="quasibinomial", data=wcgs)
alpha <- cfit(cfun=cfun, nalpha=1, fitY.X=fitY.X, fitY.XZ=fitY.XZ, X="sdp")
print("alpha* with Z*=smoke")
print(alpha)
alpha <- cfit(cfun=cfun, nalpha=1, fitY.X=fitY.X, fitY.XZ=fitY, X="sdp")
print("alpha* with Z*=Z")
print(alpha)
alpha <- c(seq(-0.5, -0.05, 0.05), seq(-0.006, 0.006, 0.003),
seq(0.05, 1, 0.05))
est <- vector(length=length(alpha))
se <- vector(length=length(alpha))
for(i in 1:length(alpha)) {
cfun <- function(data) alpha[i]*(data$x-data$xprim)
out <- glm.sens(formula=chd~sdp, fit=fitY, X="sdp", cfun=cfun)
est[i] <- out$coef["sdp"]
se[i] <- sqrt(out$vcov["sdp", "sdp"])
}
ci <- cbind(est-1.96*se, est+1.96*se)
windows(width=14, height=7)
op <- par(mfrow=c(1,2), mar=c(5.1, 4.3, 4.1, 2.1))
incl <- which(alpha>=-0.0061 & alpha<0.0061)
matplot(alpha[incl], cbind(est, ci)[incl, ], type="l", lty=c(1,2,2), col=1,
xlab=expression(alpha), ylab=expression(hat(psi)[1]))
matplot(alpha, cbind(est, ci), type="l", lty=c(1,2,2), col=1,
xlab=expression(alpha), ylab=expression(hat(psi)[1]))
par(op)
print("Estimate and 95% CI")
print(est[alpha==0])
print(ci[alpha==0, ])
E <- Eval(formula=chd~sdp, fit=fitY, X="sdp")
print("E-value")
print(E)
q <- quantile(wcgs$sdp, probs=c(0.025, 0.975))
sdpmin <- q[1]
sdpmax <- q[2]
fitY <- glm(formula=chd~sdp+I(sdp^2)+age+dbp+bmi+smoke, data=wcgs,
family="quasibinomial")
alpha <- seq(-0.006, 0.006, 0.003)
sdp <- sdpmin:sdpmax
est <- matrix(nrow=length(sdp), ncol=length(alpha))
se <- matrix(nrow=length(sdp), ncol=length(alpha))
z1 <- sdp-sdpmin
z2 <- sdp^2-sdpmin^2
z <- cbind(z1, z2)
for(i in 1:length(alpha)) {
cfun <- function(data) alpha[i]*(data$x-data$xprim)
out <- glm.sens(formula=chd~sdp+I(sdp^2), fit=fitY, X="sdp", cfun=cfun)
coef <- out$coef[c("sdp", "I(sdp^2)")]
vcov <- out$vcov[c("sdp", "I(sdp^2)"), c("sdp", "I(sdp^2)")]
v1 <- vcov[1, 1]
v2 <- vcov[2, 2]
v12 <- vcov[1, 2]
est[, i] <- z%*%coef #log odds ratio vs the reference level sdpmin
se[, i] <- sqrt(z1^2*v1+z2^2*v2+2*z1*z2*v12)
}
lower <- est-1.96*se
windows(width=14, height=7)
op <- par(mfrow=c(1,2), mar=c(5.1, 4.3, 4.1, 2.1))
matplot(sdp, est, type="l", lty=1:length(alpha), col=1,
ylab=expression(paste("logit[pY(x)=1; ",hat(psi),"]-logit[pY(106)=1; ",
hat(psi),"]")), xlab="x (systolic blood pressure; mmHg)")
legend(x="topleft", lty=1:length(alpha),
legend=parse(text=paste0('alpha==', alpha)))
matplot(sdp, lower, type="l", lty=1:length(alpha), col=1,
ylab="95% lower confidence limit",
xlab="x (systolic blood pressure; mmHg)")
legend(x="topleft", lty=1:length(alpha),
legend=parse(text=paste0('alpha==', alpha)))
par(op)
#—ANALYSIS FRANKS ET AL (2019)—
library(faraway)
data(wcgs)
wcgs$behave <- as.numeric(wcgs$behave=="A1" | wcgs$behave=="A2")
wcgs$height <- 0.0254*wcgs$height
wcgs$weight <- 0.45359237*wcgs$weight
wcgs$bmi <- wcgs$weight/wcgs$height^2
wcgs$smoke <- as.numeric(wcgs$cigs>0)
wcgs$chd <- as.numeric(wcgs$chd=="yes")
franksfun <- function(X, data, gamma0, gamma1) {
fitY.XZ <- glm(formula=chd~(behave+age+sdp+dbp+bmi+smoke)^2,
family="quasibinomial", data=wcgs)
n <- nrow(data)
nX1 <- sum(data[, X])
nX0 <- n-nX1
pX1 <- nX1/n
pX0 <- 1-pX1
lp <- predict(object=fitY.XZ, type="link")
pY10.X0 <- mean(plogis(lp)[data[, X]==0])
pY11.X1 <- mean(plogis(lp)[data[, X]==1])
data0 <- data
data0[, X] <- 0
lp0 <- predict(object=fitY.XZ, newdata=data0, type="link")
pY10.X1 <- mean(plogis(lp0+gamma0)[data[, X]==1])
data1 <- data
data1[, X] <- 1
lp1 <- predict(object=fitY.XZ, newdata=data1, type="link")
pY11.X0 <- mean(plogis(lp1+gamma1)[data[, X]==0])
pY11 <- pY11.X0*pX0+pY11.X1*pX1
pY10 <- pY10.X0*pX0+pY10.X1*pX1
logit(pY11)-logit(pY10)
}
set.seed(1)
gamma <- c(seq(-8, -1, 1), seq(-0.2, 0.2, 0.1), seq(1, 8, 1))
est <- vector(length=length(gamma))
se <- vector(length=length(gamma))
B <- 100
n <- nrow(wcgs)
for(i in 1:length(gamma)) {
gamma1 <- -gamma[i]
gamma0 <- gamma[i]
est[i] <- franksfun(X="behave", data=wcgs, gamma0=gamma0, gamma1=gamma1)
estboot <- vector(length=B)
for(j in 1:B) {
wcgsB <- wcgs[sample(1:n, n, replace=TRUE), ]
estboot[j] <- franksfun(X="behave", data=wcgsB, gamma0=gamma0,
gamma1=gamma1)
}
se[i] <- sd(estboot)
}
ci <- cbind(est-1.96*se, est+1.96*se)
windows(width=14, height=7)
op <- par(mfrow=c(1,2), mar=c(5.1, 4.3, 4.1, 2.1))
incl <- which(gamma>=-0.201 & gamma<=0.201)
matplot(gamma[incl], cbind(est, ci)[incl, ], type="l", col=1,
lty=c(1, 2, 2), xlab=expression(-gamma), ylab=expression(hat(psi)[1]))
matplot(gamma, cbind(est, ci), type="l", col=1, lty=c(1, 2, 2),
xlab=expression(-gamma), ylab=expression(hat(psi)[1]))
par(op)

References

[1] Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688–701. doi: 10.1037/h0037350.

[2] Hernán MA, Robins JM. Causal inference: What if. Boca Raton: Chapman & Hall/CRC; 2020, forthcoming.

[3] Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL. Smoking and lung cancer: recent evidence and a discussion of some questions. J National Cancer Inst. 1959;22(1):173–203. doi: 10.1093/ije/dyp289.

[4] Rosenbaum PR. Observational studies. 2nd ed. New York: Springer-Verlag; 2002. doi: 10.1007/978-1-4757-3692-2.

[5] Yadlowsky S, Namkoong H, Basu S, Duchi J, Tian L. Bounds on the conditional and average treatment effect with unobserved confounding factors. 2018. Available from: https://arxiv.org/abs/1808.09521.

[6] Ding P, Vander Weele TJ. Sensitivity analysis without assumptions. Epidemiology. 2016;27(3):368–77. doi: 10.1097/EDE.0000000000000457.

[7] Vander Weele TJ, Ding P. Sensitivity analysis in observational research: introducing the E-value. Annal Internal Med. 2017;167(4):268–74. doi: 10.7326/M16-2607.

[8] Cinelli C, Hazlett C. Making sense of sensitivity: extending omitted variable bias. J R Stat Soc Ser B (Stat Methodol). 2020;82(1):39–67. doi: 10.1111/rssb.12348.

[9] Chernozhukov V, Cinelli C, Newey W, Sharma A, Syrgkanis V. Long story short: omitted variable bias in causal machine learning. 2021. Available from: https://arxiv.org/abs/2112.13398. doi: 10.3386/w30302.

[10] Brumback BA, Hernán MA, Haneuse SJPA, Robins JM. Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures. Stat Med. 2004;23(5):749–67. doi: 10.1002/sim.1657.

[11] Chiba Y. Sensitivity analysis of unmeasured confounding for the causal risk ratio by applying marginal structural models. Commun Stat-Theory Meth. 2009;39(1):65–76. doi: 10.1080/03610920802677224.

[12] Blackwell M. A selection bias approach to sensitivity analysis for causal effects. Political Analysis. 2014;22(2):169–82. doi: 10.1093/pan/mpt006.

[13] Ciocănea-Teodorescu I, Gabriel EE, Sjölander A. Sensitivity analysis for unmeasured confounding in the estimation of marginal causal effects. Biometrika. 2022;109(4):1101–16. doi: 10.1093/biomet/asac018.

[14] Franks A, D’Amour A, Feller A. Flexible sensitivity analysis for observational studies without observable implications. J Am Stat Assoc. 2019;115(532):1730–46. doi: 10.1080/01621459.2019.1604369.

[15] Scharfstein DO, Nabi R, Kennedy EH, Huang MY, Bonvini M, Smid M. Semiparametric sensitivity analysis: unmeasured confounding in observational studies. 2021. Available from: https://arxiv.org/abs/2104.08300.

[16] Schlesselman JJ. Assessing effects of confounding variables. Am J Epidemiol. 1978;108(1):3–8.

[17] Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc Ser B (Methodological). 1983;45(2):212–8. doi: 10.1017/CBO9780511810725.017.

[18] Lin DY, Psaty BM, Kronmal RA. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics. 1998;54(3):948–63. doi: 10.2307/2533848.

[19] Imbens GW. Sensitivity to exogeneity assumptions in program evaluation. Am Econ Rev. 2003;93(2):126–32. doi: 10.1257/000282803321946921.

[20] Vander Weele TJ, Arah OA. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology. 2011;22(1):42–52. doi: 10.1097/EDE.0b013e3181f74493.

[21] Tan Z. A distributional approach for causal inference using propensity scores. J Am Stat Assoc. 2006;101(476):1619–37. doi: 10.1198/016214506000000023.

[22] Kallus N, Zhou A. Confounding-robust policy improvement. 2018. Available from: https://arxiv.org/abs/1805.08593.

[23] Kallus N, Mao X, Zhou A. Interval estimation of individual-level causal effects under unobserved confounding. In: The 22nd International Conference on Artificial Intelligence and Statistics. PMLR; 2019. p. 2281–90.

[24] Zhao Q, Small DS, Bhattacharya BB. Sensitivity analysis for inverse probability weighting estimators via the percentile bootstrap. J R Stat Soc Ser B (Stat Methodol). 2019;81(4):735–61. doi: 10.1111/rssb.12327.

[25] Jesson A, Mindermann S, Gal Y, Shalit U. Quantifying ignorance in individual-level causal-effect estimates under hidden confounding. 2021. Available from: https://arxiv.org/abs/2103.04850.

[26] Pearl J. Causality: models, reasoning, and inference. 2nd ed. New York: Cambridge University Press; 2009. doi: 10.1017/CBO9780511803161.

[27] Gustafson P, McCandless LC. When is a sensitivity parameter exactly that? Stat Sci. 2018;33(1):86–95. doi: 10.1214/17-STS632.

[28] Steen J, Loeys T, Moerkerke B, Vansteelandt S. Medflex: an R package for flexible mediation analysis using natural effect models. J Stat Software. 2017;76(11):1–46. doi: 10.18637/jss.v076.i11.

[29] Stefanski LA, Boos DD. The calculus of M-estimation. Am Stat. 2002;56(1):29–38. doi: 10.1198/000313002753631330.

[30] Sjölander A. Regression standardization with the R package stdReg. European J Epidemiol. 2016;31(6):563–74. doi: 10.1007/s10654-016-0157-3.

[31] Robins JM. The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In: Sechrest L, Freeman H, Mulley A, editors. Health service research methodology: a focus on AIDS. US Public Health Service, National Center for Health Services Research; 1989. p. 113–59.

[32] Greenland S, Pearl J, Robins JM. Confounding and collapsibility in causal inference. Statist Sci. 1999;14(1):29–46. doi: 10.1214/ss/1009211805.

Received: 2022-06-12
Revised: 2022-10-19
Accepted: 2022-10-24
Published Online: 2022-12-13

© 2022 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
