Article, Open Access

Bounding the probabilities of benefit and harm through sensitivity parameters and proxies

Jose M. Peña
Published/Copyright: 23 August 2023

Abstract

We present two methods for bounding the probabilities of benefit (a.k.a. the probability of necessity and sufficiency, i.e., the desired effect occurs if and only if exposed) and harm (i.e., the undesired effect occurs if and only if exposed) under unmeasured confounding. The first method computes the upper or lower bound of either probability as a function of the observed data distribution and two intuitive sensitivity parameters, which can then be presented to the analyst as a 2-D plot to assist in decision-making. The second method assumes the existence of a measured nondifferential proxy for the unmeasured confounder. Using this proxy, tighter bounds than the existing ones can be derived from just the observed data distribution.

MSC 2010: 62D20

1 Introduction

Consider the causal graph in Figure 1, where $X$ denotes the exposure, $Y$ denotes the outcome, and $U$ denotes the unmeasured confounders. Let $X$ and $Y$ be binary random variables taking values in $\{x, x'\}$ and $\{y, y'\}$, respectively. Let $Y_x$ and $Y_{x'}$ denote the counterfactual outcome when the exposure is set to level $X = x$ and $X = x'$, respectively. Let $y_x$ denote the event $Y_x = y$, $y'_x$ the event $Y_x = y'$, $y_{x'}$ the event $Y_{x'} = y$, and $y'_{x'}$ the event $Y_{x'} = y'$. For instance, let $X$ represent whether a patient gets treated or not for a deadly disease, and $Y$ represent whether the patient survives it or not. Individual patients can be classified into immune (they survive whether they are treated or not), causal (they survive if and only if treated), preventive (they die if and only if treated), and doomed (they die whether they are treated or not). In this article, we are interested in the probability of a patient being of causal type (or, equivalently, the proportion of the causal type in the population), because it represents the actual benefit of the treatment. Likewise, we are also interested in the probability of a patient being of preventive type, since it indicates how harmful the treatment is. These quantities are not measured by other popular measures such as the average treatment effect (ATE), which this article considers on a difference scale, so that it corresponds to the difference in survival of a patient when treated ($X = x$) and not treated ($X = x'$), averaged over the entire population:

$\text{ATE} = E[Y_x - Y_{x'}] = p(y_x) - p(y_{x'}).$

Note that the first term comprises both causal and immune types, while the second term comprises both preventive and immune types.[1]

Figure 1

Causal graph where U is unmeasured.

Formally, the probability of benefit [1] (a.k.a. the probability of necessity and sufficiency [2,3]) is the probability of survival if treated and death otherwise:

$p(\text{benefit}) = p(y_x, y'_{x'}).$

The probability of harm [1] is the probability of death if treated and survival otherwise:

$p(\text{harm}) = p(y'_x, y_{x'}).$

In general, none of the ATE, $p(\text{benefit})$, and $p(\text{harm})$ is identifiable from the observed data distribution, due to the unobserved confounder $U$ and the lack of knowledge of the functional forms that connect causes and effects. However, $p(\text{benefit})$ can be bounded in terms of the observed data distribution [3]:

(1) $0 \le p(\text{benefit}) \le p(x, y) + p(x', y').$

Likewise, $p(\text{harm})$ can be bounded by simply swapping $x$ and $x'$. The bounds are sharp, i.e., logically possible. Tighter bounds exist, but they include counterfactual probabilities that, in general, are not identifiable from the observed data distribution due to the unobserved confounder $U$ [3]:

(2) $\max\{0,\ p(y_x) - p(y_{x'}),\ p(y) - p(y_{x'}),\ p(y_x) - p(y)\} \le p(\text{benefit}) \le \min\{p(y_x),\ p(y'_{x'}),\ p(x, y) + p(x', y'),\ p(y_x) - p(y_{x'}) + p(x, y') + p(x', y)\}.$

Likewise, $p(\text{harm})$ can be bounded by simply swapping $x$ and $x'$. These bounds are also sharp. Although they are not identifiable from the observed data distribution, the counterfactual probabilities in them can themselves be bounded in terms of the observed data distribution and some sensitivity parameters. This results in a method for sensitivity analysis of $p(\text{benefit})$ and $p(\text{harm})$. Alternatively, the counterfactual probabilities can be bounded in terms of just the observed data distribution whenever a proxy of the unmeasured confounder $U$ is measured. This results in tighter bounds than the ones in equation (1).
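To make these bounds concrete, here is a minimal Python sketch (ours, not part of the article) that evaluates equations (1) and (2); the encoding of the joint distribution and the function names are our own assumptions.

```python
# Minimal sketch (ours): evaluate the bounds in equations (1) and (2).
# pxy[i][j] = p(X = i, Y = j), with 1 coding the events x and y and 0 coding x' and y'.
# py_do[i] = p(y_{X=i}), i.e., the counterfactual probability of survival under X = i.

def bounds_eq1(pxy):
    # 0 <= p(benefit) <= p(x, y) + p(x', y')
    return 0.0, pxy[1][1] + pxy[0][0]

def bounds_eq2(pxy, py_do):
    # Tian and Pearl's bounds, which also involve p(y_x) and p(y_{x'}).
    py = pxy[1][1] + pxy[0][1]                                # p(y)
    lower = max(0.0,
                py_do[1] - py_do[0],                          # p(y_x) - p(y_{x'})
                py - py_do[0],                                # p(y) - p(y_{x'})
                py_do[1] - py)                                # p(y_x) - p(y)
    upper = min(py_do[1],                                     # p(y_x)
                1.0 - py_do[0],                               # p(y'_{x'})
                pxy[1][1] + pxy[0][0],                        # p(x, y) + p(x', y')
                py_do[1] - py_do[0] + pxy[1][0] + pxy[0][1])  # ... + p(x, y') + p(x', y)
    return lower, upper
```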

The rest of the article is organized as follows. Section 2 describes our sensitivity analysis method and illustrates it with an example. Section 3 presents our tighter bounds, illustrates them with an example, and reports simulations showing that our bounds are useful in many cases. We close the article with Section 4, where we discuss our results and related works. The main difference between our work and the existing ones is that we make use of just the observed data distribution to bound the quantities of interest, i.e., no counterfactual probability or experimental data is involved.

2 Sensitivity analysis of $p(\text{benefit})$ and $p(\text{harm})$

For simplicity, we assume that the unmeasured confounders $U$ in Figure 1 are categorical, but our results also hold for ordinal and continuous confounders.[2] Without loss of generality, we treat $U$ as a single categorical random variable whose levels are the Cartesian product of the levels of the elements in the original $U$.

Note that

(3) $p(y_x) = p(y_x \mid x)\,p(x) + p(y_x \mid x')\,p(x') = p(y \mid x)\,p(x) + p(y_x \mid x')\,p(x'),$

where the second equality follows from counterfactual consistency, i.e., $X = x \Rightarrow Y_x = Y$. Moreover,

(4) $p(y_x \mid x') = \sum_u p(y_x \mid x', u)\,p(u \mid x') = \sum_u p(y \mid x, u)\,p(u \mid x') \le \max_u p(y \mid x, u),$

where the second equality follows from $Y_x \perp X \mid U$ for all $x$, and counterfactual consistency. Likewise,

(5) $p(y_x \mid x') \ge \min_u p(y \mid x, u).$

Now, let us define

$M_x = \max_u p(y \mid x, u)$

and

$m_x = \min_u p(y \mid x, u),$

and likewise $M_{x'} = \max_u p(y \mid x', u)$ and $m_{x'} = \min_u p(y \mid x', u)$.

Then,

(6) $p(x, y) + p(x')\,m_x \le p(y_x) \le p(x, y) + p(x')\,M_x$

and, likewise,

(7) $p(x', y) + p(x)\,m_{x'} \le p(y_{x'}) \le p(x', y) + p(x)\,M_{x'}.$

Therefore,

(8) $\max\{0,\ p(x, y) + p(x')m_x - p(x', y) - p(x)M_{x'},\ p(x, y) - p(x)M_{x'},\ p(x')m_x - p(x', y)\} \le p(\text{benefit})$

and

(9) $p(\text{benefit}) \le \min\{p(x, y) + p(x')M_x,\ 1 - p(x', y) - p(x)m_{x'},\ p(x, y) + p(x', y'),\ p(x) + p(x')M_x - p(x)m_{x'}\},$

where $m_x$, $M_x$, $m_{x'}$, and $M_{x'}$ are sensitivity parameters. See Appendix A for the derivations of the bounds above. The fact that each bound only involves two sensitivity parameters makes the sensitivity analysis easy to visualize in tables or 2-D plots. The possible regions for $m_x$ and $M_x$ are

$0 \le m_x \le p(y \mid x) \le M_x \le 1,$

and likewise for $m_{x'}$ and $M_{x'}$.
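For concreteness, the following Python sketch (ours, not from the article) evaluates the lower bound (8) and the upper bound (9) for given values of the sensitivity parameters; sweeping the two relevant parameters over a grid of their possible regions is one way to produce the kind of 2-D plot mentioned above. The argument names are our own.

```python
# Minimal sketch (ours): bounds (8) and (9) as functions of the observed distribution
# and the sensitivity parameters. px = p(x), pyx = p(y | x), pyxp = p(y | x').

def lower_bound_benefit(px, pyx, pyxp, m_x, M_xp):
    pxp = 1.0 - px                                   # p(x')
    pxy, pxpy = px * pyx, pxp * pyxp                 # p(x, y), p(x', y)
    return max(0.0,
               pxy + pxp * m_x - pxpy - px * M_xp,
               pxy - px * M_xp,
               pxp * m_x - pxpy)

def upper_bound_benefit(px, pyx, pyxp, m_xp, M_x):
    pxp = 1.0 - px
    pxy, pxpy = px * pyx, pxp * pyxp
    pxpyp = pxp * (1.0 - pyxp)                       # p(x', y')
    return min(pxy + pxp * M_x,
               1.0 - pxpy - px * m_xp,
               pxy + pxpyp,
               px + pxp * M_x - px * m_xp)
```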

Our lower bound in equation (8) is informative if and only if[3]

$0 < p(x, y) - p(x) M_{x'}$

or

$0 < p(x') m_x - p(x', y).$

Then, the informative regions for $m_x$ and $M_{x'}$ are

$p(y \mid x') < m_x \le p(y \mid x)$

and

$p(y \mid x') \le M_{x'} < p(y \mid x).$

On the other hand, our upper bound in equation (9) is more informative than the upper bound in equation (1) if and only if[4]

$p(x, y) + p(x') M_x < p(x, y) + p(x', y')$

or

$1 - p(x', y) - p(x) m_{x'} < p(x, y) + p(x', y'),$

which occurs if and only if $p(y \mid x) < p(y' \mid x')$ or $p(y \mid x) > p(y' \mid x')$. Therefore, our upper bound is always more informative than that in equation (1). Then, the informative regions for $m_{x'}$ and $M_x$ coincide with their possible regions. The reasoning above can be repeated for $p(\text{harm})$ by simply swapping $x$ and $x'$.

2.1 Sensitivity analysis of the average treatment effect

The average treatment effect is the difference in survival of a patient when treated and not treated averaged over the entire population:

$\text{ATE} = E[Y_x - Y_{x'}] = p(y_x) - p(y_{x'}).$

Like $p(\text{benefit})$ and $p(\text{harm})$, the ATE is not identifiable from the observed data distribution in general, due to the unobserved confounder $U$ (recall equation (11)). However, it can be bounded by equations (6) and (7):

$p(x, y) + p(x')m_x - p(x', y) - p(x)M_{x'} \le \text{ATE} \le p(x, y) + p(x')M_x - p(x', y) - p(x)m_{x'}.$

This results in a method for sensitivity analysis of the ATE, where, as before, $m_x$, $M_x$, $m_{x'}$, and $M_{x'}$ are the sensitivity parameters.
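A minimal sketch of the corresponding ATE bounds (ours, not from the article), using the same observed quantities and parameter names as the sketch above:

```python
# Minimal sketch (ours): ATE bounds obtained from equations (6) and (7).
# px = p(x), pyx = p(y | x), pyxp = p(y | x').
def ate_bounds(px, pyx, pyxp, m_x, M_x, m_xp, M_xp):
    pxp = 1.0 - px                                   # p(x')
    pxy, pxpy = px * pyx, pxp * pyxp                 # p(x, y), p(x', y)
    lower = pxy + pxp * m_x - pxpy - px * M_xp
    upper = pxy + pxp * M_x - pxpy - px * m_xp
    return lower, upper
```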

The sensitivity analysis of the ATE can supplement the sensitivity analysis of $p(\text{benefit})$ and $p(\text{harm})$ with additional information, as the three quantities are related [1] as follows:

(10) $\text{ATE} = [p(y_x, y_{x'}) + p(y_x, y'_{x'})] - [p(y_x, y_{x'}) + p(y'_x, y_{x'})] = p(\text{benefit}) - p(\text{harm}).$

We illustrate this in the next section.

2.2 Example

We illustrate our method for sensitivity analysis of $p(\text{benefit})$ and $p(\text{harm})$ with the following fictitious epidemiological example.[5] Consider a population consisting of a majority and a minority group. Let the binary random variable $U$ represent the group an individual belongs to. Let the binary random variable $X$ represent whether the individual gets treated or not for a certain disease. Let the binary random variable $Y$ represent whether the individual survives the disease. Assume that the scientific community agrees that $U$ is a confounder for $X$ and $Y$. Assume also that it is illegal to store the values of $U$, to avoid discrimination complaints. In other words, the identity of the confounder is known, but its values are not. More specifically, consider the following data generation model:

$p(u) = 0.9,$
$p(x \mid u) = 0.2, \quad p(y \mid x, u) = 0.4, \quad p(y \mid x, u') = 0.6,$
$p(x \mid u') = 0.6, \quad p(y \mid x', u) = 0.1, \quad p(y \mid x', u') = 0.3.$

Since this model does not specify the functional forms of the causal mechanisms, we cannot compute the true $p(\text{benefit})$ and $p(\text{harm})$. See the study by Tian and Pearl [3] for more information on this. However, we can use equation (2) to bound them. Specifically, since there is no confounding besides $U$, we have that $Y_x \perp X \mid U$ for all $x$, and thus we can write

(11) $p(y_x) = \sum_u p(y \mid x, u)\,p(u),$

using first the law of total probability, then $Y_x \perp X \mid U$, and, finally, counterfactual consistency, i.e., $X = x \Rightarrow Y_x = Y$. Therefore, $p(\text{benefit}) \in [0.3, 0.42]$ and $p(\text{harm}) \in [0, 0.12]$.
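As a check, the following Python sketch (ours, not from the article) reproduces the interval for $p(\text{benefit})$ numerically from the data generation model above; the encoding (1 for $x$, $y$, $u$ and 0 for their primed counterparts) is our own.

```python
# Minimal sketch (ours): reproduce p(benefit) in [0.3, 0.42] for this example.
p_u = {1: 0.9, 0: 0.1}                                    # u = 1 is the majority group
p_x_u = {1: 0.2, 0: 0.6}                                  # p(x | U)
p_y_xu = {(1, 1): 0.4, (1, 0): 0.6, (0, 1): 0.1, (0, 0): 0.3}   # p(y | X, U)

# Equation (11): p(y_x) and p(y_{x'}).
py_do = {xv: sum(p_y_xu[(xv, u)] * p_u[u] for u in p_u) for xv in (1, 0)}

# Observed joint p(X, Y).
pxy = {(xv, yv): 0.0 for xv in (0, 1) for yv in (0, 1)}
for u in p_u:
    for xv in (0, 1):
        px = p_x_u[u] if xv == 1 else 1 - p_x_u[u]
        pxy[(xv, 1)] += p_u[u] * px * p_y_xu[(xv, u)]
        pxy[(xv, 0)] += p_u[u] * px * (1 - p_y_xu[(xv, u)])

# Equation (2).
py = pxy[(1, 1)] + pxy[(0, 1)]
lower = max(0, py_do[1] - py_do[0], py - py_do[0], py_do[1] - py)
upper = min(py_do[1], 1 - py_do[0], pxy[(1, 1)] + pxy[(0, 0)],
            py_do[1] - py_do[0] + pxy[(1, 0)] + pxy[(0, 1)])
print(round(lower, 2), round(upper, 2))                   # 0.3 0.42
```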

Figure 2 (top) shows our lower bound of $p(\text{benefit})$ as a function of the sensitivity parameters $m_x$ and $M_{x'}$. The axes span the possible regions of the parameters. The dashed lines indicate the informative regions of the parameters. Specifically, the bottom right quadrant corresponds to the non-informative region, i.e., the region where our lower bound is zero. In the data generation model considered, $m_x = 0.4$ and $M_{x'} = 0.3$. These values are unknown to the epidemiologist, because $U$ is unobserved. However, the figure reveals that the epidemiologist only needs to have some rough idea of these values to confidently conclude that $p(\text{benefit})$ is lower bounded by 0.15. Figure 2 (bottom) shows our upper bound of $p(\text{benefit})$ as a function of the sensitivity parameters $m_{x'}$ and $M_x$. Likewise, having some rough idea of the unknown values $m_{x'} = 0.1$ and $M_x = 0.6$ enables the epidemiologist to confidently conclude that $p(\text{benefit})$ is upper bounded by 0.65. Equation (1) produces much looser bounds, namely 0 and 0.79. Recall that $p(\text{benefit}) \in [0.3, 0.42]$ in truth.

Figure 2

Lower and upper bounds of $p(\text{benefit})$ in the example in Section 2.2 as functions of the sensitivity parameters $m_x$, $M_x$, $m_{x'}$, and $M_{x'}$.

A similar reasoning leads the epidemiologist to conclude from Figure 3 that $p(\text{harm}) \in [0, 0.18]$. Equation (1) produces a slightly wider interval, namely $[0, 0.22]$. Recall that $p(\text{harm}) \in [0, 0.12]$ in truth.

Figure 3

Lower and upper bounds of $p(\text{harm})$ in the example in Section 2.2 as functions of the sensitivity parameters $m_x$, $M_x$, $m_{x'}$, and $M_{x'}$.

Finally, the epidemiologist can combine $p(\text{benefit})$ and $p(\text{harm})$ into a measure of the social good of the treatment. Say that the social benefit of somebody who survives the disease if and only if treated is 1 unit, while the social harm of somebody who dies if and only if treated is 1.5 units (one unit for the death, and half a unit for the missed opportunity to cure somebody else). Then, our bounds above imply that the social good of the treatment lies in the interval $[-0.12, 0.65]$, i.e., $0.15 \cdot 1 - 0.18 \cdot 1.5 = -0.12$ and $0.65 \cdot 1 - 0 \cdot 1.5 = 0.65$. The social good is more uncertain when using the bounds in equation (1), since they result in the wider interval $[-0.33, 0.78]$. The true social good of the treatment lies in the interval $[0.12, 0.42]$.

We now illustrate how the sensitivity analysis of the ATE described in Section 2.1 can supplement the previous sensitivity analysis of $p(\text{benefit})$ and $p(\text{harm})$ with additional information. Specifically, Figure 4 shows our lower and upper bounds of the ATE as functions of the sensitivity parameters $m_x$, $M_x$, $m_{x'}$, and $M_{x'}$. Recall that $m_x = 0.4$, $m_{x'} = 0.1$, $M_x = 0.6$, and $M_{x'} = 0.3$ in this example. These values are unknown to the epidemiologist. However, she only needs to have some rough idea of these values to confidently conclude that $\text{ATE} \in [0.15, 0.55]$. Note that $\text{ATE} = 0.3$ in truth by equation (11).

Figure 4

Lower and upper bounds of the ATE in the example in Section 2.2 as functions of the sensitivity parameters $m_x$, $M_x$, $m_{x'}$, and $M_{x'}$.

Recall that the epidemiologist previously concluded that $p(\text{benefit}) \in [0.15, 0.65]$ and $p(\text{harm}) \in [0, 0.18]$. However, $p(\text{benefit})$ and $p(\text{harm})$ must now also comply with the result of the sensitivity analysis of the ATE, i.e., $\text{ATE} = p(\text{benefit}) - p(\text{harm}) \in [0.15, 0.55]$. Specifically, only the values between the two lines in Figure 5 comply with the sensitivity analyses of the ATE, $p(\text{benefit})$, and $p(\text{harm})$.

Figure 5

Only the values between the two lines comply with the ATE interval in the example in Section 2.2.

Recall that the epidemiologist previously concluded that the social good of the treatment lies in the interval $[-0.12, 0.65]$. The lower end of the interval was obtained by combining the upper bound of $p(\text{harm})$ (i.e., 0.18) and the lower bound of $p(\text{benefit})$ (i.e., 0.15). These bounds were obtained by sensitivity analysis of $p(\text{benefit})$ and $p(\text{harm})$, but they do not comply with the sensitivity analysis of the ATE, i.e., they are not between the two lines in Figure 5. Instead, the figure indicates that the lower end of the social good interval should correspond to $p(\text{harm}) = 0.18$ and $p(\text{benefit}) = 0.33$, whereas the upper end should correspond to $p(\text{harm}) = 0$ and $p(\text{benefit}) = 0.55$. Thus, the social good of the treatment lies in the interval $[0.06, 0.55]$. This interval is more informative than the previous one, since it is narrower. Moreover, it mostly contains positive values, which indicates that the treatment is most likely beneficial to society. Recall that the true social good of the treatment lies in the interval $[0.12, 0.42]$.

3 Tighter bounds of $p(\text{benefit})$ and $p(\text{harm})$ via proxies

Consider the causal graph in Figure 6, where $X$ denotes the exposure, $Y$ denotes the outcome, and $U$ denotes the unmeasured confounders. Like before, let $X$ and $Y$ be binary random variables taking values in $\{x, x'\}$ and $\{y, y'\}$, respectively. Unlike before, let $U$ be a binary random variable too, taking values in $\{u, u'\}$. Finally, let $V$, taking values in $\{v, v'\}$, denote a measured binary proxy of $U$. Note that $V$ is a nondifferential proxy, i.e., $V$ is conditionally independent of $X$ and $Y$ given $U$. Hereinafter, we just consider $p(\text{benefit})$. Our results apply to $p(\text{harm})$ by simply swapping $x$ and $x'$.

Figure 6

Causal graph where U is unmeasured.

From equation (11), we have that

$\text{ATE} = E[Y_x - Y_{x'}] = p(y_x) - p(y_{x'}) = \sum_u p(y \mid x, u)\,p(u) - \sum_u p(y \mid x', u)\,p(u).$

Since U is unmeasured, the ATE cannot be computed. However, it can be approximated by the crude or unadjusted average treatment effect,

$\text{ATE}_{\text{crude}} = E[Y \mid x] - E[Y \mid x'] = p(y \mid x) - p(y \mid x'),$

and by the observed or partially adjusted average treatment effect,

$\text{ATE}_{\text{obs}} = \sum_v p(y \mid x, v)\,p(v) - \sum_v p(y \mid x', v)\,p(v).$
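A minimal Python sketch (ours, not from the article) of these two estimands, given the observed joint distribution of $(X, Y, V)$; the encoding and function names are our own assumptions.

```python
# Minimal sketch (ours): crude and partially adjusted ATE from p(X, Y, V).
# pxyv[(xv, yv, vv)] = p(X = xv, Y = yv, V = vv), with 1 coding x, y, v.

def ate_crude(pxyv):
    pxy1 = sum(p for (xv, yv, vv), p in pxyv.items() if xv == 1 and yv == 1)
    px1 = sum(p for (xv, yv, vv), p in pxyv.items() if xv == 1)
    pxy0 = sum(p for (xv, yv, vv), p in pxyv.items() if xv == 0 and yv == 1)
    px0 = sum(p for (xv, yv, vv), p in pxyv.items() if xv == 0)
    return pxy1 / px1 - pxy0 / px0                        # p(y | x) - p(y | x')

def ate_obs(pxyv):
    total = 0.0
    for vv in (0, 1):
        pv = sum(p for (xv, yv, v2), p in pxyv.items() if v2 == vv)
        py_x_v = pxyv[(1, 1, vv)] / (pxyv[(1, 0, vv)] + pxyv[(1, 1, vv)])   # p(y | x, v)
        py_xp_v = pxyv[(0, 1, vv)] / (pxyv[(0, 0, vv)] + pxyv[(0, 1, vv)])  # p(y | x', v)
        total += (py_x_v - py_xp_v) * pv
    return total
```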

Ogburn and VanderWeele [4] prove that $\text{ATE}_{\text{obs}}$ lies between $\text{ATE}_{\text{crude}}$ and the ATE if $E[Y \mid X, U]$ is monotone in $U$, i.e., $E[Y \mid X, U]$ is nondecreasing or nonincreasing in $U$, i.e.,

$E[Y \mid x, u] \le E[Y \mid x, u']$ and $E[Y \mid x', u] \le E[Y \mid x', u']$

or

$E[Y \mid x, u] \ge E[Y \mid x, u']$ and $E[Y \mid x', u] \ge E[Y \mid x', u'].$

In words, $E[Y \mid X, U]$ is monotone in $U$ if the average causal effect of $U$ on $Y$ is in the same direction among the treated ($X = x$) and the untreated ($X = x'$). Ogburn and VanderWeele [4] argue that this condition is likely to hold in most applications in epidemiology. Unfortunately, the condition is untestable from the observed data distribution, because $U$ is unmeasured. Fortunately, $E[Y \mid X, U]$ is monotone in $U$ if and only if $E[Y \mid X, V]$ is monotone in $V$ [5], which is testable.

Provided that $E[Y \mid X, V]$ is monotone in $V$, the results above lead to tighter bounds than those in equation (1) from just the observed data distribution. Specifically, if $E[Y \mid X, V]$ is monotone in $V$ and $\text{ATE}_{\text{crude}} \le \text{ATE}_{\text{obs}}$, then $\text{ATE}_{\text{crude}} \le \text{ATE}_{\text{obs}} \le \text{ATE}$, and thus

(12) $\max\{0,\ \text{ATE}_{\text{obs}}\} \le p(\text{benefit}) \le p(x, y) + p(x', y')$

by equation (2). On the other hand, if $E[Y \mid X, V]$ is monotone in $V$ and $\text{ATE}_{\text{obs}} \le \text{ATE}_{\text{crude}}$, then $\text{ATE} \le \text{ATE}_{\text{obs}} \le \text{ATE}_{\text{crude}}$, and thus

(13) $0 \le p(\text{benefit}) \le \min\{p(x, y) + p(x', y'),\ \text{ATE}_{\text{obs}} + p(x, y') + p(x', y)\}$

by equation (2). Note that the conditions under which the new bounds hold (i.e., $E[Y \mid X, V]$ is monotone in $V$, and $\text{ATE}_{\text{crude}} \le \text{ATE}_{\text{obs}}$ or $\text{ATE}_{\text{obs}} \le \text{ATE}_{\text{crude}}$) are testable from the observed data distribution.
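The testable conditions and the case split can be coded directly. A minimal Python sketch (ours, not from the article), taking the observed conditionals as inputs; the argument names and encoding are our own assumptions.

```python
# Minimal sketch (ours): bounds (12)-(13) for p(benefit).
# py_xv[(xv, vv)] = p(y | X = xv, V = vv); py_x[xv] = p(y | X = xv);
# pv = p(v); pxy[(xv, yv)] = p(X = xv, Y = yv). 1 codes x, y, v.

def benefit_bounds_12_13(py_xv, py_x, pv, pxy):
    eq1_upper = pxy[(1, 1)] + pxy[(0, 0)]                 # p(x, y) + p(x', y')
    # Testable condition: E[Y | X, V] monotone in V, i.e., same direction for x and x'.
    diff_x = py_xv[(1, 1)] - py_xv[(1, 0)]
    diff_xp = py_xv[(0, 1)] - py_xv[(0, 0)]
    if diff_x * diff_xp < 0:
        return 0.0, eq1_upper                             # fall back to equation (1)
    ate_crude = py_x[1] - py_x[0]
    ate_obs = sum((py_xv[(1, vv)] - py_xv[(0, vv)]) * (pv if vv == 1 else 1 - pv)
                  for vv in (1, 0))
    if ate_crude <= ate_obs:                              # equation (12)
        return max(0.0, ate_obs), eq1_upper
    return 0.0, min(eq1_upper, ate_obs + pxy[(1, 0)] + pxy[(0, 1)])   # equation (13)
```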

3.1 Bounds under nonincreasing and nondecreasing conditions

Let $S_x = \sum_v p(y \mid x, v)\,p(v)$, and note that $\text{ATE}_{\text{obs}} = S_x - S_{x'}$. If $E[Y \mid X, U]$ and $E[X \mid U]$ are one nonincreasing and the other nondecreasing in $U$, then $S_x \le p(y_x)$ and $p(y_{x'}) \le S_{x'}$, and thus $\text{ATE}_{\text{obs}} \le \text{ATE}$ [4]. On the other hand, if $E[Y \mid X, U]$ and $E[X \mid U]$ are both nonincreasing or both nondecreasing in $U$, then $p(y_x) \le S_x$ and $S_{x'} \le p(y_{x'})$, and thus $\text{ATE} \le \text{ATE}_{\text{obs}}$ [4]. Unfortunately, the antecedents of these rules are untestable from the observed data distribution, because $U$ is unmeasured. Fortunately, they can be replaced by testable antecedents as follows: $E[Y \mid X, U]$ and $E[X \mid U]$ are one nonincreasing and the other nondecreasing in $U$ if and only if $E[Y \mid X, V]$ and $E[X \mid V]$ are one nonincreasing and the other nondecreasing in $V$, and $E[Y \mid X, U]$ and $E[X \mid U]$ are both nonincreasing or both nondecreasing in $U$ if and only if $E[Y \mid X, V]$ and $E[X \mid V]$ are both nonincreasing or both nondecreasing in $V$ [5].[6] Note that $E[X \mid V]$ is always monotone in $V$.

Provided that $E[Y \mid X, V]$ is monotone in $V$, the results above lead to tighter bounds than those in equations (1), (12), and (13) from just the observed data distribution. Specifically, if $E[Y \mid X, V]$ and $E[X \mid V]$ are one nonincreasing and the other nondecreasing in $V$, then

(14) $\max\{0,\ \text{ATE}_{\text{obs}},\ p(y) - S_{x'},\ S_x - p(y)\} \le p(\text{benefit}) \le p(x, y) + p(x', y')$

by equation (2). On the other hand, if $E[Y \mid X, V]$ and $E[X \mid V]$ are both nonincreasing or both nondecreasing in $V$, then

(15) $0 \le p(\text{benefit}) \le \min\{S_x,\ 1 - S_{x'},\ p(x, y) + p(x', y'),\ \text{ATE}_{\text{obs}} + p(x, y') + p(x', y)\}$

by equation (2). Note that the conditions under which the bounds above hold are testable from the observed data distribution.
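A minimal Python sketch (ours, not from the article) of equations (14) and (15), with the direction checks done on the observed conditionals; the encoding and the handling of ties are our own choices.

```python
# Minimal sketch (ours): bounds (14)-(15) for p(benefit).
# py_xv[(xv, vv)] = p(y | X, V); px_v[vv] = p(x | V = vv); pv = p(v);
# pxy[(xv, yv)] = p(X, Y). 1 codes x, y, v.

def benefit_bounds_14_15(py_xv, px_v, pv, pxy):
    s_x = py_xv[(1, 1)] * pv + py_xv[(1, 0)] * (1 - pv)   # S_x
    s_xp = py_xv[(0, 1)] * pv + py_xv[(0, 0)] * (1 - pv)  # S_{x'}
    ate_obs = s_x - s_xp
    py = pxy[(1, 1)] + pxy[(0, 1)]                        # p(y)
    eq1_upper = pxy[(1, 1)] + pxy[(0, 0)]
    dy_x = py_xv[(1, 1)] - py_xv[(1, 0)]                  # direction of E[Y | x, V]
    dy_xp = py_xv[(0, 1)] - py_xv[(0, 0)]                 # direction of E[Y | x', V]
    dx = px_v[1] - px_v[0]                                # direction of E[X | V]
    if dy_x * dy_xp < 0:
        return None                                       # E[Y | X, V] not monotone in V
    if dy_x * dx <= 0 and dy_xp * dx <= 0:                # opposite directions: equation (14)
        return max(0.0, ate_obs, py - s_xp, s_x - py), eq1_upper
    # Same direction: equation (15).
    return 0.0, min(s_x, 1 - s_xp, eq1_upper, ate_obs + pxy[(1, 0)] + pxy[(0, 1)])
```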

3.2 Condition-free bounds

Peña [5] proved that some of the results in the previous section also hold under weaker conditions.[7] Specifically, if $E[Y \mid x, V]$ and $E[X \mid V]$ are one nonincreasing and the other nondecreasing in $V$, then $S_x \le p(y_x)$; otherwise, $p(y_x) \le S_x$. Likewise for $x'$ instead of $x$, replacing $\le$ with $\ge$.

The results above lead to tighter bounds than those in equation (1) from just the observed data distribution. Specifically, if $E[Y \mid x, V]$ and $E[X \mid V]$ are one nonincreasing and the other nondecreasing in $V$, then

(16) $\max\{0,\ S_x - p(y)\} \le p(\text{benefit}) \le p(x, y) + p(x', y')$

by equation (2); otherwise,

(17) $0 \le p(\text{benefit}) \le \min\{S_x,\ p(x, y) + p(x', y')\}.$

On the other hand, if $E[Y \mid x', V]$ and $E[X \mid V]$ are one nonincreasing and the other nondecreasing in $V$, then

(18) $\max\{0,\ p(y) - S_{x'}\} \le p(\text{benefit}) \le p(x, y) + p(x', y')$

by equation (2); otherwise,

(19) $0 \le p(\text{benefit}) \le \min\{1 - S_{x'},\ p(x, y) + p(x', y')\}.$

Note that, unlike equations (12)–(15), which require $E[Y \mid X, V]$ to be monotone in $V$, either equation (16) or (17) always applies, and either equation (18) or (19) always applies, because $E[Y \mid x, V]$, $E[Y \mid x', V]$, and $E[X \mid V]$ are always monotone in $V$. Note, however, that if $E[Y \mid X, V]$ is monotone in $V$, then equations (14) and (15) produce tighter bounds than equations (16)–(19): if equation (14) applies, then equations (16) and (18) also apply, but the former produces tighter bounds. Likewise for equations (15), (17), and (19).
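A minimal Python sketch (ours, not from the article) of the condition-free bounds (16)–(19); which of the four equations fires is decided by the signs of the observed differences, and ties are broken arbitrarily, which is our own simplification.

```python
# Minimal sketch (ours): condition-free bounds (16)-(19) for p(benefit).
# Same inputs and encoding as the previous sketch.

def benefit_bounds_16_19(py_xv, px_v, pv, pxy):
    s_x = py_xv[(1, 1)] * pv + py_xv[(1, 0)] * (1 - pv)   # S_x
    s_xp = py_xv[(0, 1)] * pv + py_xv[(0, 0)] * (1 - pv)  # S_{x'}
    py = pxy[(1, 1)] + pxy[(0, 1)]                        # p(y)
    lower, upper = 0.0, pxy[(1, 1)] + pxy[(0, 0)]         # start from equation (1)
    dx = px_v[1] - px_v[0]                                # direction of E[X | V]
    if (py_xv[(1, 1)] - py_xv[(1, 0)]) * dx <= 0:         # E[Y | x, V] opposite to E[X | V]
        lower = max(lower, s_x - py)                      # equation (16)
    else:
        upper = min(upper, s_x)                           # equation (17)
    if (py_xv[(0, 1)] - py_xv[(0, 0)]) * dx <= 0:         # E[Y | x', V] opposite to E[X | V]
        lower = max(lower, py - s_xp)                     # equation (18)
    else:
        upper = min(upper, 1 - s_xp)                      # equation (19)
    return max(0.0, lower), upper
```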

3.3 Example

To illustrate our tighter bounds of $p(\text{benefit})$ and $p(\text{harm})$, we extend the example from Section 2.2 with a measured binary proxy $V$ of the unmeasured confounder $U$. Recall that $U$ represents whether an individual belongs to the majority or minority group in the population under study. Let $V$ represent whether an individual has sought help for unrelated diseases in the last year, and let

$p(v \mid u) = 0.8, \quad p(v \mid u') = 0.3.$

Recall that $p(\text{benefit}) \in [0.3, 0.42]$ and $\text{ATE} = 0.3$ in truth, and also note that $E[Y \mid X, U]$ is monotone (nonincreasing) in $U$, because the probability of survival is smaller for an individual from the majority group than for one from the minority group, regardless of whether they are treated or not.

While the epidemiologist cannot test from the observed data distribution whether $E[Y \mid X, U]$ is monotone in $U$, they can test whether $E[Y \mid X, V]$ is monotone in $V$. Specifically, they can compute

$p(y \mid x, v) = 0.42, \quad p(y \mid x, v') = 0.51, \quad p(y \mid x', v) = 0.1, \quad p(y \mid x', v') = 0.13$

and conclude that $E[Y \mid X, V]$ is monotone (nonincreasing) in $V$. Therefore, either equation (12) or (13) applies. The epidemiologist can then compute $\text{ATE}_{\text{crude}} = 0.34$ and $\text{ATE}_{\text{obs}} = 0.33$ from the observed data distribution, and conclude that equation (13) applies. Using the observed data distribution one last time, the epidemiologist then concludes that $p(\text{benefit}) \in [0, 0.55]$. This interval is substantially narrower than the interval $[0, 0.79]$ returned by equation (1).
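The numbers above can be reproduced with a short Python sketch (ours, not from the article), extending the data generation model of Section 2.2 with the proxy; the encoding is our own.

```python
# Minimal sketch (ours): reproduce p(y | X, V) and ATE_obs for this example.
p_u = {1: 0.9, 0: 0.1}
p_x_u = {1: 0.2, 0: 0.6}                                  # p(x | U)
p_y_xu = {(1, 1): 0.4, (1, 0): 0.6, (0, 1): 0.1, (0, 0): 0.3}   # p(y | X, U)
p_v_u = {1: 0.8, 0: 0.3}                                  # p(v | U)

def p_uxv(u, xv, vv):                                     # joint p(U, X, V)
    px = p_x_u[u] if xv == 1 else 1 - p_x_u[u]
    pv = p_v_u[u] if vv == 1 else 1 - p_v_u[u]
    return p_u[u] * px * pv

py_xv = {}
for xv in (0, 1):
    for vv in (0, 1):
        num = sum(p_uxv(u, xv, vv) * p_y_xu[(xv, u)] for u in p_u)
        den = sum(p_uxv(u, xv, vv) for u in p_u)
        py_xv[(xv, vv)] = num / den

pv1 = sum(p_uxv(u, xv, 1) for u in p_u for xv in (0, 1))  # p(v)
ate_obs = sum((py_xv[(1, vv)] - py_xv[(0, vv)]) * (pv1 if vv == 1 else 1 - pv1)
              for vv in (1, 0))
print({k: round(p, 2) for k, p in py_xv.items()})         # approx. 0.42, 0.51, 0.10, 0.13
print(round(ate_obs, 2))                                  # approx. 0.33
```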

The epidemiologist can also compute

$p(x \mid v) = 0.22, \quad p(x \mid v') = 0.31$

from the observed data distribution and conclude that $E[X \mid V]$ is also nonincreasing in $V$. Therefore, equation (15) applies. Using the observed data distribution again, the epidemiologist then concludes that $p(\text{benefit}) \in [0, 0.45]$. This interval is narrower than the interval $[0, 0.55]$ returned by equation (13), and much narrower than the interval $[0, 0.79]$ returned by equation (1). Recall that $p(\text{benefit}) \in [0.3, 0.42]$ in truth.

Finally, we modify the running example so that now $p(y \mid x', u) = 0.4$, which implies that the true $p(\text{benefit})$ now lies in the interval $[0.03, 0.42]$. The epidemiologist can compute

$p(y \mid x, v) = 0.42, \quad p(y \mid x, v') = 0.51, \quad p(y \mid x', v) = 0.4, \quad p(y \mid x', v') = 0.38$

from the observed data distribution and conclude that $E[Y \mid x, V]$ and $E[Y \mid x', V]$ are, respectively, nonincreasing and nondecreasing in $V$. Likewise, they can compute

$p(x \mid v) = 0.22, \quad p(x \mid v') = 0.31$

and conclude that $E[X \mid V]$ is nonincreasing in $V$. Therefore, equations (17) and (18) apply (note that equations (12)–(15) do not apply, since $E[Y \mid X, V]$ is not monotone in $V$). Using the observed data distribution again, the epidemiologist then concludes that $p(\text{benefit}) \in [0, 0.45]$ from equation (17) and $p(\text{benefit}) \in [0.01, 0.57]$ from equation (18), and thus $p(\text{benefit}) \in [0.01, 0.45]$. This interval is narrower than the interval $[0, 0.57]$ returned by equation (1).

3.4 Simulations

In this section, we show through simulations that our condition-free bounds in equations (16)–(19) are useful in many cases. Specifically, we randomly generate 100,000 probability distributions compatible with the causal graph in Figure 6. For the $i$th distribution, let $[a_i, b_i]$ denote the interval for $p(\text{benefit})$ returned by equation (1), and $[c_i, d_i]$ the interval returned by equations (16)–(19). Let the gap decrease due to equations (16)–(19) be defined as $b_i - a_i - (d_i - c_i)$. Likewise, let the lower bound increase be defined as $c_i - a_i$, and the upper bound decrease as $b_i - d_i$. Finally, we say that equations (16)–(19) are useful for the $i$th distribution if $a_i < c_i$ or $d_i < b_i$.

Table 1 displays the results of our simulations. Equations (16)–(19) are useful in 70% of the simulations, which is a substantial percentage. When they are useful, these equations return an interval that is on average 0.17 units narrower than the interval returned by equation (1). More concretely, they increase the lower bound by 0.08 units on average and decrease the upper bound by 0.09 units on average. In some cases, the improvement exceeds 0.8 units. The improvement in individual simulations can be better appreciated in Figure 7, which summarizes the first 100 simulations sorted by the upper bound returned by equation (1).
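The simulation can be sketched in a few lines of Python (ours, not from the article); the random parameterization of the model of Figure 6 and the tolerance used to declare an improvement are our own choices.

```python
# Minimal sketch (ours): draw random distributions compatible with Figure 6 and
# compare the interval from equation (1) with the one from equations (16)-(19).
import random

def draw():
    pu = random.random()                                  # p(U = 1)
    px_u = [random.random(), random.random()]             # p(x | U = 0), p(x | U = 1)
    py_xu = [[random.random(), random.random()],          # p(y | X = 0, U = 0/1)
             [random.random(), random.random()]]          # p(y | X = 1, U = 0/1)
    pv_u = [random.random(), random.random()]             # p(v | U = 0), p(v | U = 1)
    return pu, px_u, py_xu, pv_u

def observed(pu, px_u, py_xu, pv_u):                      # implied joint p(X, Y, V)
    p = {(xv, yv, vv): 0.0 for xv in (0, 1) for yv in (0, 1) for vv in (0, 1)}
    for u, w in ((1, pu), (0, 1 - pu)):
        for xv in (0, 1):
            px = px_u[u] if xv == 1 else 1 - px_u[u]
            for vv in (0, 1):
                pv = pv_u[u] if vv == 1 else 1 - pv_u[u]
                p[(xv, 1, vv)] += w * px * pv * py_xu[xv][u]
                p[(xv, 0, vv)] += w * px * pv * (1 - py_xu[xv][u])
    return p

def intervals(p):
    eq1_upper = sum(p[(1, 1, vv)] + p[(0, 0, vv)] for vv in (0, 1))
    pv1 = sum(p[(xv, yv, 1)] for xv in (0, 1) for yv in (0, 1))
    py_xv = {(xv, vv): p[(xv, 1, vv)] / (p[(xv, 0, vv)] + p[(xv, 1, vv)])
             for xv in (0, 1) for vv in (0, 1)}
    px_v = {vv: sum(p[(1, yv, vv)] for yv in (0, 1)) /
                sum(p[(xv, yv, vv)] for xv in (0, 1) for yv in (0, 1)) for vv in (0, 1)}
    s = {xv: py_xv[(xv, 1)] * pv1 + py_xv[(xv, 0)] * (1 - pv1) for xv in (0, 1)}
    py = sum(p[(xv, 1, vv)] for xv in (0, 1) for vv in (0, 1))
    dx = px_v[1] - px_v[0]
    lo, up = 0.0, eq1_upper
    if (py_xv[(1, 1)] - py_xv[(1, 0)]) * dx <= 0:
        lo = max(lo, s[1] - py)                           # equation (16)
    else:
        up = min(up, s[1])                                # equation (17)
    if (py_xv[(0, 1)] - py_xv[(0, 0)]) * dx <= 0:
        lo = max(lo, py - s[0])                           # equation (18)
    else:
        up = min(up, 1 - s[0])                            # equation (19)
    return (0.0, eq1_upper), (max(0.0, lo), up)

trials, useful = 10000, 0
for _ in range(trials):
    (a, b), (c, d) = intervals(observed(*draw()))
    useful += (c > a + 1e-9) or (d < b - 1e-9)
print(useful / trials)   # proportion of draws where (16)-(19) improve on equation (1)
```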

Table 1

Results of the simulations in Section 3.4

Usefulness 70%
Average gap decrease 0.17
Maximum gap decrease 0.88
Average lower bound increase 0.08
Maximum lower bound increase 0.88
Average upper bound decrease 0.09
Maximum upper bound decrease 0.86
Figure 7

Results of the first 100 simulations in Section 3.4 sorted by the upper bound returned by equation (1).

4 Discussion

The contribution of this work is twofold: first, to present a sensitivity analysis method for $p(\text{benefit})$ and $p(\text{harm})$ under unmeasured confounding; and second, to tighten the existing bounds of $p(\text{benefit})$ and $p(\text{harm})$ from just the observed data distribution by using a proxy of the unmeasured confounder.

Our sensitivity analysis method has four sensitivity parameters (i.e., $m_x$, $M_x$, $m_{x'}$, and $M_{x'}$), two per (lower or upper) bound. The purpose of these parameters is to bound the counterfactual probabilities $p(y_x \mid x')$ and $p(y_{x'} \mid x)$ by equations (4) and (5), which, in their turn, bound the counterfactual probabilities $p(y_x)$ and $p(y_{x'})$ by equation (3), which, in their turn, bound $p(\text{benefit})$ and $p(\text{harm})$ by equation (2). Therefore, we could alternatively have used $p(y_x \mid x')$ and $p(y_{x'} \mid x)$, or $p(y_x)$ and $p(y_{x'})$, as sensitivity parameters. We believe that it may be easier for the analyst (e.g., an epidemiologist) to reason about our sensitivity parameters than about the alternative ones. Our parameters directly refer to the data generation mechanism, specifically to the outcome mechanism. The alternative parameters, on the other hand, do not directly refer to the data generation mechanism, but to counterfactual probabilities derived from it. It is believed that humans organize their knowledge in causal models, rather than in by-products thereof [2]. For this reason too, our work is only slightly related to the work by Li et al. [6], which gives 19 rules of the form "if $c < 2\varepsilon$, then $p(\text{benefit}) \in [p - \varepsilon, p + \varepsilon]$," where $c$ and $p$ are functions of observational and/or counterfactual probabilities and $\varepsilon$ is a user-defined parameter. Of the 19 rules, only one involves just observational probabilities (and hence is comparable to our work): it places $p(\text{benefit})$ in an interval of width $2\varepsilon$ whenever $p(x, y) + p(x', y') < 2\varepsilon$. This rule follows trivially from equation (1).

As mentioned above, our sensitivity parameters bound $p(y_x)$ and $p(y_{x'})$ as shown in equations (6) and (7). Alternatively, we could use the more direct bounds $m_x \le p(y_x) \le M_x$ and $m_{x'} \le p(y_{x'}) \le M_{x'}$. However, these bounds are looser than ours in general. Equation (2) can also be used to bound $p(y_x)$ and $p(y_{x'})$ as $p(x, y) \le p(y_x) \le 1 - p(x, y')$ and $p(x', y) \le p(y_{x'}) \le 1 - p(x', y')$ [3]. However, these bounds are also looser than ours in general. To see it, assume to the contrary that $p(x, y) + p(x') M_x > 1 - p(x, y')$, which implies that $M_x > 1$, which is a contradiction.

In a study by Peña [7], a method for sensitivity analysis of the ATE under unmeasured confounding is presented. The method has two sensitivity parameters:

$m = \min_{x,u} p(y \mid x, u) \quad \text{and} \quad M = \max_{x,u} p(y \mid x, u).$

These parameters are not useful for our purpose. Specifically, they produce a non-informative lower bound of $p(\text{benefit})$. To see it, it suffices to replace $m_x$ and $m_{x'}$ with $m$, and $M_x$ and $M_{x'}$ with $M$, in equation (8), and then note that the informative region of the lower bound is $p(y \mid x') < m \le M < p(y \mid x)$, which is empty. However, it should be mentioned that our sensitivity analysis of the ATE in Section 2.1 is a straightforward adaptation of the method in [7] to our sensitivity parameters.

To the best of our knowledge, we are the first to use just a single binary proxy of the unmeasured confounder in order to tighten the bounds of $p(\text{benefit})$ and $p(\text{harm})$ in terms of just the observed data distribution. Note that our bounds are assumption free. Some of our bounds hold only under certain conditions, but these conditions are testable from the observed data distribution. Our work is closely related to that of Kawakami [8], which shows that $p(\text{benefit})$ is identifiable from the observed data distribution if there is an instrumental variable in addition to the proxy. Our work is also related to that of Kuroki and Cai [9], which derives tighter bounds than those in equation (2) by using some covariates $S$ that are not affected by the exposure. The bounds are obtained by applying equation (2) within each stratum of $S$ and, then, averaging these stratified bounds weighted by $p(s)$. Clearly, these bounds reduce to those in equation (1) when using just the observed data distribution. However, it may be worth studying whether it is advantageous to apply this stratification technique to our bounds. Our work is also related to that of Shingaki and Kuroki [10], which shows that $p(\text{benefit})$ is identifiable from observational and experimental data if there is a proxy of the unmeasured confounder with at least four states, or from just observational data if there are at least two such proxies. Proxies have also been used to identify counterfactual probabilities other than $p(\text{benefit})$: e.g., Kuroki and Pearl [11] showed that $p(y_x)$ is identifiable from just the observed data distribution if there is a proxy $V$ of the unmeasured confounder $U$ and $p(v \mid u)$ is known, or if there are two proxies. Other works such as [2,3] show that $p(\text{benefit})$ is identifiable from just observational data under assumptions (i.e., conditions that are untestable) such as $p(\text{harm}) = 0$ (a.k.a. monotonicity), or unconfoundedness, or knowledge of the functional forms of the causal mechanisms. Our work is also related to [12], which derives tighter bounds than those in equation (2) under some graphical conditions. Of the results in the study by Mueller et al. [12], only Theorem 4 applies to our causal graph in Figure 6. Moreover, the bounds in that theorem involve both observational and counterfactual probabilities. When only the terms involving observational probabilities are retained (so that the bounds are comparable to ours), the bounds reduce to those in equation (1). The bounds in the study by Mueller et al. [12] are applied in the study by Li and Pearl [13] to the unit selection problem [14]. It may be interesting, considering our bounds, to address the unit selection problem in terms of just the observed data distribution.

In this work, we were interested in assessing the true benefit and harm of an exposure and, consequently, were focused on bounding the probabilities of benefit and harm. However, our methods can be easily adapted to bound other probabilities of causality, such as the probability of necessity and the probability of sufficiency [2,3]. Specifically, the probability of necessity is defined as $p(y'_{x'} \mid x, y)$, i.e., the probability that the event $y$ would not have occurred in the absence of the event $x$, given that both events did in fact occur. It represents the probability that the outcome is attributable to the exposure. The probability of sufficiency is defined as $p(y_x \mid x', y')$, i.e., the probability that the event $y$ would have occurred in the presence of the event $x$, given that both events did in fact not occur. It represents the probability of the exposure producing the outcome. These two probabilities can be combined into the probability of benefit (a.k.a. the probability of necessity and sufficiency) [3]:

$p(y_x, y'_{x'}) = p(x, y)\,p(y'_{x'} \mid x, y) + p(x', y')\,p(y_x \mid x', y').$

The probabilities of necessity and sufficiency are not identifiable in general, but they can be bounded:

$\max\left\{0,\ \frac{p(y) - p(y_{x'})}{p(x, y)}\right\} \le p(y'_{x'} \mid x, y) \le \min\left\{1,\ \frac{p(y'_{x'}) - p(x', y')}{p(x, y)}\right\}$

and

$\max\left\{0,\ \frac{p(y_x) - p(y)}{p(x', y')}\right\} \le p(y_x \mid x', y') \le \min\left\{1,\ \frac{p(y_x) - p(x, y)}{p(x', y')}\right\}.$

Note that the bounds are non-informative (i.e., they are 0 and 1) if, as we assume in this work, we only have access to the observed data distribution. Our methods can certainly be adapted to tighten the bounds, since they resemble those in equation (2). The adaptation is straightforward.

Finally, it would be worth studying the possibility of extending our bounds beyond binary random variables by making use of the results in [15–17]. It may also be worth extending our sensitivity analysis method to the case where there is a proxy $V$ of the unmeasured confounder $U$. In that case, two natural sensitivity parameters may be the sensitivity $p(v \mid u)$ and the specificity $p(v' \mid u')$ of the proxy.

Acknowledgements

We thank the reviewers for their comments, which helped us improve our work. We also thank Manabu Kuroki and Haruka Yoshida for their comments on an earlier version of this manuscript.

  1. Funding information: We gratefully acknowledge financial support from the Swedish Research Council (ref. 2019-00245).

  2. Conflict of interest: Author states no conflict of interest.

Appendix A Derivations of equations (8) and (9)

From equations (6) and (7), we have that

$p(x, y) + p(x') m_x \le p(y_x)$

and

$p(y_{x'}) \le p(x', y) + p(x) M_{x'},$

which imply that

$p(y_x) - p(y_{x'}) \ge p(x, y) + p(x') m_x - [p(x', y) + p(x) M_{x'}]$

and

$p(y) - p(y_{x'}) \ge p(y) - [p(x', y) + p(x) M_{x'}] = p(x, y) + p(x', y) - [p(x', y) + p(x) M_{x'}]$

and

$p(y_x) - p(y) \ge p(x, y) + p(x') m_x - p(y) = p(x, y) + p(x') m_x - [p(x, y) + p(x', y)],$

which together with equation (2) imply equation (8). Likewise, from equations (6) and (7), we have that

$p(y_x) \le p(x, y) + p(x') M_x$

and

$p(x', y) + p(x) m_{x'} \le p(y_{x'})$

which imply that

$p(y'_{x'}) = 1 - p(y_{x'}) \le 1 - [p(x', y) + p(x) m_{x'}]$

and

$p(y_x) - p(y_{x'}) + p(x, y') + p(x', y) \le p(x, y) + p(x') M_x - [p(x', y) + p(x) m_{x'}] + p(x, y') + p(x', y) = p(x, y) + p(x, y') + p(x') M_x - p(x) m_{x'} = p(x) + p(x') M_x - p(x) m_{x'},$

which together with equation (2) imply equation (9).

References

[1] Mueller S, Pearl J. Personalized decision-making – a conceptual introduction. 2022. arXiv:2208.09558 [cs.AI]. doi:10.1515/jci-2022-0050.

[2] Pearl J. Causality: models, reasoning, and inference. Cambridge, UK: Cambridge University Press; 2009. doi:10.1017/CBO9780511803161.

[3] Tian J, Pearl J. Probabilities of causation: bounds and identification. Ann Math Artif Intell. 2000;28:287–313. doi:10.1023/A:1018912507879.

[4] Ogburn EL, VanderWeele TJ. On the nondifferential misclassification of a binary confounder. Epidemiology. 2012;23:433–9. doi:10.1097/EDE.0b013e31824d1f63.

[5] Peña JM. On the monotonicity of a nondifferentially mismeasured binary confounder. J Causal Inference. 2020;8:150–63. doi:10.1515/jci-2020-0014.

[6] Li A, Mueller S, Pearl J. ε-identifiability of causal quantities. 2023. arXiv:2301.12022 [cs.AI].

[7] Peña JM. Simple yet sharp sensitivity analysis for unmeasured confounding. J Causal Inference. 2022;10:1–17. doi:10.1515/jci-2021-0041.

[8] Kawakami Y. Instrumental variable-based identification for causal effects using covariate information. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence; 2021. p. 12131–8. doi:10.1609/aaai.v35i13.17440.

[9] Kuroki M, Cai Z. Statistical analysis of "probabilities of causation" using co-variate information. Scandinavian J Stat. 2011;38:564–77. doi:10.1111/j.1467-9469.2011.00730.x.

[10] Shingaki R, Kuroki M. Identification and estimation of joint probabilities of potential outcomes in observational studies with covariate information. In: Advances in Neural Information Processing Systems. Vol. 34; 2021. p. 26475–86.

[11] Kuroki M, Pearl J. Measurement bias and effect restoration in causal inference. Biometrika. 2014;101:423–37. doi:10.1093/biomet/ast066.

[12] Mueller S, Li A, Pearl J. Causes of effects: learning individual responses from population data. In: Proceedings of the 31st International Joint Conference on Artificial Intelligence; 2022. p. 2712–8. doi:10.24963/ijcai.2022/376.

[13] Li A, Pearl J. Unit selection with causal diagram. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence; 2022. p. 5765–72. doi:10.1609/aaai.v36i5.20519.

[14] Li A, Pearl J. Unit selection based on counterfactual logic. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence; 2019. p. 1793–9. doi:10.24963/ijcai.2019/248.

[15] Li A, Pearl J. Probabilities of causation with nonbinary treatment and effect. 2022. arXiv:2208.09568 [cs.AI].

[16] Peña JM, Balgi S, Sjölander A, Gabriel EE. On the bias of adjusting for a non-differentially mismeasured discrete confounder. J Causal Inference. 2021;9:229–49. doi:10.1515/jci-2021-0033.

[17] Sjölander A, Peña JM, Gabriel EE. Bias results for nondifferential mismeasurement of a binary confounder. Stat Probability Letters. 2022;186:109474. doi:10.1016/j.spl.2022.109474.

Received: 2023-03-11
Revised: 2023-06-05
Accepted: 2023-06-05
Published Online: 2023-08-23

© 2023 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
