On the Monotonicity of a Nondifferentially Mismeasured Binary Confounder

Jose M. Peña

doi:10.1515/jci-2020-0014

Published/Copyright: November 28, 2020

From the journal Journal of Causal Inference Volume 8 Issue 1

Abstract

Suppose that we are interested in the average causal effect of a binary treatment on an outcome when this relationship is confounded by a binary confounder. Suppose that the confounder is unobserved but a nondifferential proxy of it is observed. We show that, under a certain monotonicity assumption that is empirically verifiable, adjusting for the proxy produces a measure of the effect that lies between the unadjusted and the true measures.

Keywords: average causal effect; confounding; monotonicity

MSC 2010: 62D20; 62H22

1 Introduction

Suppose that we are interested in the average causal effect of a binary treatment A on an outcome Y when this relationship is confounded by a binary confounder C. Suppose also that C is nondifferentially mismeasured, meaning that (i) C is not observed and, instead, a binary proxy D of C is observed, and (ii) D is conditionally independent of A and Y given C. The causal graph to the left in Figure 1 represents the relationships between the random variables.

Figure 1

Causal graphs, where Y is a discrete or continuous random variable, and A, C and D are binary random variables. Moreover, C is unobserved.

[2] argues that adjusting for D produces a partially adjusted measure of the average causal effect of A on Y that lies between the crude (i.e., unadjusted) and true (i.e., adjusted for C) measures. Ogburn and VanderWeele [4, Lemma 1] show that, although this result does not always hold, it does hold under a monotonicity condition in C: specifically, E[Y|A, C] must be nondecreasing or nonincreasing in C. Since this condition can be interpreted as requiring that the average causal effect of C on Y be in the same direction among the treated (A = 1) and the untreated (A = 0), [4] argue that the condition is likely to hold in most applications in epidemiology. Unfortunately, the condition cannot be verified empirically because C is unobserved, so one has to rely on substantive knowledge to verify it. Moreover, the condition is sufficient but not necessary. [5] extend these results to the case where C takes more than two values. If there are at least two independent proxies of C, then [3] show that the causal effect of A on Y can be identified under a certain rank condition.

In this paper, we prove that if the monotonicity condition holds in D, then it holds in C as well. Since D is observed, the monotonicity condition in D can be verified empirically. Therefore, if no substantive knowledge is available but data are, then combining our result with Lemma 1 by [4] may allow us to conclude that the partially adjusted effect lies between the crude and the true ones and, thus, that it is a better approximation to the true effect than the crude one. We also report experiments showing that most random parameterizations of the causal graph to the left in Figure 1 result in a partially adjusted effect that lies between the crude and the true ones, although only about half of them satisfy the monotonicity condition in D. This confirms that the condition is sufficient but not necessary. This result should be interpreted with caution because, in fields like epidemiology, one is typically concerned not with a random parameterization but with one carefully engineered by evolution. We partially address this concern by characterizing a nonmonotonic case (albeit one that is empirically untestable) where the partially adjusted effect still lies between the crude and the true ones. Finally, we prove that if the monotonicity condition holds in D, then it also holds in C when D is a driver of C rather than a proxy, i.e., D causes C. We illustrate the relevance of this result with an example on the transportability of causal inference across populations.

The rest of the paper is organized as follows. Sections 2 and 3 present our results when D is a proxy and a driver of C, respectively. Section 4 closes with some discussion.

2 On a Proxy of the Confounder

Consider the causal graph to the left in Figure 1, where Y is a discrete or continuous random variable, and A, C and D are binary random variables. The graph entails the following factorization:

$$p(A,C,D,Y)=p(C)\,p(D\mid C)\,p(A\mid C)\,p(Y\mid A,C). \tag{1}$$

Let A take values $a$ and $\bar a$, and similarly for C and D. Let A, D and Y be observed and let C be unobserved. Let $Y_a$ and $Y_{\bar a}$ denote the counterfactual outcomes under treatments $A=a$ and $A=\bar a$, respectively. The average causal effect of A on Y, or true risk difference (RD_true), is defined as $RD_{\mathrm{true}}=E[Y_a]-E[Y_{\bar a}]$. It can be rewritten as follows [6, Theorem 3.3.2]:

$$RD_{\mathrm{true}}=E[Y\mid a,c]\,p(c)+E[Y\mid a,\bar c]\,p(\bar c)-E[Y\mid \bar a,c]\,p(c)-E[Y\mid \bar a,\bar c]\,p(\bar c).$$

Since C is unobserved, RD_true cannot be computed. It can be approximated by the unadjusted average causal effect or crude risk difference (RD_crude):

$$RD_{\mathrm{crude}}=E[Y\mid a]-E[Y\mid \bar a]$$

and by the partially adjusted average causal effect or observed risk difference (RD_obs):

$$RD_{\mathrm{obs}}=E[Y\mid a,d]\,p(d)+E[Y\mid a,\bar d]\,p(\bar d)-E[Y\mid \bar a,d]\,p(d)-E[Y\mid \bar a,\bar d]\,p(\bar d).$$
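To make the three risk differences concrete, the following Python sketch computes RD_true, RD_crude and RD_obs from a full parameterization of the graph to the left in Figure 1. It is a minimal illustration under stated assumptions, not the author's code: the function name `risk_differences` and the encoding (1 for a, c, d and 0 for ā, c̄, d̄; `ey[(a, c)]` for E[Y|A, C]) are hypothetical choices of this sketch. Only conditional means of Y enter the definitions, so the sketch never needs the full distribution p(Y|A, C).

```python
def risk_differences(p_c, p_d_given_c, p_a_given_c, ey):
    """Return (RD_true, RD_crude, RD_obs) for the graph to the left in Figure 1.

    Encoding (an assumption of this sketch): 1 stands for c, a, d and 0 for c-bar, a-bar, d-bar.
    p_c          -- p(c)
    p_d_given_c  -- {1: p(d | c), 0: p(d | c-bar)}
    p_a_given_c  -- {1: p(a | c), 0: p(a | c-bar)}
    ey           -- {(a, c): E[Y | A=a, C=c]}
    """
    prob_c = {1: p_c, 0: 1.0 - p_c}

    def p_a(a, c):  # p(A=a | C=c)
        return p_a_given_c[c] if a == 1 else 1.0 - p_a_given_c[c]

    def p_d(d, c):  # p(D=d | C=c)
        return p_d_given_c[c] if d == 1 else 1.0 - p_d_given_c[c]

    def mean_y(a, d=None):
        # E[Y | A=a] if d is None, else E[Y | A=a, D=d]. The weights are proportional to
        # p(c | a) or p(c | a, d); E[Y | a, c, d] = E[Y | a, c] because Y is independent of D given A, C.
        w = [p_a(a, c) * prob_c[c] * (1.0 if d is None else p_d(d, c)) for c in (0, 1)]
        return (ey[(a, 0)] * w[0] + ey[(a, 1)] * w[1]) / (w[0] + w[1])

    prob_d = {d: sum(p_d(d, c) * prob_c[c] for c in (0, 1)) for d in (0, 1)}
    rd_true = sum((ey[(1, c)] - ey[(0, c)]) * prob_c[c] for c in (0, 1))     # back-door adjustment over C
    rd_crude = mean_y(1) - mean_y(0)                                          # no adjustment
    rd_obs = sum((mean_y(1, d) - mean_y(0, d)) * prob_d[d] for d in (0, 1))  # adjustment for the proxy D
    return rd_true, rd_crude, rd_obs


# Hypothetical parameter values, chosen only for illustration.
example_ey = {(1, 1): 0.7, (1, 0): 0.4, (0, 1): 0.5, (0, 0): 0.2}
print(risk_differences(0.3, {1: 0.9, 0: 0.2}, {1: 0.6, 0: 0.3}, example_ey))
```

Two limiting cases make a quick sanity check of the definitions: with a perfect proxy (p(d|c) = 1, p(d|c̄) = 0) the sketch returns RD_obs = RD_true, and with an uninformative proxy (p(d|c) = p(d|c̄)) it returns RD_obs = RD_crude.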

We say that E [Y|A, D] is nondecreasing in D if

$$E[Y\mid a,d]\ge E[Y\mid a,\bar d] \quad\text{and}\quad E[Y\mid \bar a,d]\ge E[Y\mid \bar a,\bar d]. \tag{2}$$

Likewise, E [Y|A, D] is nonincreasing in D if

$$E[Y\mid a,d]\le E[Y\mid a,\bar d] \quad\text{and}\quad E[Y\mid \bar a,d]\le E[Y\mid \bar a,\bar d]. \tag{3}$$

Moreover, E [Y|A, D] is monotone in D if it is nondecreasing or nonincreasing in D. Ogburn and VanderWeele [4, Lemma 1] show that if E [Y|A, C] is monotone in C, then E [Y|A, D] is monotone in D. The following theorem proves the converse result. The relevance of this result is as follows. Ogburn and VanderWeele [4, Result 1] show that if E [Y|A, C] is monotone in C, then RD_obs lies between RD_true and RD_crude. The antecedent of this rule cannot be verified empirically, because C is unobserved. Therefore, one must rely on substantive knowledge to apply the rule. The following theorem implies that, luckily, the rule also holds for D and, thus, that the antecedent can be verified empirically.
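Because A, D and Y are observed, the antecedent can be checked from data. The sketch below is a naive plug-in check, assuming binary A and D coded as 1/0 and a list of (a, d, y) observations; it estimates the four cell means E[Y|A, D] and classifies them per Equations 2 and 3. The function names are hypothetical, and the check deliberately ignores sampling error; in practice one would also want, e.g., confidence intervals for the two differences before declaring monotonicity.

```python
from collections import defaultdict


def cell_means(samples):
    """Plug-in estimates of E[Y | A=a, D=d] from (a, d, y) triples with a, d in {0, 1}."""
    totals, counts = defaultdict(float), defaultdict(int)
    for a, d, y in samples:
        totals[(a, d)] += y
        counts[(a, d)] += 1
    return {cell: totals[cell] / counts[cell] for cell in counts}


def monotonicity_in_d(means):
    """Classify E[Y | A, D] in D per Equations 2 and 3 (1 = a/d, 0 = a-bar/d-bar)."""
    diff_a = means[(1, 1)] - means[(1, 0)]     # E[Y|a,d] - E[Y|a,d-bar]
    diff_abar = means[(0, 1)] - means[(0, 0)]  # E[Y|a-bar,d] - E[Y|a-bar,d-bar]
    if diff_a >= 0 and diff_abar >= 0:
        return "nondecreasing"                 # Equation 2 (exact ties are reported here)
    if diff_a <= 0 and diff_abar <= 0:
        return "nonincreasing"                 # Equation 3
    return "neither"


data = [(1, 1, 3.0), (1, 1, 5.0), (1, 0, 2.0), (0, 1, 1.0), (0, 0, 0.5)]
print(monotonicity_in_d(cell_means(data)))  # prints nondecreasing
```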

Theorem 1

Consider the causal graph to the left in Figure 1. If E[Y|A, D] is monotone in D, then E[Y|A, C] is monotone in C.

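Proof. Assume to the contrary that E[Y|A, C] is not monotone in C, i.e., either

$$E[Y\mid a,c]\le E[Y\mid a,\bar c] \quad\text{and}\quad E[Y\mid \bar a,c]\ge E[Y\mid \bar a,\bar c] \tag{4}$$

or

$$E[Y\mid a,c]\ge E[Y\mid a,\bar c] \quad\text{and}\quad E[Y\mid \bar a,c]\le E[Y\mid \bar a,\bar c]. \tag{5}$$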

This gives four cases to consider: Whether Equation 2 or 3 holds, and whether Equation 4 or 5 holds. Hereinafter, we focus on the first case. The other cases are similar.

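Assume that Equations 2 and 4 hold. We show next that the first inequalities in Equations 2 and 4 imply that $p(c\mid a,d)\le p(c\mid a,\bar d)$. Specifically,

$$
\begin{aligned}
&E[Y\mid a,d]\ge E[Y\mid a,\bar d]\\
\Rightarrow\;&E[Y\mid a,d,c]\,p(c\mid a,d)+E[Y\mid a,d,\bar c]\,p(\bar c\mid a,d)\ge E[Y\mid a,\bar d,c]\,p(c\mid a,\bar d)+E[Y\mid a,\bar d,\bar c]\,p(\bar c\mid a,\bar d)\\
\Rightarrow\;&E[Y\mid a,c]\,p(c\mid a,d)+E[Y\mid a,\bar c]\,p(\bar c\mid a,d)\ge E[Y\mid a,c]\,p(c\mid a,\bar d)+E[Y\mid a,\bar c]\,p(\bar c\mid a,\bar d)
\end{aligned}
$$

because Y is conditionally independent of D given A and C due to the causal graph under consideration and, thus,

$$
\begin{aligned}
&E[Y\mid a,c]\,p(c\mid a,d)+E[Y\mid a,\bar c]\,(1-p(c\mid a,d))\ge E[Y\mid a,c]\,p(c\mid a,\bar d)+E[Y\mid a,\bar c]\,(1-p(c\mid a,\bar d))\\
\Rightarrow\;&(E[Y\mid a,c]-E[Y\mid a,\bar c])\,p(c\mid a,d)\ge (E[Y\mid a,c]-E[Y\mid a,\bar c])\,p(c\mid a,\bar d)\\
\Rightarrow\;&p(c\mid a,d)\le p(c\mid a,\bar d)
\end{aligned}
$$

because $E[Y\mid a,c]\le E[Y\mid a,\bar c]$ by Equation 4.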

Furthermore,

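$$p(c\mid a,d)=\frac{p(a,d\mid c)\,p(c)}{p(a,d\mid c)\,p(c)+p(a,d\mid \bar c)\,p(\bar c)}=\frac{1}{1+\exp(-\delta(a,d))}=\sigma(\delta(a,d))$$

where

$$\delta(a,d)=\ln\frac{p(a,d\mid c)\,p(c)}{p(a,d\mid \bar c)\,p(\bar c)}$$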

is known as the log odds, and σ() is known as the logistic sigmoid function [1, Section 4.2]. Note that σ() is an increasing function. Then,
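The sigmoid form can be checked numerically. The sketch below, with hypothetical function names and parameter values, computes p(c|a, d) once by Bayes' rule and once as σ(δ(a, d)), using the factorization p(a, d|C) = p(a|C)p(d|C) that the graph implies. It also computes δ(a, d) − δ(a, d̄), in which the p(a|C) and p(C) terms cancel; this is the quantity whose sign the proof below compares.

```python
import math


def posterior_bayes(p_c, p_a_c, p_a_cbar, p_d_c, p_d_cbar):
    """p(c | a, d) by Bayes' rule, using p(a, d | C) = p(a | C) p(d | C) (A independent of D given C)."""
    joint_c = p_a_c * p_d_c * p_c
    joint_cbar = p_a_cbar * p_d_cbar * (1.0 - p_c)
    return joint_c / (joint_c + joint_cbar)


def posterior_sigmoid(p_c, p_a_c, p_a_cbar, p_d_c, p_d_cbar):
    """The same posterior written as sigma(delta(a, d)), delta being the log odds."""
    delta = math.log(p_a_c * p_d_c * p_c) - math.log(p_a_cbar * p_d_cbar * (1.0 - p_c))
    return 1.0 / (1.0 + math.exp(-delta))


def log_odds_shift(p_d_c, p_d_cbar):
    """delta(a, d) - delta(a, d-bar); the p(a | C) and p(C) terms cancel, so it does not depend on a."""
    return math.log(p_d_c / p_d_cbar) - math.log((1.0 - p_d_c) / (1.0 - p_d_cbar))


# Hypothetical values of p(c), p(a|c), p(a|c-bar), p(d|c), p(d|c-bar).
args = (0.3, 0.7, 0.4, 0.9, 0.2)
print(posterior_bayes(*args), posterior_sigmoid(*args), log_odds_shift(0.9, 0.2))
```

Because σ(·) is increasing and the shift is the same for a and ā, p(c|a, d) ≤ p(c|a, d̄) and p(c|ā, d) ≥ p(c|ā, d̄) can hold together only if the shift is zero, which is the contradiction the proof reaches.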

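$$
\begin{aligned}
&p(c\mid a,d)\le p(c\mid a,\bar d)\\
\Rightarrow\;&\delta(a,d)\le \delta(a,\bar d)\\
\Rightarrow\;&\ln p(a\mid c)+\ln p(d\mid c)+\ln p(c)-\ln p(a\mid \bar c)-\ln p(d\mid \bar c)-\ln p(\bar c)\\
&\quad\le \ln p(a\mid c)+\ln p(\bar d\mid c)+\ln p(c)-\ln p(a\mid \bar c)-\ln p(\bar d\mid \bar c)-\ln p(\bar c)
\end{aligned}
$$

because A is conditionally independent of D given C due to the causal graph under consideration and, thus,

$$
\begin{aligned}
&\ln p(d\mid c)-\ln p(d\mid \bar c)\le \ln p(\bar d\mid c)-\ln p(\bar d\mid \bar c)\\
\Rightarrow\;&\ln\frac{p(d\mid c)}{p(d\mid \bar c)}\le \ln\frac{p(\bar d\mid c)}{p(\bar d\mid \bar c)}\\
\Rightarrow\;&\frac{p(d\mid c)}{p(d\mid \bar c)}\le \frac{p(\bar d\mid c)}{p(\bar d\mid \bar c)}.
\end{aligned}
\tag{6}
$$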

Likewise, the second inequalities in Equations 2 and 4 imply that $p(c\mid \bar a,d)\ge p(c\mid \bar a,\bar d)$, which implies that

$$\frac{p(d\mid c)}{p(d\mid \bar c)}\ge \frac{p(\bar d\mid c)}{p(\bar d\mid \bar c)}$$

which contradicts Equation 6 unless equality holds. However, equality only occurs if $p(d\mid c)=p(d\mid \bar c)$, which implies that C and D are independent and, thus, that D is not a mismeasured confounder. ◻

Corollary 2

Consider the causal graph to the left in Figure 1. If E[Y|A, D] is monotone in D, then RD_obs lies between RD_true and RD_crude.

Proof. The result follows directly from Theorem 1 and Ogburn and VanderWeele [4, Result 1]. ◻

2.1 Experiments

In this section, we report some experiments that shed additional light on the relationships between the various risk differences. Specifically, we randomly parameterized the causal graph to the left in Figure 1 10000 times, drawing the parameters of the terms on the right-hand side of Equation 1 from a uniform distribution.^[1] For each parameterization, we then computed RD_true, RD_obs and RD_crude. The results are reported in Table 1. Of the 10000 runs, 4891 were monotone in C and also in D, as expected from Ogburn and VanderWeele [4, Lemma 1]. No other run was monotone in D, as expected from Theorem 1. In all these 4891 runs, RD_obs was between RD_true and RD_crude, as expected from Corollary 2 and Ogburn and VanderWeele [4, Result 1]. The table also shows that these 4891 monotone runs are rather evenly distributed among the four monotone entries. Finally, 4460 of the 5109 runs where the monotonicity assumption did not hold still had RD_obs between RD_true and RD_crude. In other words, although about half of the runs violated the monotonicity assumption, few of them resulted in RD_obs lying outside the range of RD_true and RD_crude. In total, RD_obs was between RD_true and RD_crude in 94 % of the runs (9351 of 10000). Therefore, RD_obs was a better approximation to RD_true than RD_crude in most of the runs. We investigate this question further in the next section, where we characterize a nonmonotonic case in which RD_obs still lies between RD_true and RD_crude.

Table 1

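Results of 10000 random parameterizations of the causal graph to the left in Figure 1. Rows give the monotonicity of E[Y|A, C] in C and columns that of E[Y|A, D] in D; the last column counts the runs in each row where RD_obs lies between RD_true and RD_crude.

	Nondec. in D	Noninc. in D	Neither in D	In-between
Nondec. in C	1175	1255	0	2430
Noninc. in C	1225	1236	0	2461
Neither in C	0	0	5109	4460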
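The experiment above can be sketched as a small Monte Carlo simulation. This is a minimal reconstruction, not the author's code: the sampling scheme (every parameter of Equation 1 and each E[Y|A, C] drawn independently from Uniform(0, 1)) is an assumption, since the details in footnote [1] are not reproduced here, and all function names are hypothetical. The counts it produces depend on the seed and need not match Table 1; only the structural zeros (monotone in C exactly when monotone in D) and the fully in-between monotone rows are guaranteed by the results above.

```python
import random
from collections import Counter


def classify(means):
    """Direction of a 2x2 table {(a, z): mean} in its second index (Equations 2 and 3)."""
    diff1 = means[(1, 1)] - means[(1, 0)]
    diff0 = means[(0, 1)] - means[(0, 0)]
    if diff1 >= 0 and diff0 >= 0:
        return "nondec"
    if diff1 <= 0 and diff0 <= 0:
        return "noninc"
    return "neither"


def one_run(rng):
    # Assumption: every parameter of Equation 1, and each E[Y | A, C], is drawn from Uniform(0, 1).
    pc = {1: rng.random()}
    pc[0] = 1.0 - pc[1]
    pd = {c: rng.random() for c in (0, 1)}   # p(d | C=c)
    pa = {c: rng.random() for c in (0, 1)}   # p(a | C=c)
    ey = {(a, c): rng.random() for a in (0, 1) for c in (0, 1)}

    def weight(a, c, d=None):  # proportional to p(c | a) or p(c | a, d)
        w = (pa[c] if a else 1 - pa[c]) * pc[c]
        return w if d is None else w * (pd[c] if d else 1 - pd[c])

    def mean_y(a, d=None):     # E[Y | a] or E[Y | a, d]
        w = [weight(a, c, d) for c in (0, 1)]
        return (ey[(a, 0)] * w[0] + ey[(a, 1)] * w[1]) / (w[0] + w[1])

    prob_d = {d: sum((pd[c] if d else 1 - pd[c]) * pc[c] for c in (0, 1)) for d in (0, 1)}
    rd_true = sum((ey[(1, c)] - ey[(0, c)]) * pc[c] for c in (0, 1))
    rd_crude = mean_y(1) - mean_y(0)
    rd_obs = sum((mean_y(1, d) - mean_y(0, d)) * prob_d[d] for d in (0, 1))
    ey_d = {(a, d): mean_y(a, d) for a in (0, 1) for d in (0, 1)}
    lo, hi = min(rd_true, rd_crude), max(rd_true, rd_crude)
    between = lo - 1e-12 <= rd_obs <= hi + 1e-12
    return classify(ey), classify(ey_d), between


def run_experiment(n_runs=10000, seed=0):
    rng = random.Random(seed)
    table, in_between = Counter(), Counter()
    for _ in range(n_runs):
        mono_c, mono_d, between = one_run(rng)
        table[(mono_c, mono_d)] += 1
        in_between[mono_c] += between   # counts runs with RD_obs in the interval, per row
    return table, in_between


table, in_between = run_experiment()
for cell, count in sorted(table.items()):
    print(cell, count)
print("in-between by row:", dict(in_between))
```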

The plots in Figure 2 show some additional descriptive statistics for the runs where RD_obs lay in the interval between RD_true and RD_crude. The top left plot shows that most intervals were quite small and, thus, that RD_obs was a good approximation to RD_true in most cases. However, the top right plot shows that RD_obs was typically closer to RD_crude than to RD_true. The bottom left plot zooms in on the previous plot at the smallest intervals. Finally, the bottom right plot shows that the lower the correlation between C and D, as measured by the Youden index, the closer RD_obs was to RD_crude. In summary, RD_obs seems to be a good approximation to RD_true, but it seems to be biased towards RD_crude. This is a problem when the interval between RD_crude and RD_true is large. However, the length of the interval is unknown in practice, and we doubt that substantive knowledge can provide hints about it. The bias seems to decrease with increasing correlation between C and D. Although this correlation is unknown in practice, substantive knowledge may give hints about it.
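The two quantities plotted in Figure 2 can be written as small helpers. The sketch assumes the standard definition of the Youden index of D as a test for C (sensitivity plus specificity minus one) and reads the relative distance as |RD_obs − RD_true| divided by the interval length; both are plausible readings of the caption rather than definitions confirmed by the text, and the function names are hypothetical.

```python
def youden_index(p_d_given_c, p_d_given_cbar):
    """Youden index of D as a test for C: sensitivity + specificity - 1 = p(d|c) - p(d|c-bar)."""
    return p_d_given_c + (1.0 - p_d_given_cbar) - 1.0


def relative_distance(rd_true, rd_obs, rd_crude):
    """|RD_obs - RD_true| / |RD_crude - RD_true|: 0 if RD_obs equals RD_true, 1 if it equals RD_crude."""
    length = abs(rd_crude - rd_true)
    if length == 0.0:
        raise ValueError("RD_crude equals RD_true; the interval is degenerate")
    return abs(rd_obs - rd_true) / length


print(youden_index(0.8, 0.2), relative_distance(0.1, 0.19, 0.2))
```

In this reading, a relative distance of 0.9 means RD_obs has covered only a tenth of the way from RD_crude to RD_true, which is the kind of bias towards RD_crude that the top right plot describes.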

Figure 2

(tl) Histogram of interval length. (tr) Distance between RD_obs and RD_true relative to interval length. (bl) Zoom of previous plot. (br) Distance between RD_obs and RD_true relative to interval length, as a function of correlation between C and D when measured by Youden index.

2.2 Nonmonotonicity

Consider the causal graph to the left in Figure 1. This section characterizes a case where E[Y|A, C] is not monotone in C and, thus, E[Y|A, D] is not monotone in D by Theorem 1, and yet RD_obs lies between RD_true and RD_crude. Specifically, let A, D and Y represent three diseases, and C a risk factor for all three. Suppose that suffering from A affects the risk of suffering from Y. Suppose that half of the population is exposed to the risk factor C, i.e., p(c) = 0.5. Suppose also that exposure to C affects the risk of suffering from A and D as $p(a\mid c)=p(\bar a\mid \bar c)=p(d\mid c)=p(\bar d\mid \bar c)\ge 0.5$. Finally, suppose that $E[Y\mid a,c]-E[Y\mid a,\bar c]\ge 0$ and $E[Y\mid \bar a,\bar c]-E[Y\mid \bar a,c]\ge 0$. In other words, exposure to C increases the average severity of disease Y for the individuals suffering from disease A, while it decreases the severity for the rest. Therefore, the monotonicity assumption does not hold. However, under the additional assumption that $E[Y\mid a,c]-E[Y\mid a,\bar c]\ge E[Y\mid \bar a,\bar c]-E[Y\mid \bar a,c]$, we can still conclude that RD_obs lies between RD_true and RD_crude. Note that one has to rely on substantive knowledge to verify the conditions in this characterization, because C is unobserved. The following theorems formalize this result.

Theorem 3

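Consider the causal graph to the left in Figure 1. Let p(c) = 0.5 and $p(a\mid c)=p(\bar a\mid \bar c)=p(d\mid c)=p(\bar d\mid \bar c)\ge 0.5$. If $E[Y\mid a,c]-E[Y\mid a,\bar c]\ge E[Y\mid \bar a,\bar c]-E[Y\mid \bar a,c]\ge 0$, then $RD_{\mathrm{crude}}\ge RD_{\mathrm{obs}}\ge RD_{\mathrm{true}}$.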

Proof. We start by proving some auxiliary facts.

Fact 1. Recall from the proof of Theorem 1 that

$$p(c\mid a,d)=\sigma\left(\ln\frac{p(a,d\mid c)\,p(c)}{p(a,d\mid \bar c)\,p(\bar c)}\right)=\sigma\left(\ln\frac{p(a\mid c)\,p(d\mid c)}{p(a\mid \bar c)\,p(d\mid \bar c)}\right)$$

where the last equality follows from the assumption that p(c) = 0.5, and the fact that A and D are conditionally independent given C due to the causal graph under consideration. Note that the previous equation implies that $p(c\mid a,d)=p(\bar c\mid \bar a,\bar d)\ge 0.5$ and $p(c\mid \bar a,d)=p(c\mid a,\bar d)=0.5$, due to the assumption that $p(a\mid c)=p(\bar a\mid \bar c)=p(d\mid c)=p(\bar d\mid \bar c)\ge 0.5$.

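Fact 2. Note that

$$
\begin{aligned}
E[Y\mid a,d]-E[Y\mid a,\bar d]
&=E[Y\mid a,c,d]\,p(c\mid a,d)+E[Y\mid a,\bar c,d]\,p(\bar c\mid a,d)-E[Y\mid a,c,\bar d]\,p(c\mid a,\bar d)-E[Y\mid a,\bar c,\bar d]\,p(\bar c\mid a,\bar d)\\
&=E[Y\mid a,c]\,(p(c\mid a,d)-0.5)+E[Y\mid a,\bar c]\,(p(\bar c\mid a,d)-0.5)\\
&=E[Y\mid a,c]\,(p(c\mid a,d)-0.5)-E[Y\mid a,\bar c]\,(p(c\mid a,d)-0.5)
\end{aligned}
$$

where the second equality follows from the fact that Y and D are conditionally independent given A and C due to the causal graph under consideration, and the fact that $p(c\mid \bar a,d)=p(c\mid a,\bar d)=0.5$ by Fact 1. Likewise,

$$
\begin{aligned}
E[Y\mid \bar a,\bar d]-E[Y\mid \bar a,d]
&=E[Y\mid \bar a,c,\bar d]\,p(c\mid \bar a,\bar d)+E[Y\mid \bar a,\bar c,\bar d]\,p(\bar c\mid \bar a,\bar d)-E[Y\mid \bar a,c,d]\,p(c\mid \bar a,d)-E[Y\mid \bar a,\bar c,d]\,p(\bar c\mid \bar a,d)\\
&=E[Y\mid \bar a,c]\,(p(c\mid \bar a,\bar d)-0.5)+E[Y\mid \bar a,\bar c]\,(p(\bar c\mid \bar a,\bar d)-0.5)\\
&=-E[Y\mid \bar a,c]\,(p(c\mid a,d)-0.5)+E[Y\mid \bar a,\bar c]\,(p(c\mid a,d)-0.5)
\end{aligned}
$$

since $p(c\mid a,d)=p(\bar c\mid \bar a,\bar d)$ by Fact 1. Then, the assumption that $E[Y\mid a,c]-E[Y\mid a,\bar c]\ge E[Y\mid \bar a,\bar c]-E[Y\mid \bar a,c]\ge 0$, together with the fact that p(c|a, d) ≥ 0.5 by Fact 1, implies that $E[Y\mid a,d]-E[Y\mid a,\bar d]\ge E[Y\mid \bar a,\bar d]-E[Y\mid \bar a,d]\ge 0$.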

Fact 3. Note that

$$p(d)=p(d\mid c)\,p(c)+p(d\mid \bar c)\,p(\bar c)=p(\bar d\mid \bar c)\,p(\bar c)+p(\bar d\mid c)\,p(c)=p(\bar d)$$

by the assumptions that p(c) = 0.5 and $p(d\mid c)=p(\bar d\mid \bar c)$. Then, p(d) = 0.5 and, thus, $p(c\mid d)=p(d\mid c)=p(\bar d\mid \bar c)=p(\bar c\mid \bar d)$. Likewise, p(a) = 0.5 and $p(d\mid a)=p(a\mid d)=p(\bar a\mid \bar d)=p(\bar d\mid \bar a)$. Note also that

$$p(\bar a\mid d)=p(\bar a\mid c,d)\,p(c\mid d)+p(\bar a\mid \bar c,d)\,p(\bar c\mid d)=p(\bar a\mid c)\,p(c\mid d)+p(a\mid c)\,p(\bar c\mid d)$$

since A and D are conditionally independent given C due to the causal graph under consideration, and $p(a\mid c)=p(\bar a\mid \bar c)$ by assumption. Moreover, the previous equation can be rewritten as

$$p(\bar a\mid d)=2\,p(a\mid c)\,(1-p(a\mid c))$$

because p(d|c) = p(c|d) as shown above, whereas p(a|c) = p(d|c) by assumption. The last equation implies that $p(\bar a\mid d)\le 0.5$. To see it, write the right-hand side as the function f(x) = 2x(1 − x). By inspecting its first and second derivatives, we can conclude that f(x) has a single maximum at x = 0.5 with value 0.5. That $p(\bar a\mid d)\le 0.5$ implies that $p(d\mid a)=p(a\mid d)=1-p(\bar a\mid d)\ge 0.5$.
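The bound on f(x) = 2x(1 − x) used in Fact 3 can also be checked without derivatives, by completing the square:

```latex
2x(1-x) \;=\; \tfrac{1}{2} - 2\left(x - \tfrac{1}{2}\right)^{2} \;\le\; \tfrac{1}{2}, \qquad x \in [0, 1],
```

with equality only at x = 0.5. Taking x = p(a|c) gives p(ā|d) ≤ 0.5 and, hence, p(a|d) = 1 − p(ā|d) ≥ 0.5.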

We now prove the theorem. Note that the assumption that p(c) = 0.5 implies that

$$RD_{\mathrm{true}}=\big(E[Y\mid a,c]+E[Y\mid a,\bar c]-E[Y\mid \bar a,c]-E[Y\mid \bar a,\bar c]\big)/2. \tag{7}$$

Note also that p(d) = 0.5 by Fact 3 and, thus,

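$$
\begin{aligned}
RD_{\mathrm{obs}}&=\big(E[Y\mid a,d]+E[Y\mid a,\bar d]-E[Y\mid \bar a,d]-E[Y\mid \bar a,\bar d]\big)/2\\
&=\big(E[Y\mid a,c,d]\,p(c\mid a,d)+E[Y\mid a,\bar c,d]\,p(\bar c\mid a,d)+E[Y\mid a,c,\bar d]\,p(c\mid a,\bar d)+E[Y\mid a,\bar c,\bar d]\,p(\bar c\mid a,\bar d)\\
&\qquad-E[Y\mid \bar a,c,d]\,p(c\mid \bar a,d)-E[Y\mid \bar a,\bar c,d]\,p(\bar c\mid \bar a,d)-E[Y\mid \bar a,c,\bar d]\,p(c\mid \bar a,\bar d)-E[Y\mid \bar a,\bar c,\bar d]\,p(\bar c\mid \bar a,\bar d)\big)/2\\
&=\big(E[Y\mid a,c]\,(p(c\mid a,d)+p(c\mid a,\bar d))+E[Y\mid a,\bar c]\,(p(\bar c\mid a,d)+p(\bar c\mid a,\bar d))\\
&\qquad-E[Y\mid \bar a,c]\,(p(c\mid \bar a,d)+p(c\mid \bar a,\bar d))-E[Y\mid \bar a,\bar c]\,(p(\bar c\mid \bar a,d)+p(\bar c\mid \bar a,\bar d))\big)/2
\end{aligned}
\tag{8}
$$

since Y and D are conditionally independent given A and C due to the causal graph under consideration. Note that $p(c\mid a,d)=p(\bar c\mid \bar a,\bar d)$ and $p(c\mid \bar a,d)=p(c\mid a,\bar d)=0.5$ by Fact 1. Then, the previous equation can be rewritten as follows with α = p(c|a, d) − 0.5:

$$
\begin{aligned}
RD_{\mathrm{obs}}&=\big(E[Y\mid a,c]\,(p(c\mid a,d)+0.5)+E[Y\mid a,\bar c]\,(p(\bar c\mid a,d)+0.5)-E[Y\mid \bar a,c]\,(0.5+p(\bar c\mid a,d))-E[Y\mid \bar a,\bar c]\,(0.5+p(c\mid a,d))\big)/2\\
&=\big(E[Y\mid a,c]\,(1+\alpha)+E[Y\mid a,\bar c]\,(1-\alpha)-E[Y\mid \bar a,c]\,(1-\alpha)-E[Y\mid \bar a,\bar c]\,(1+\alpha)\big)/2\\
&=\big(E[Y\mid a,c]+E[Y\mid a,\bar c]-E[Y\mid \bar a,c]-E[Y\mid \bar a,\bar c]\big)/2+\alpha\big(E[Y\mid a,c]-E[Y\mid a,\bar c]+E[Y\mid \bar a,c]-E[Y\mid \bar a,\bar c]\big)/2\\
&\ge RD_{\mathrm{true}}
\end{aligned}
$$

by Equation 7 and the fact that the second term in the penultimate line above is nonnegative. The latter follows from the assumption that $E[Y\mid a,c]-E[Y\mid a,\bar c]\ge E[Y\mid \bar a,\bar c]-E[Y\mid \bar a,c]\ge 0$, and the fact that α = p(c|a, d) − 0.5 ≥ 0 by Fact 1.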

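Having proven that RD_obs ≥ RD_true, it only remains to prove that RD_crude ≥ RD_obs. Let β = p(d|a) − 0.5. Note that $p(d\mid a)=p(\bar d\mid \bar a)$ by Fact 3. Then,

$$
\begin{aligned}
RD_{\mathrm{crude}}&=E[Y\mid a]-E[Y\mid \bar a]\\
&=E[Y\mid a,d]\,p(d\mid a)+E[Y\mid a,\bar d]\,p(\bar d\mid a)-E[Y\mid \bar a,d]\,p(d\mid \bar a)-E[Y\mid \bar a,\bar d]\,p(\bar d\mid \bar a)\\
&=E[Y\mid a,d]\,(0.5+\beta)+E[Y\mid a,\bar d]\,(0.5-\beta)-E[Y\mid \bar a,d]\,(0.5-\beta)-E[Y\mid \bar a,\bar d]\,(0.5+\beta)\\
&=0.5\,\big(E[Y\mid a,d]+E[Y\mid a,\bar d]-E[Y\mid \bar a,d]-E[Y\mid \bar a,\bar d]\big)+\beta\big(E[Y\mid a,d]-E[Y\mid a,\bar d]+E[Y\mid \bar a,d]-E[Y\mid \bar a,\bar d]\big)\\
&\ge RD_{\mathrm{obs}}
\end{aligned}
$$

by Equation 8 and the fact that the second term in the penultimate line above is nonnegative. The latter follows from the fact that $E[Y\mid a,d]-E[Y\mid a,\bar d]\ge E[Y\mid \bar a,\bar d]-E[Y\mid \bar a,d]\ge 0$ by Fact 2, and the fact that β = p(d|a) − 0.5 ≥ 0 by Fact 3. ◻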
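A numerical illustration of Theorem 3 may help. The sketch below evaluates the three risk differences under the theorem's symmetric assumptions, with a hypothetical function name `rds_symmetric` and hypothetical values of E[Y|A, C] that are deliberately not monotone in C (increasing in C under a, decreasing under ā) but satisfy E[Y|a, c] − E[Y|a, c̄] = 0.6 ≥ E[Y|ā, c̄] − E[Y|ā, c] = 0.2 ≥ 0.

```python
def rds_symmetric(q, ey):
    """RD_true, RD_crude, RD_obs under Theorem 3's assumptions:
    p(c) = 0.5 and p(a|c) = p(a-bar|c-bar) = p(d|c) = p(d-bar|c-bar) = q.
    ey maps (a, c) -> E[Y | A=a, C=c] with 1 = a/c and 0 = a-bar/c-bar."""
    def p_a(a, c):  # p(A=a | C=c): q when A "agrees" with C, 1 - q otherwise
        return q if a == c else 1.0 - q

    p_d = p_a       # by assumption, D relates to C exactly as A does

    def mean_y(a, d=None):  # E[Y | a] or E[Y | a, d]
        w = [p_a(a, c) * 0.5 * (1.0 if d is None else p_d(d, c)) for c in (0, 1)]
        return (ey[(a, 0)] * w[0] + ey[(a, 1)] * w[1]) / (w[0] + w[1])

    rd_true = 0.5 * (ey[(1, 1)] + ey[(1, 0)] - ey[(0, 1)] - ey[(0, 0)])         # Equation 7
    rd_crude = mean_y(1) - mean_y(0)
    rd_obs = 0.5 * (mean_y(1, 1) + mean_y(1, 0) - mean_y(0, 1) - mean_y(0, 0))  # p(d) = 0.5 by Fact 3
    return rd_true, rd_crude, rd_obs


example = {(1, 1): 0.9, (1, 0): 0.3, (0, 1): 0.4, (0, 0): 0.6}   # hypothetical values
print(rds_symmetric(0.8, example))
```

For q = 0.8 these values give, by hand, RD_true = 0.1, RD_crude = 0.22 and RD_obs = 0.1 + 3/34 ≈ 0.188, so RD_crude ≥ RD_obs ≥ RD_true as the theorem states, even though E[Y|A, C] is not monotone in C.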

Theorem 4

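Consider the causal graph to the left in Figure 1. Let p(c) = 0.5 and $p(a\mid c)=p(\bar a\mid \bar c)=p(d\mid c)=p(\bar d\mid \bar c)\ge 0.5$. If $E[Y\mid a,c]-E[Y\mid a,\bar c]\le E[Y\mid \bar a,\bar c]-E[Y\mid \bar a,c]\le 0$, then $RD_{\mathrm{crude}}\le RD_{\mathrm{obs}}\le RD_{\mathrm{true}}$.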

Proof. The proof is analogous to that of the previous theorem. ◻

Consider now replacing the assumption that $p(a\mid c)=p(\bar a\mid \bar c)\ge 0.5$ in Theorem 3 by the weaker assumption that $p(\bar a\mid \bar c)\ge p(a\mid c)\ge 0.5$. Then, RD_crude ≥ RD_obs does not always hold; our experiments suggest that it holds for approximately 90 % of the parameterizations.^[2] However, RD_crude ≥ RD_true and RD_obs ≥ RD_true always hold, as the following theorem proves. This result is useful when, for instance, 0 > RD_crude or 0 > RD_obs because, then, one can readily conclude that 0 > RD_true, i.e., suffering from disease A reduces the average severity of disease Y.

Theorem 5

Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, $p(d\mid c)=p(\bar d\mid \bar c)\ge 0.5$ and $p(\bar a\mid \bar c)\ge p(a\mid c)\ge 0.5$. If $E[Y\mid a,c]-E[Y\mid a,\bar c]\ge E[Y\mid \bar a,\bar c]-E[Y\mid \bar a,c]\ge 0$, then RD_crude ≥ RD_true and RD_obs ≥ RD_true.

Proof. We start by proving that RD_crude ≥ RD_true. Recall from the proof of Theorem 1 that

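$$p(c\mid a)=\sigma\left(\ln\frac{p(a\mid c)\,p(c)}{p(a\mid \bar c)\,p(\bar c)}\right)=\sigma\left(\ln\frac{p(a\mid c)}{p(a\mid \bar c)}\right)$$

where the second equality follows from the assumption that p(c) = 0.5. Likewise,

$$p(\bar c\mid \bar a)=\sigma\left(\ln\frac{p(\bar a\mid \bar c)}{p(\bar a\mid c)}\right).$$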

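Note also that p(c|a) ≥ 0.5 and $p(\bar c\mid \bar a)\ge 0.5$ due to the assumption that $p(\bar a\mid \bar c)\ge p(a\mid c)\ge 0.5$. Now, consider the function f(x) = x(1 − x). By inspecting the first and second derivatives, we can conclude that f(x) has a single maximum at x = 0.5, and that it is increasing in the interval [0, 0.5] and decreasing in the interval [0.5, 1]. This implies that $p(a\mid c)\,p(\bar a\mid c)\ge p(a\mid \bar c)\,p(\bar a\mid \bar c)$ due to the assumption that $p(\bar a\mid \bar c)\ge p(a\mid c)\ge 0.5$. Then,

$$\frac{p(a\mid c)}{p(a\mid \bar c)}\ge \frac{p(\bar a\mid \bar c)}{p(\bar a\mid c)} \tag{9}$$

which, together with the fact that σ(·) and ln(·) are increasing functions, implies that $p(c\mid a)\ge p(\bar c\mid \bar a)$.

The results in the previous paragraph allow us to write p(c|a) = 0.5 + α and $p(\bar c\mid \bar a)=0.5+\beta$ with α ≥ β ≥ 0. Therefore,

$$
\begin{aligned}
RD_{\mathrm{crude}}&=E[Y\mid a]-E[Y\mid \bar a]\\
&=E[Y\mid a,c]\,p(c\mid a)+E[Y\mid a,\bar c]\,p(\bar c\mid a)-E[Y\mid \bar a,c]\,p(c\mid \bar a)-E[Y\mid \bar a,\bar c]\,p(\bar c\mid \bar a)\\
&=E[Y\mid a,c]\,(0.5+\alpha)+E[Y\mid a,\bar c]\,(0.5-\alpha)-E[Y\mid \bar a,c]\,(0.5-\beta)-E[Y\mid \bar a,\bar c]\,(0.5+\beta)\\
&=RD_{\mathrm{true}}+\alpha\,(E[Y\mid a,c]-E[Y\mid a,\bar c])-\beta\,(E[Y\mid \bar a,\bar c]-E[Y\mid \bar a,c])\\
&\ge RD_{\mathrm{true}}
\end{aligned}
$$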

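because α ≥ β ≥ 0 as shown above, $E[Y\mid a,c]-E[Y\mid a,\bar c]\ge E[Y\mid \bar a,\bar c]-E[Y\mid \bar a,c]\ge 0$ by assumption (so that, writing X and Z for these two differences, αX − βZ ≥ βX − βZ = β(X − Z) ≥ 0), and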

$$RD_{\mathrm{true}}=0.5\,E[Y\mid a,c]+0.5\,E[Y\mid a,\bar c]-0.5\,E[Y\mid \bar a,c]-0.5\,E[Y\mid \bar a,\bar c] \tag{10}$$

due to the assumption that p(c) = 0.5.

We continue by proving that RD_obs ≥ RD_true. First, recall again from the proof of Theorem 1 that

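$$p(c\mid a,d)=\sigma\left(\ln\frac{p(a,d\mid c)\,p(c)}{p(a,d\mid \bar c)\,p(\bar c)}\right)=\sigma\left(\ln\frac{p(a\mid c)\,p(d\mid c)}{p(a\mid \bar c)\,p(d\mid \bar c)}\right)$$

where the second equality follows from the assumption that p(c) = 0.5, and the fact that A and D are conditionally independent given C due to the causal graph under consideration. Likewise,

$$p(\bar c\mid a,\bar d)=\sigma\left(\ln\frac{p(a\mid \bar c)\,p(\bar d\mid \bar c)}{p(a\mid c)\,p(\bar d\mid c)}\right).$$

Therefore, $p(c\mid a,d)\ge p(\bar c\mid a,\bar d)$ because σ(·) and ln(·) are increasing functions and

$$\frac{p(d\mid c)}{p(d\mid \bar c)}=\frac{p(\bar d\mid \bar c)}{p(\bar d\mid c)}$$

by the assumption that $p(d\mid c)=p(\bar d\mid \bar c)$, and

$$\frac{p(a\mid c)}{p(a\mid \bar c)}\ge \frac{p(a\mid \bar c)}{p(a\mid c)}$$

by the assumption that $p(\bar a\mid \bar c)\ge p(a\mid c)\ge 0.5$. We can analogously prove that $p(c\mid a,\bar d)\ge p(\bar c\mid a,d)$. Therefore,

$$p(c\mid a,d)\,p(d)+p(c\mid a,\bar d)\,p(\bar d)\ge p(\bar c\mid a,d)\,p(d)+p(\bar c\mid a,\bar d)\,p(\bar d)$$

because

$$p(d)=p(d\mid c)\,p(c)+p(d\mid \bar c)\,p(\bar c)=p(d\mid c)\,p(c)+p(\bar d\mid c)\,p(\bar c)=0.5$$

by the assumptions that p(c) = 0.5 and $p(d\mid c)=p(\bar d\mid \bar c)$. Now, note that

$$p(\bar c\mid a,d)\,p(d)+p(\bar c\mid a,\bar d)\,p(\bar d)=1-\big(p(c\mid a,d)\,p(d)+p(c\mid a,\bar d)\,p(\bar d)\big)$$

which implies that $p(c\mid a,d)\,p(d)+p(c\mid a,\bar d)\,p(\bar d)\ge 0.5$. We can analogously prove that $p(\bar c\mid \bar a,d)\,p(d)+p(\bar c\mid \bar a,\bar d)\,p(\bar d)\ge 0.5$.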

Then, consider the expression

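$$p(\bar c\mid \bar a,\bar d)=\sigma\left(\ln\frac{p(\bar a\mid \bar c)\,p(\bar d\mid \bar c)}{p(\bar a\mid c)\,p(\bar d\mid c)}\right).$$

Therefore, $p(c\mid a,d)\ge p(\bar c\mid \bar a,\bar d)$ due to the following three observations. First,

$$\frac{p(d\mid c)}{p(d\mid \bar c)}=\frac{p(\bar d\mid \bar c)}{p(\bar d\mid c)}$$

by the assumption that $p(d\mid c)=p(\bar d\mid \bar c)$. Second,

$$\frac{p(a\mid c)}{p(a\mid \bar c)}\ge \frac{p(\bar a\mid \bar c)}{p(\bar a\mid c)}$$

as shown in Equation 9. Third, σ(·) and ln(·) are increasing functions. We can analogously prove that $p(c\mid a,\bar d)\ge p(\bar c\mid \bar a,d)$. Therefore,

$$p(c\mid a,d)\,p(d)+p(c\mid a,\bar d)\,p(\bar d)\ge p(\bar c\mid \bar a,d)\,p(d)+p(\bar c\mid \bar a,\bar d)\,p(\bar d)$$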

because p(d) = 0.5 as shown above.

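Finally, the results in the previous paragraphs allow us to write $p(c\mid a,d)\,p(d)+p(c\mid a,\bar d)\,p(\bar d)=0.5+\alpha$ and $p(\bar c\mid \bar a,d)\,p(d)+p(\bar c\mid \bar a,\bar d)\,p(\bar d)=0.5+\beta$ with α ≥ β ≥ 0. Consequently, $p(\bar c\mid a,d)\,p(d)+p(\bar c\mid a,\bar d)\,p(\bar d)=0.5-\alpha$ and $p(c\mid \bar a,d)\,p(d)+p(c\mid \bar a,\bar d)\,p(\bar d)=0.5-\beta$. Therefore,

$$
\begin{aligned}
RD_{\mathrm{obs}}&=E[Y\mid a,d]\,p(d)+E[Y\mid a,\bar d]\,p(\bar d)-E[Y\mid \bar a,d]\,p(d)-E[Y\mid \bar a,\bar d]\,p(\bar d)\\
&=\big(E[Y\mid a,c,d]\,p(c\mid a,d)+E[Y\mid a,\bar c,d]\,p(\bar c\mid a,d)\big)p(d)+\big(E[Y\mid a,c,\bar d]\,p(c\mid a,\bar d)+E[Y\mid a,\bar c,\bar d]\,p(\bar c\mid a,\bar d)\big)p(\bar d)\\
&\quad-\big(E[Y\mid \bar a,c,d]\,p(c\mid \bar a,d)+E[Y\mid \bar a,\bar c,d]\,p(\bar c\mid \bar a,d)\big)p(d)-\big(E[Y\mid \bar a,c,\bar d]\,p(c\mid \bar a,\bar d)+E[Y\mid \bar a,\bar c,\bar d]\,p(\bar c\mid \bar a,\bar d)\big)p(\bar d)\\
&=E[Y\mid a,c]\,\big(p(c\mid a,d)\,p(d)+p(c\mid a,\bar d)\,p(\bar d)\big)+E[Y\mid a,\bar c]\,\big(p(\bar c\mid a,d)\,p(d)+p(\bar c\mid a,\bar d)\,p(\bar d)\big)\\
&\quad-E[Y\mid \bar a,c]\,\big(p(c\mid \bar a,d)\,p(d)+p(c\mid \bar a,\bar d)\,p(\bar d)\big)-E[Y\mid \bar a,\bar c]\,\big(p(\bar c\mid \bar a,d)\,p(d)+p(\bar c\mid \bar a,\bar d)\,p(\bar d)\big)\\
&=E[Y\mid a,c]\,(0.5+\alpha)+E[Y\mid a,\bar c]\,(0.5-\alpha)-E[Y\mid \bar a,c]\,(0.5-\beta)-E[Y\mid \bar a,\bar c]\,(0.5+\beta)
\end{aligned}
$$

where the third equality follows from the fact that Y and D are conditionally independent given A and C due to the causal graph under consideration. Then,

$$RD_{\mathrm{obs}}=RD_{\mathrm{true}}+\alpha\,(E[Y\mid a,c]-E[Y\mid a,\bar c])-\beta\,(E[Y\mid \bar a,\bar c]-E[Y\mid \bar a,c])\ge RD_{\mathrm{true}}$$

by Equation 10, the fact shown above that α ≥ β ≥ 0, and the assumption that $E[Y\mid a,c]-E[Y\mid a,\bar c]\ge E[Y\mid \bar a,\bar c]-E[Y\mid \bar a,c]\ge 0$. ◻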
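Theorem 5 can also be probed by simulation. The sketch below draws parameters uniformly from the region allowed by the theorem (an assumed sampling scheme, not the one behind footnote [2]), using rejection sampling for the condition on E[Y|A, C], and counts violations of RD_crude ≥ RD_true or RD_obs ≥ RD_true. The function names are hypothetical. It also reports the share of runs with RD_crude ≥ RD_obs, which depends on the sampling scheme and need not match the approximately 90 % reported above.

```python
import random


def rds(p_d_c, p_a_c, p_abar_cbar, ey):
    """RD_true, RD_crude, RD_obs with p(c) = 0.5, p(d|c) = p(d-bar|c-bar) = p_d_c,
    p(a|c) = p_a_c and p(a-bar|c-bar) = p_abar_cbar (1 = a/c/d, 0 = the barred values)."""
    pa = {1: p_a_c, 0: 1.0 - p_abar_cbar}    # p(a | C)
    pd = {1: p_d_c, 0: 1.0 - p_d_c}           # p(d | C)

    def mean_y(a, d=None):                    # E[Y | a] or E[Y | a, d]
        w = []
        for c in (0, 1):
            wc = (pa[c] if a else 1.0 - pa[c]) * 0.5
            if d is not None:
                wc *= pd[c] if d else 1.0 - pd[c]
            w.append(wc)
        return (ey[(a, 0)] * w[0] + ey[(a, 1)] * w[1]) / (w[0] + w[1])

    rd_true = 0.5 * (ey[(1, 1)] + ey[(1, 0)] - ey[(0, 1)] - ey[(0, 0)])
    rd_crude = mean_y(1) - mean_y(0)
    rd_obs = 0.5 * (mean_y(1, 1) + mean_y(1, 0) - mean_y(0, 1) - mean_y(0, 0))  # p(d) = 0.5
    return rd_true, rd_crude, rd_obs


def check_theorem5(n_runs=5000, seed=0):
    rng = random.Random(seed)
    violations, crude_ge_obs = 0, 0
    for _ in range(n_runs):
        s = rng.uniform(0.5, 1.0)     # p(d|c) = p(d-bar|c-bar) >= 0.5
        u = rng.uniform(0.5, 1.0)     # p(a|c) >= 0.5
        v = rng.uniform(u, 1.0)       # p(a-bar|c-bar) >= p(a|c)
        while True:                   # rejection sampling for the condition on E[Y|A, C]
            ey = {(a, c): rng.random() for a in (0, 1) for c in (0, 1)}
            if ey[(1, 1)] - ey[(1, 0)] >= ey[(0, 0)] - ey[(0, 1)] >= 0:
                break
        t, c, o = rds(s, u, v, ey)
        violations += (c < t - 1e-12) or (o < t - 1e-12)
        crude_ge_obs += c >= o
    return violations, crude_ge_obs / n_runs


print(check_theorem5())
```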

One can analogously prove the following result.

Theorem 6

Consider the causal graph to the left in Figure 1. Let p(c) = 0.5, $p(d\mid c)=p(\bar d\mid \bar c)\ge 0.5$ and $p(\bar a\mid \bar c)\ge p(a\mid c)\ge 0.5$. If $E[Y\mid a,c]-E[Y\mid a,\bar c]\le E[Y\mid \bar a,\bar c]-E[Y\mid \bar a,c]\le 0$, then RD_crude ≤ RD_true and RD_obs ≤ RD_true.

3 On a Driver of the Confounder

Consider the causal graph to the right in Figure 1. Note that D is now a driver rather than a proxy of C, i.e. D causes C. The graph entails the following factorization:

$$p(A,C,D,Y)=p(D)\,p(C\mid D)\,p(A\mid C)\,p(Y\mid A,C). \tag{11}$$

We show next that our previous results also apply to the new causal graph under consideration.

Theorem 7

Consider the causal graph to the right in Figure 1. If E [Y|A, D] is monotone in D, then E [Y|A, C] is monotone in C.

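Proof. The proof of Theorem 1 also applies when D is a driver of C, because it only uses the facts that Y is conditionally independent of D given A and C, and that A is conditionally independent of D given C, both of which also hold in the graph to the right in Figure 1. ◻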

Corollary 8

Consider the causal graph to the right in Figure 1. If E [Y|A, D] is monotone in D, then RD_obs lies between RD_true and RD_crude.

Proof. Note that every probability distribution that is representable by the causal graph to the right in Figure 1 can be represented by the causal graph to the left in Figure 1: simply let p_L(A|C) = p_R(A|C) and p_L(Y|A, C) = p_R(Y|A, C), where the subscript L or R indicates whether we refer to Equation 1 or 11, respectively. Moreover, let

$$p_L(C)=p_R(C)=p_R(C\mid d)\,p_R(d)+p_R(C\mid \bar d)\,p_R(\bar d)$$

and

$$p_L(D\mid C)=p_R(D\mid C)=\frac{p_R(C\mid D)\,p_R(D)}{p_R(C\mid d)\,p_R(d)+p_R(C\mid \bar d)\,p_R(\bar d)}.$$

Therefore, RD_crude, RD_obs and RD_true are the same whether they are computed from the graph to the right or to the left in Figure 1. Likewise, if E [Y|A, D] is monotone in D for the graph to the right in Figure 1, then it is also monotone in D for the graph to the left, which implies that RD_obs lies between RD_true and RD_crude by Corollary 2. ◻
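The reparameterization in this proof amounts to Bayes' rule on the C–D part of the model. The sketch below, with hypothetical function names and values, maps p_R(D) and p_R(C|D) to p_L(C) and p_L(D|C) and lets one confirm that both factorizations give the same joint p(C, D); p(A|C) and p(Y|A, C) are copied unchanged, so the full joint distributions coincide as well.

```python
def right_to_left(p_d, p_c_given_d):
    """Map the C-D part of Equation 11 to that of Equation 1 (1 = c/d, 0 = c-bar/d-bar).

    p_d          -- p_R(d)
    p_c_given_d  -- {1: p_R(c | d), 0: p_R(c | d-bar)}
    Returns p_L(c) and {1: p_L(d | c), 0: p_L(d | c-bar)}.
    """
    prob_d = {1: p_d, 0: 1.0 - p_d}
    p_c = sum(p_c_given_d[d] * prob_d[d] for d in (0, 1))     # p_L(c) = p_R(c)
    p_d_given_c = {
        1: p_c_given_d[1] * p_d / p_c,                         # Bayes' rule: p(d | c)
        0: (1.0 - p_c_given_d[1]) * p_d / (1.0 - p_c),         # Bayes' rule: p(d | c-bar)
    }
    return p_c, p_d_given_c


def joint_cd_right(p_d, p_c_given_d):
    prob_d = {1: p_d, 0: 1.0 - p_d}
    return {(c, d): prob_d[d] * (p_c_given_d[d] if c else 1.0 - p_c_given_d[d])
            for c in (0, 1) for d in (0, 1)}


def joint_cd_left(p_c, p_d_given_c):
    prob_c = {1: p_c, 0: 1.0 - p_c}
    return {(c, d): prob_c[c] * (p_d_given_c[c] if d else 1.0 - p_d_given_c[c])
            for c in (0, 1) for d in (0, 1)}


p_c, p_d_c = right_to_left(0.3, {1: 0.9, 0: 0.2})   # hypothetical right-graph parameters
print(joint_cd_right(0.3, {1: 0.9, 0: 0.2}))
print(joint_cd_left(p_c, p_d_c))
```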

VanderWeele et al. [8, Result 1] prove that (i) if E[Y|A, C] and E[A|C] are both nondecreasing or both nonincreasing in C, then RD_obs ≥ RD_true, and (ii) if E[Y|A, C] and E[A|C] are one nondecreasing and the other nonincreasing in C, then RD_obs ≤ RD_true. The antecedents of these rules cannot be verified empirically, because C is unobserved. Therefore, one must rely on substantive knowledge to apply the rules. Luckily, the rules also hold for D and, thus, the antecedents can be verified empirically. The following theorem proves it.

Theorem 9

Consider the causal graph to the right in Figure 1. If E [Y|A, D] and E [A|D] are both nondecreasing or both nonincreasing in D, then E [Y|A, C] and E[A|C] are both nondecreasing or both nonincreasing in C. If E[Y|A, D] and E[A|D] are one nondecreasing and the other nonincreasing in D, then E[Y|A, C] and E[A|C] are one nondecreasing and the other nonincreasing in C.

Proof. We prove the result when E[Y|A, D] and E[A|D] are both nondecreasing in D. The proofs for the rest of the cases are similar. Then, we have that (i) $E[Y\mid a,d]\ge E[Y\mid a,\bar d]$, and (ii) $E[A\mid d]\ge E[A\mid \bar d]$. Assume to the contrary that (iii) $E[Y\mid a,c]\le E[Y\mid a,\bar c]$, and (iv) $E[A\mid c]\ge E[A\mid \bar c]$. As shown in the proof of Theorem 1, (i) and (iii) imply that $p(c\mid a,d)\le p(c\mid a,\bar d)$, which implies that

$$\frac{p(d\mid c)}{p(d\mid \bar c)}\le \frac{p(\bar d\mid c)}{p(\bar d\mid \bar c)}.$$

Likewise, (ii) and (iv) imply that $p(c\mid d)\ge p(c\mid \bar d)$, which implies that

$$\frac{p(d\mid c)}{p(d\mid \bar c)}\ge \frac{p(\bar d\mid c)}{p(\bar d\mid \bar c)}.$$

As shown in the proof of Theorem 1, this contradicts the fact that C and D are dependent. Therefore, either assumption (iii) or (iv) or both are false. In the latter case, we get a similar contradiction. So, exactly one of assumptions (iii) and (iv) is false. We reach a similar contradiction if we replace a with ā in assumptions (i) and (iii). This, together with the fact that E[Y|A, C] and E[A|C] are both monotone in C by Theorem 7, proves the result. ◻

Corollary 10

Consider the causal graph to the right in Figure 1. If E [Y|A, D] and E [A|D] are both nondecreasing or both nonincreasing in D, then RD_obs ≥ RD_true. If E[Y|A, D] and E[A|D] are one nondecreasing and the other nonincreasing in D, then RD_obs ≤ RD_true.

Proof. The result follows directly from Theorem 9 and VanderWeele et al. [8, Result 1]. ◻
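Corollary 10 can be checked on random parameterizations of Equation 11. In the sketch below (hypothetical function names; all parameters and each E[Y|A, C] drawn from Uniform(0, 1) as an assumption), the directions of E[Y|A, D] and E[A|D] in D are computed from the model, runs in which E[Y|A, D] is not monotone in D are skipped, and the remaining runs are tested against the sign predicted by the corollary.

```python
import random


def right_graph_run(rng):
    """One random parameterization of Equation 11 (1 = a/c/d, 0 = the barred values)."""
    prob_d = {1: rng.random()}
    prob_d[0] = 1.0 - prob_d[1]
    pc = {d: rng.random() for d in (0, 1)}      # p(c | D=d)
    pa = {c: rng.random() for c in (0, 1)}      # p(a | C=c)
    ey = {(a, c): rng.random() for a in (0, 1) for c in (0, 1)}

    def p_c(c, d):
        return pc[d] if c else 1.0 - pc[d]

    def p_a(a, c):
        return pa[c] if a else 1.0 - pa[c]

    def mean_y(a, d):  # E[Y|a,d] = sum_c E[Y|a,c] p(c|a,d), with p(c|a,d) proportional to p(a|c) p(c|d)
        w = [p_a(a, c) * p_c(c, d) for c in (0, 1)]
        return (ey[(a, 0)] * w[0] + ey[(a, 1)] * w[1]) / (w[0] + w[1])

    def mean_a(d):     # E[A|d] = p(a|d)
        return sum(p_a(1, c) * p_c(c, d) for c in (0, 1))

    prob_c = {c: sum(p_c(c, d) * prob_d[d] for d in (0, 1)) for c in (0, 1)}
    rd_true = sum((ey[(1, c)] - ey[(0, c)]) * prob_c[c] for c in (0, 1))
    rd_obs = sum((mean_y(1, d) - mean_y(0, d)) * prob_d[d] for d in (0, 1))
    dy = (mean_y(1, 1) - mean_y(1, 0), mean_y(0, 1) - mean_y(0, 0))
    da = mean_a(1) - mean_a(0)
    return dy, da, rd_true, rd_obs


def direction(*diffs):
    """+1 if all differences are >= 0, -1 if all are <= 0, else 0 (not monotone)."""
    if all(x >= 0 for x in diffs):
        return 1
    if all(x <= 0 for x in diffs):
        return -1
    return 0


def check_corollary10(n_runs=10000, seed=0, tol=1e-12):
    rng = random.Random(seed)
    checked = violations = 0
    for _ in range(n_runs):
        dy, da, rd_true, rd_obs = right_graph_run(rng)
        dir_y, dir_a = direction(*dy), direction(da)
        if dir_y == 0:
            continue                      # E[Y|A, D] not monotone in D: the corollary is silent
        checked += 1
        if dir_y == dir_a:                # same direction: RD_obs >= RD_true
            violations += rd_obs < rd_true - tol
        else:                             # opposite directions: RD_obs <= RD_true
            violations += rd_obs > rd_true + tol
    return checked, violations


print(check_corollary10())
```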

For completeness, we show below that the converse of Theorem 9 also holds.

Theorem 11

Consider the causal graph to the right in Figure 1. If E [Y|A, C] and E [A|C] are both nondecreasing or both nonincreasing in C, then E [Y|A, D] and E [A|D] are both nondecreasing or both nonincreasing in D. If E[Y|A, C] and E[A|C] are one nondecreasing and the other nonincreasing in C, then E[Y|A, D] and E[A|D] are one nondecreasing and the other nonincreasing in D.

Proof. As shown in the proof of Corollary 8, every probability distribution that is representable by the causal graph to the right in Figure 1 can be represented by the causal graph to the left in Figure 1. Therefore, if E [Y|A, C] and E [A|C] are monotone in C for the right graph, then they are so for the left graph as well. Then, E [Y|A, D] and E [A|D] are monotone in D for the left graph [4, Lemma 1] and, thus, they are so for the right graph as well. The result follows now from the contrapositive formulation of Theorem 9. ◻

Given a sufficiently large sample from p(A, D, Y), we may conclude from it that E[Y|A, D] is monotone in D, which implies that RD_obs lies between RD_true and RD_crude by Corollary 8. We can also estimate RD_obs and RD_crude from the sample, which implies that (i) if RD_crude < RD_obs, then RD_obs ≤ RD_true, and (ii) if RD_crude > RD_obs, then RD_obs ≥ RD_true. Consequently, Corollary 10 is superfluous when data over (A, D, Y) are available. The following example illustrates that the corollary may be useful when no such data are available.
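The reasoning in the previous paragraph can be written as a tiny decision rule. This is a sketch with a hypothetical function name; it takes estimates of RD_crude and RD_obs at face value (no sampling error) and presupposes that E[Y|A, D] has already been judged monotone in D. When the two estimates coincide, the interval condition holds trivially and says nothing about RD_true, so the rule returns no conclusion.

```python
def side_of_rd_true(rd_crude, rd_obs):
    """Where RD_true lies relative to RD_obs, assuming E[Y|A, D] is monotone in D,
    so that RD_obs lies between RD_true and RD_crude (Corollary 8)."""
    if rd_crude < rd_obs:
        return "RD_true >= RD_obs"
    if rd_crude > rd_obs:
        return "RD_true <= RD_obs"
    return None  # RD_crude == RD_obs: no information about RD_true


print(side_of_rd_true(0.10, 0.15))  # prints RD_true >= RD_obs
```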

Example 12

Let A and Y represent a treatment and a disease, respectively. Let D and C represent pre-treatment covariates such as socio-economic and health status, respectively. Say that we have a sample from p₁(A, D, Y) and a sample from p₂(A, D, Y), i.e. we have two samples from two different populations. We are interested in drawing conclusions about RD_true for a third population, from which we have no data. We make the following assumptions:

p₁(D) ≠ p₃(D) ≠ p₂(D) because the socio-economic profile of the third population differs from the other populations’ profiles.
p₁(C|D) = p₂(C|D) = p₃(C|D) because this distribution represents psychological and physiological processes shared by the three populations.
p₁(Y|A, C) = p₃(Y|A, C) ≠ p₂(Y|A, C) because these distributions represent psychological and physiological processes shared by the first and third populations but not by the second. Then, E₃[Y|A, D] = E₁[Y|A, D] which can be estimated from the sample from p₁(A, D, Y).
p₁(A|C) ≠ p₂(A|C) = p₃(A|C) because the second and third populations share the same treatment policy but the first does not. Then, E₃[A|D] = E₂[A|D], which can be estimated from the sample from p₂(A, D, Y).

Then, we cannot estimate RD_crude for the third population and, thus, we cannot use Corollary 8 as we did before to bound RD_true. Corollary 10 may, on the other hand, be useful in drawing conclusions. For instance, assume that E₃[Y|A, D] and E₃[A|D] are both nondecreasing or both nonincreasing in D. Then, RD_obs ≥ RD_true by the corollary. If we are interested in testing whether k ≥ RD_true for a given constant k, then it may be worth incurring the cost of collecting data from the third population in order to compute RD_obs, in the hope that k ≥ RD_obs, which confirms the hypothesis. If we are interested in testing whether RD_true ≥ k, then we may also be willing to incur the cost, in the hope that k ≥ RD_obs, which allows us to reject the hypothesis. In the latter case, we may instead decide not to incur the cost, because we can never confirm the hypothesis. Such a seemingly negative result may save us time and money. Similar conclusions can be drawn when E[Y|A, D] and E[A|D] are one nondecreasing and the other nonincreasing in D. On the other hand, no such conclusions can be drawn from Corollary 8 before collecting data.

3.1 Bounds

Causal effects are typically defined in terms of distributions of counterfactuals. For instance, the causal effect on Y of an intervention setting A = a is defined as E [Y_a]. It can be rewritten as follows [6, Theorem 3.3.2]:

$$E[Y_a]=E[Y\mid a,c]\,p(c)+E[Y\mid a,\bar c]\,p(\bar c).$$

Since C is unobserved, this effect cannot be computed. It can be approximated by the following quantity:

$$S_a=E[Y\mid a,d]\,p(d)+E[Y\mid a,\bar d]\,p(\bar d).$$

It can also be approximated by S_a = E[Y|a]. Likewise for the causal effect on Y of an intervention setting $A=\bar a$. The following discussion applies to both approximations.

VanderWeele et al. [8, Result 1] prove that (i) if E[Y|A, C] and E[A|C] are both nondecreasing or both nonincreasing in C, then $S_a\ge E[Y_a]$ and $S_{\bar a}\le E[Y_{\bar a}]$, and (ii) if E[Y|A, C] and E[A|C] are one nondecreasing and the other nonincreasing in C, then $S_a\le E[Y_a]$ and $S_{\bar a}\ge E[Y_{\bar a}]$. By Theorem 9, these results also hold when the corresponding conditions on E[Y|A, D] and E[A|D] hold in D. The following corollary shows that the results also hold under weaker assumptions: it is not necessary that E[Y|A, D] be nondecreasing or nonincreasing in D; it suffices that E[Y|a, D] and E[Y|ā, D] are so, which is always true. Specifically, we say that E[Y|a, D] is nondecreasing in D if

$$E[Y\mid a,d]\ge E[Y\mid a,\bar d]$$

and we say that it is nonincreasing in D if

$$E[Y\mid a,d]\le E[Y\mid a,\bar d].$$

Likewise for E[Y|ā, D].

Corollary 13

Consider the causal graph to the right in Figure 1. If E[Y|a, D] and E[A|D] are both nondecreasing or both nonincreasing in D, then S_a ≥ E[Y_a]. If E[Y|a, D] and E[A|D] are one nondecreasing and the other nonincreasing in D, then S_a ≤ E[Y_a]. Likewise for ā instead of a, replacing ≥ with ≤ and vice versa.

Proof. We prove the result for the case where E [Y|a, D] and E [A|D] are both nondecreasing in D. The proof is similar for the remaining cases. If E [Y|ā, D] is not nondecreasing in D, then make it so by parameterizing p(Y|ā, C) appropriately in Equation 11, e.g., by setting p(Y|ā, c) = p(Y|ā, c̄) so that E [Y|ā, d] = E [Y|ā, d̄]. Then E [Y|A, D] is nondecreasing in D and, as discussed previously, S_a ≥ E [Y_a] for the new distribution. Finally, note that the expressions for S_a and E [Y_a] do not involve p(Y|ā, C). So, S_a and E [Y_a] are the same for the new and the original distributions. ◻
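The corollary can also be checked numerically. The following Python sketch draws random parameterizations, computes E [Y_a], the proxy-adjusted S_a and the crude S_a exactly, and compares each with the direction predicted by `bound_direction`, which uses only the observable quantities E [Y|a, D] and E [A|D]. It assumes the structure described in the introduction (C → A, C → Y, A → Y and C → D, with D independent of A and Y given C), conditional means E [Y|A, C] drawn uniformly from (0.01, 0.99), and 0/1 coding in which a = 1 stands for the level a and a = 0 for ā. All function names are hypothetical, and the check is a sanity test rather than a proof.

```python
import random


def bound_direction(ey_ad1, ey_ad0, ea_d1, ea_d0, a):
    """Direction of the bound in Corollary 13, from observables only.

    ey_ad1, ey_ad0: E[Y | A=a, D=1] and E[Y | A=a, D=0];
    ea_d1, ea_d0: E[A | D=1] and E[A | D=0].
    Returns 'upper' if S_a >= E[Y_a] is implied and 'lower' otherwise.
    """
    same = (ey_ad1 - ey_ad0) * (ea_d1 - ea_d0) >= 0
    if a == 0:
        same = not same  # the inequalities are reversed for a-bar
    return "upper" if same else "lower"


def simulate_once(rng, a):
    """Draw one random parameterization and return
    (E[Y_a], proxy-adjusted S_a, crude S_a, predicted direction)."""
    def u():
        return rng.uniform(0.01, 0.99)

    pc1 = u()
    p_c = [1 - pc1, pc1]           # p_c[c] = P(C=c)
    pa = [u(), u()]                # pa[c] = P(A=1 | C=c)
    pd = [u(), u()]                # pd[c] = P(D=1 | C=c)
    ey = [[u(), u()], [u(), u()]]  # ey[x][c] = E[Y | A=x, C=c]

    def p_x_given_c(x, c):
        return pa[c] if x == 1 else 1 - pa[c]

    def joint(x, c, d):
        # p(A=x, C=c, D=d); D depends on C only (nondifferential proxy).
        return p_c[c] * p_x_given_c(x, c) * (pd[c] if d == 1 else 1 - pd[c])

    def p_d(d):
        return sum(joint(x, c, d) for x in (0, 1) for c in (0, 1))

    def e_y_given_ad(d):
        # E[Y | a, d] = sum_c E[Y | a, c] p(c | a, d).
        w = [joint(a, c, d) for c in (0, 1)]
        return (ey[a][0] * w[0] + ey[a][1] * w[1]) / (w[0] + w[1])

    def e_a_given_d(d):
        return (joint(1, 0, d) + joint(1, 1, d)) / p_d(d)

    e_ya = ey[a][0] * p_c[0] + ey[a][1] * p_c[1]
    s_proxy = e_y_given_ad(0) * p_d(0) + e_y_given_ad(1) * p_d(1)
    w = [p_c[c] * p_x_given_c(a, c) for c in (0, 1)]
    s_crude = (ey[a][0] * w[0] + ey[a][1] * w[1]) / (w[0] + w[1])
    pred = bound_direction(e_y_given_ad(1), e_y_given_ad(0),
                           e_a_given_d(1), e_a_given_d(0), a)
    return e_ya, s_proxy, s_crude, pred


def check_corollary_13(trials=20_000, seed=0, tol=1e-12):
    """Return True if no random draw contradicts the predicted direction."""
    rng = random.Random(seed)
    for _ in range(trials):
        for a in (0, 1):
            e_ya, s_proxy, s_crude, pred = simulate_once(rng, a)
            for s in (s_proxy, s_crude):
                if pred == "upper" and s < e_ya - tol:
                    return False
                if pred == "lower" and s > e_ya + tol:
                    return False
    return True


print(check_corollary_13())
```

A fresh parameterization is drawn for each treatment level; since the corollary is stated separately for a and ā, this simplification does not weaken the check. The small tolerance only absorbs floating-point rounding.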

Of course, S_a is always either an upper or a lower bound of E [Y_a]. The previous corollary always allows us to determine which of the two it is, because, D being binary, E [Y|a, D] and E [A|D] are always nondecreasing or nonincreasing in D. Likewise for ā instead of a. In contrast, given a random parameterization, there is only a 50 % chance that E [Y|a, D] and E [Y|ā, D] are both nondecreasing or both nonincreasing in D, i.e., that E [Y|A, D] is nondecreasing or nonincreasing in D, so that we can apply the combination of Theorem 9 and the result by VanderWeele et al. [8, Result 1] as we did above.
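The 50 % figure can be motivated by a sign factorization. The identity below is our own sketch; it assumes the nondifferential structure of the introduction (D independent of A and Y given C), every probability strictly between 0 and 1, and p(d|c) ≠ p(d|c̄). For each treatment level x ∈ {a, ā}:

```latex
\begin{align*}
E[Y \mid x, d] - E[Y \mid x, \bar{d}]
  &= \bigl(E[Y \mid x, c] - E[Y \mid x, \bar{c}]\bigr)
     \bigl(p(c \mid x, d) - p(c \mid x, \bar{d})\bigr),\\
\operatorname{sign}\bigl(p(c \mid x, d) - p(c \mid x, \bar{d})\bigr)
  &= \operatorname{sign}\bigl(p(d \mid c) - p(d \mid \bar{c})\bigr).
\end{align*}
```

The second factor has the same sign for x = a and x = ā, so E [Y|a, D] and E [Y|ā, D] move in the same direction exactly when E [Y|a, C] and E [Y|ā, C] do. If the four conditional means E [Y|x, c] and E [Y|x, c̄] are drawn independently from a common continuous distribution, the signs of the two differences are independent fair coin flips, and they agree with probability 1/2.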

3.2 Transitivity

Consider the causal graph A → B → C. Let E [B|A] and E [C|B] be nondecreasing in A and B, respectively. Unfortunately, there is no guarantee that E [C|A] is nondecreasing in A, i.e., the nondecreasing property is not transitive in general [7, Example 3.2]. However, transitivity does hold when A, B and C are binary random variables [7, p. 119]. For binary random variables, Ogburn and VanderWeele [4, Lemma 1] also imply a sort of transitivity result: If E [C|B] is monotone in B, then E [C|A] is monotone in A. Theorem 7 then implies a sort of inverse transitivity result: If E [C|A] is monotone in A, then E [C|B] is monotone in B.
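For binary variables, the transitivity in the chain A → B → C can be seen from a one-line identity. The sketch below is ours; it writes a, b for A = 1, B = 1, and uses only that C is independent of A given B in the chain, so that E [C|a] = E [C|b]p(b|a) + E [C|b̄]p(b̄|a):

```latex
\begin{align*}
E[C \mid a] - E[C \mid \bar{a}]
  &= \bigl(E[C \mid b] - E[C \mid \bar{b}]\bigr)\bigl(p(b \mid a) - p(b \mid \bar{a})\bigr)\\
  &= \bigl(E[C \mid b] - E[C \mid \bar{b}]\bigr)\bigl(E[B \mid a] - E[B \mid \bar{a}]\bigr).
\end{align*}
```

If both factors are nonnegative, i.e., E [C|B] and E [B|A] are nondecreasing, then so is E [C|A], consistent with [7, p. 119].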

4 Discussion

We have extended the result in Lemma 1 of [4], which states that if E [Y|A, C] is monotone in C, then RD_obs lies between RD_true and RD_crude. We have done so by showing that the result also holds when E [Y|A, D] is monotone in D. This makes the result much more applicable in practice, as the monotonicity condition in D can be verified empirically. We have also extended Result 1 of [8] along the same lines.

The monotonicity condition in D is, however, sufficient but not necessary. In fact, our experiments showed that in 94 % of the random parameterizations of the causal graph studied, RD_obs lay between RD_true and RD_crude. However, the monotonicity condition did not hold for approximately half of them. To shed some light on this gap, we have characterized a nonmonotonic case (albeit an empirically untestable one) where RD_obs still lies between RD_true and RD_crude. In future work, we plan to investigate how to relax the monotonicity condition while keeping it sufficient and empirically testable.

Acknowledgement

We thank the Associate Editor and Reviewers for their comments, which helped us to improve our work. This work was funded by the Swedish Research Council (ref. 2019-00245).

References

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[2] S. Greenland. The Effect of Misclassification in the Presence of Covariates. American Journal of Epidemiology, 112(4):564–569, 1980. doi:10.1093/oxfordjournals.aje.a113025

[3] W. Miao, Z. Geng, and E. J. Tchetgen Tchetgen. Identifying Causal Effects with Proxy Variables of an Unmeasured Confounder. Biometrika, 105(4):987–993, 2018. doi:10.1093/biomet/asy038

[4] E. L. Ogburn and T. J. VanderWeele. On the Nondifferential Misclassification of a Binary Confounder. Epidemiology, 23(3):433–439, 2012. doi:10.1097/EDE.0b013e31824d1f63

[5] E. L. Ogburn and T. J. VanderWeele. Bias Attenuation Results for Nondifferentially Mismeasured Ordinal and Coarsened Confounders. Biometrika, 100(1):241–248, 2013. doi:10.1093/biomet/ass054

[6] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2009. doi:10.1017/CBO9780511803161

[7] T. J. VanderWeele and J. M. Robins. Signed Directed Acyclic Graphs for Causal Inference. Journal of the Royal Statistical Society Series B, 72(1):111–127, 2010. doi:10.1111/j.1467-9868.2009.00728.x

[8] T. J. VanderWeele, M. A. Hernán, and J. M. Robins. Causal Directed Acyclic Graphs and the Direction of Unmeasured Confounding Bias. Epidemiology, 19(5):720–728, 2008. doi:10.1097/EDE.0b013e3181810e29

Received: 2020-06-01

Accepted: 2020-08-19

Published Online: 2020-11-28

This work is licensed under the Creative Commons Attribution 4.0 International License.
