Conditioning on Post-treatment Variables

  • Judea Pearl
Published/Copyright: February 10, 2015

Abstract

In this issue of the Causal, Casual, and Curious column, I compare several ways of extracting information from post-treatment variables and call attention to some peculiar relationships among them. In particular, I contrast do-calculus conditioning with counterfactual conditioning and discuss their interpretations and scopes of application. These relationships have come up in conversations with readers, students and curious colleagues, so I will present them in a question-and-answer format.

Question-1 (Is Rule-2 valid?)

Rule-2 of do-calculus does not distinguish post-treatment from pre-treatment variables. Thus, regardless of the nature of Z, it permits us to replace P(y|do(x),z) with P(y|x,z) whenever Z separates X from Y in the mutilated graph $G_{\underline{X}}$ (i.e. the causal graph from which the arrows emanating from X are removed). How can this rule be correct, when we know that one should be careful about conditioning on a post-treatment variable Z?

Example 1. Consider the simple causal chain X → Y → Z. We know that if we condition on Z (as in case-control studies), the selected units cease to be representative of the population, and we cannot identify the causal effect of X on Y even when X is randomized. Applying Rule-2, however, we get $P(y \mid do(x), z) = P(y \mid x, z)$, since X and Y are separated in the mutilated graph X  Y → Z (with the arrow X → Y removed). This tells us that the causal effect of X on Y IS identifiable conditioned on Z. Something must be wrong here.

Answer-1

Yes, something is wrong here, but not with Rule-2. It has to do with the interpretation of P(y|do(x),z), which will become clear when we prove the validity of Rule-2 in our graph

X → Y → Z

Rule-2 says:

$$P(y \mid do(x), z) = P(y \mid x, z) \quad \text{if } (X \perp\!\!\!\perp Y \mid Z) \text{ in } G_{\underline{X}}$$

Indeed, if we go to the definition of P(y|do(x),z), we obtain:

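$$\begin{aligned}
P(y \mid do(x), z) &= P(y, z \mid do(x)) \,/\, P(z \mid do(x)) && \text{by definition} \\
&= P(y, z \mid x) \,/\, P(z \mid x) && \text{since } X \text{ is randomized} \\
&= P(y \mid x, z)
\end{aligned}$$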

which proves Rule-2.
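As a numerical sanity check on the derivation above, here is a minimal Python sketch. It builds the chain X → Y → Z from hypothetical conditional probability tables and compares P(y|do(x),z), computed from the truncated (post-intervention) factorization, with P(y|x,z), computed from the observational joint. The table values and the function names `q_do`, `q_obs` and `max_gap` are illustrative assumptions, not part of the article.

```python
from itertools import product

# Hypothetical conditional probability tables for the chain X -> Y -> Z.
P_X = {0: 0.4, 1: 0.6}                      # P(x); X has no parents
P_Y_GIVEN_X = {0: {0: 0.8, 1: 0.2},         # P(y | x), indexed [x][y]
               1: {0: 0.3, 1: 0.7}}
P_Z_GIVEN_Y = {0: {0: 0.9, 1: 0.1},         # P(z | y), indexed [y][z]
               1: {0: 0.25, 1: 0.75}}
VALS = (0, 1)


def joint(x, y, z):
    """Observational joint P(x, y, z) = P(x) P(y|x) P(z|y)."""
    return P_X[x] * P_Y_GIVEN_X[x][y] * P_Z_GIVEN_Y[y][z]


def joint_do(x, y, z):
    """Post-intervention P(y, z | do(x)): the factor P(x) is deleted."""
    return P_Y_GIVEN_X[x][y] * P_Z_GIVEN_Y[y][z]


def q_do(y, x, z):
    """P(y | do(x), z) = P(y, z | do(x)) / P(z | do(x))."""
    return joint_do(x, y, z) / sum(joint_do(x, yy, z) for yy in VALS)


def q_obs(y, x, z):
    """P(y | x, z) computed from the observational joint."""
    return joint(x, y, z) / sum(joint(x, yy, z) for yy in VALS)


def max_gap():
    """Largest absolute difference between the two quantities over all cells."""
    return max(abs(q_do(y, x, z) - q_obs(y, x, z))
               for x, y, z in product(VALS, repeat=3))


if __name__ == "__main__":
    print("max |P(y|do(x),z) - P(y|x,z)| =", max_gap())
```

The two quantities agree up to floating-point error because X has no parents in this model, so deleting the factor P(x) changes numerator and denominator by the same constant.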

The same result obtains whenever Z blocks all back-door paths from X to Y, as in the canonical confounding model (Figure 1(a)), as well as in the typical selection-bias model (Figure 1(b)). P(y|do(x),z) is identified (by P(y|x,z)) in both models, despite the fact that in Figure 1(b) Z is a descendant of both treatment and outcome, in double violation of the back-door criterion. P(y|do(x),z) is no longer estimable when conditioning on Z opens a back-door path from X to Y, as in Figure 2(a), because the condition $(X \perp\!\!\!\perp Y \mid Z)_{G_{\underline{X}}}$ is violated. It is identified in Figure 2(b), where the condition is satisfied.

Figure 1: Two models in which $P(y \mid do(x), z) = P(y \mid x, z)$ because Z d-separates X from Y once we remove arrows emanating from X.

Figure 2: In Model (a), $P(y \mid do(x), z)$ is not identified (when U is unobserved) since Rule-2 is inapplicable. It is identified in Model (b) since X and Y are separated in $G_{\underline{X}}$.

Question-2 (Why back-door prohibition?)

So, when do we need to worry about conditioning on X-affected covariates, virtual colliders, case control studies, etc.? It seems that Rule-2 allows us to circumvent the prohibition that the back-door criterion imposes against conditioning on a treatment-dependent Z.

Answer-2

The two are not contradictory. Rule-2 is always valid, regardless of whether Z is pre-treatment or post-treatment. At the same time, the prohibition imposed by the back-door criterion cannot be dismissed; it needs to be considered on two occasions. First, whenever we seek a license to use the adjustment formula and write:

$$P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z) \tag{1}$$

Second, whenever we seek to estimate causal effects in a specific group of units characterized by Z=z. Contrary to syntactic appearance, the expression P(y|do(x),z) in Rule-2 does not represent such effects when Z is post-treatment.

Let us deal with these two cases separately.

2.1 License to adjust

Consider the adjustment formula of eq. (1). This formula is not valid when Z is Y-dependent, as in our causal chain

G1: X → Y → Z.

If we apply it blindly, we get the sum in eq. (1) instead of the correct answer, which is P(y|do(x))=P(y|x).

To see what goes wrong with blind adjustment, let us trace its derivation, for a pre-treatment Z:

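$$\begin{aligned}
P(y \mid do(x)) &= \sum_z P(y \mid do(x), z)\, P(z \mid do(x)) \\
&= \sum_z P(y \mid x, z)\, P(z \mid do(x)) && \text{by Rule-2} \\
&= \sum_z P(y \mid x, z)\, P(z) && \text{since } Z \text{ precedes } X
\end{aligned}$$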

This works fine when we can substitute P(z|do(x)) with P(z), but not when Z is post-treatment and P(z|do(x)) depends on x. Thus, Rule-2 in itself is not sufficient for adjustment; blind adjustment will produce erroneous estimands.

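If we avoid the substitution P(z|do(x))=P(z) and proceed cautiously in G1, we get

$$\begin{aligned}
P(y \mid do(x)) &= \sum_z P(y \mid do(x), z)\, P(z \mid do(x)) \\
&= \sum_z P(y \mid x, z)\, P(z \mid do(x)) && \text{by Rule-2} \\
&= \sum_z P(y \mid x, z)\, P(z \mid x) && \text{by Rule-2 on } Z \\
&= P(y \mid x)
\end{aligned}$$

which is the correct answer for G1. But this is obtained through careful derivation, not by blind adjustment.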
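The contrast between blind adjustment and the careful derivation in G1 can be made concrete with a small Python sketch. The conditional probability tables below are hypothetical, and the function names (`blind_adjustment`, `careful_derivation`, `true_effect`) are illustrative assumptions. Blind adjustment weights the strata by P(z); the careful derivation weights them by P(z|x).

```python
# Hypothetical tables for G1: X -> Y -> Z (all variables binary).
P_X = {0: 0.4, 1: 0.6}
P_Y_GIVEN_X = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}    # [x][y]
P_Z_GIVEN_Y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}  # [y][z]
VALS = (0, 1)


def joint(x, y, z):
    """Observational joint P(x, y, z) for the chain."""
    return P_X[x] * P_Y_GIVEN_X[x][y] * P_Z_GIVEN_Y[y][z]


def p_y_given_xz(y, x, z):
    return joint(x, y, z) / sum(joint(x, yy, z) for yy in VALS)


def p_z(z):
    return sum(joint(x, y, z) for x in VALS for y in VALS)


def p_z_given_x(z, x):
    return sum(joint(x, y, z) for y in VALS) / P_X[x]


def blind_adjustment(y, x):
    """Eq. (1) applied blindly: sum_z P(y|x,z) P(z)."""
    return sum(p_y_given_xz(y, x, z) * p_z(z) for z in VALS)


def careful_derivation(y, x):
    """Strata weighted by P(z|do(x)) = P(z|x): sum_z P(y|x,z) P(z|x)."""
    return sum(p_y_given_xz(y, x, z) * p_z_given_x(z, x) for z in VALS)


def true_effect(y, x):
    """In G1, P(y|do(x)) = P(y|x), read directly off the table."""
    return P_Y_GIVEN_X[x][y]


if __name__ == "__main__":
    print("truth  :", true_effect(1, 1))
    print("careful:", careful_derivation(1, 1))
    print("blind  :", blind_adjustment(1, 1))
```

With these particular numbers the careful sum reproduces P(y|x), while the blind sum does not, because P(z) differs from P(z|x) when Z is a descendant of X.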

Blind adjustment is valid, however, when Z is a pure descendant [1] of X, as in graph G2 of Figure 3. We know that the back-door prohibition against post-treatment covariates is lifted in this case [1, p. 339, 2] and, indeed, if we take Z as a covariate and blindly apply the adjustment formula to G2, we get the correct result:

$$P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z) = P(y \mid x) \tag{2}$$

Figure 3: A model in which Z is a pure descendant of X, thus satisfying the (extended) back-door condition and permitting adjustment for Z.

The latter equality is obtained through the conditional independence P(y|x,z)=P(y|x) which holds in G2.
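A quick numerical illustration of eq. (2), assuming that G2 consists of the two arrows X → Y and X → Z, so that Z is a pure descendant of X. The tables and function names below are hypothetical. Since P(y|x,z) does not depend on z here, the weights P(z) sum out and blind adjustment returns P(y|x).

```python
# Hypothetical tables for a graph with arrows X -> Y and X -> Z.
P_X = {0: 0.4, 1: 0.6}
P_Y_GIVEN_X = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}   # [x][y]
P_Z_GIVEN_X = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # [x][z]
VALS = (0, 1)


def joint(x, y, z):
    """Observational joint P(x, y, z) = P(x) P(y|x) P(z|x)."""
    return P_X[x] * P_Y_GIVEN_X[x][y] * P_Z_GIVEN_X[x][z]


def p_y_given_xz(y, x, z):
    return joint(x, y, z) / sum(joint(x, yy, z) for yy in VALS)


def p_z(z):
    return sum(joint(x, y, z) for x in VALS for y in VALS)


def blind_adjustment(y, x):
    """Eq. (2): sum_z P(y|x,z) P(z)."""
    return sum(p_y_given_xz(y, x, z) * p_z(z) for z in VALS)


if __name__ == "__main__":
    for x in VALS:
        print(x, blind_adjustment(1, x), P_Y_GIVEN_X[x][1])
```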

2.2 Identifying unit-specific effects

We are now ready to discuss the second task for which back-door admissibility is needed: estimating unit-specific effects.

In many applications, the query of interest is not to find $Q_{do} = P(y \mid do(x), z)$, but to find $Q_c = P(y_x \mid z)$, where $y_x$ is short for the counterfactual statement $Y_x = y$. Back-door admissibility gives us the license to equate the two queries, and get

$$P(y \mid do(x), z) = P(y_x \mid z) = P(y \mid x, z) \tag{3}$$

By the counterfactual query $Q_c$ we mean: Take all units which are currently at level Z=z, and ask what their Y would be had they been exposed to treatment X=x. This is different from $Q_{do} = P(y \mid do(x), z)$, which means: Expose the whole population to treatment X=x, take all units which attained level Z=z (post exposure) and report their Y values.

We call $Q_c$ "unit-specific" because, as x varies, $Q_c$ remains focused on the same set of units (i.e. those that are currently at Z=z), with (hypothetical) histories that vary with x. Some of these units may not have experienced any of those histories and would have attained different levels of Z if they did. In contrast, $Q_{do}$ focuses on one stratum, Z=z, and, as x varies, it allows different units to enter and leave that stratum.

Obviously, when Z is a pre-treatment covariate, we have $Q_{do} = Q_c$, but when Z is post-treatment, the most common question we ask is $Q_c$: find $P(y_x \mid z)$, not $Q_{do}$: find $P(y \mid do(x), z)$. The back-door criterion gives us a license to equate both queries with P(y|x,z). Here is why: If Z satisfies the back-door condition, the First Law of causal inference [2] dictates the conditional independence $Y_x \perp\!\!\!\perp X \mid Z$, also known as "conditional ignorability" [3], so

$$P(y_x \mid z) = P(y_x \mid z, x) = P(y \mid z, x).$$

This license is similar to Rule-2, but it is applied to a different expression; whereas ignorability allows us to remove a subscript, Rule-2 allows us to remove a do-operator.

We can see the difference in graph G2 of Figure 3. Here Z satisfies the (extended) back-door condition, so we can write

$$P(y_x \mid z) = P(y \mid z, x) = P(y \mid x)$$

Rule-2 in itself does not give us this license because it is applicable to a different query P(y|do(x),z) and cannot handle counterfactual expressions.

Question-3 (the key question)

Should we be concerned with the difference between Qdo and Qc? If so, when?

Answer-3

We certainly should, because the two questions have different semantics and deliver different answers whenever Z does not satisfy the back-door condition. This can be demonstrated in graph G3.[3]

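G3: X → Z → Y

In this graph, $Q_{do}$ gives:

$$P(y \mid do(x), z) = P(y \mid x, z) \;\; (\text{from Rule-2}) \;\; = P(y \mid z)$$

While $Q_c$ gives:

$$\begin{aligned}
P(y_x \mid z) &= \sum_{x'} P(y_x \mid x', z)\, P(x' \mid z) \\
&= P(y \mid x, z)\, P(x \mid z) + \sum_{x' \neq x} P(y_x \mid x', z)\, P(x' \mid z) \\
&= P(y, x \mid z) + \sum_{x' \neq x} P(y_x \mid x', z)\, P(x' \mid z)
\end{aligned}$$

which is totally alien to $Q_{do} = P(y \mid z)$.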

Intuition supports this inequality. If we let X be education, Z be skill and Y be salary, $Q_{do}$ looks at people assigned to x years of education who subsequently achieved skill level z, and asks how their salary Y would depend on x, assuming that they end up with the same skill Z=z. The graph states that skill alone determines salary, not how it was acquired; therefore $Q_{do}$ evaluates to $P(y \mid do(x), z) = P(y \mid z)$, namely, education has no effect on salary once we know z, as shown in the graph. [4] In contrast, $Q_c$ asks for the role that education plays in the salary of one specific group of units, those at skill Z=z. In other words, we look at those who are currently at skill Z=z and ask, counterfactually, what their salary would be had they received x years of schooling. Since some of those at skill Z=z had no schooling, their skill level would be greater than z had they received schooling, and so would their salary. This explains the inequality $Q_{do} \neq Q_c$.
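The education, skill and salary story can be turned into a toy structural model for G3 to show how far apart the two queries may lie. The sketch below is only an illustration under assumed mechanisms: all variables are binary, each unit carries a hypothetical "response type" for skill and for salary (`never`, `complier`, `always`), and the type probabilities are invented. $Q_{do}$ is computed by intervening first and then selecting on the attained Z; $Q_c$ is computed by selecting on the actual Z first and then intervening on those same units.

```python
from itertools import product

# Hypothetical exogenous distributions (independent of one another).
P_UX = {0: 0.5, 1: 0.5}                                   # actual education X = u_x
P_UZ = {"never": 0.2, "complier": 0.6, "always": 0.2}     # skill response to education
P_UY = {"never": 0.1, "complier": 0.8, "always": 0.1}     # salary response to skill


def respond(parent, u):
    """Binary mechanism: 'never' -> 0, 'always' -> 1, 'complier' copies its parent."""
    return {"never": 0, "always": 1}.get(u, parent)


def units():
    """Yield (probability, u_x, u_z, u_y) for every exogenous configuration."""
    for (ux, px), (uz, pz), (uy, py) in product(
            P_UX.items(), P_UZ.items(), P_UY.items()):
        yield px * pz * py, ux, uz, uy


def q_do(y, x, z):
    """P(y | do(x), z): set X=x for everyone, keep units that then attain Z=z."""
    num = den = 0.0
    for p, _ux, uz, uy in units():
        z_x = respond(x, uz)
        if z_x == z:
            den += p
            num += p * (respond(z_x, uy) == y)
    return num / den


def q_c(y, x, z):
    """P(Y_x = y | Z = z): keep units whose actual Z equals z, then set X=x."""
    num = den = 0.0
    for p, ux, uz, uy in units():
        if respond(ux, uz) == z:              # actual skill under the unit's own X
            den += p
            num += p * (respond(respond(x, uz), uy) == y)
    return num / den


def p_y_given_z(y, z):
    """Observational P(y | z), for comparison with Q_do."""
    num = den = 0.0
    for p, ux, uz, uy in units():
        z_actual = respond(ux, uz)
        if z_actual == z:
            den += p
            num += p * (respond(z_actual, uy) == y)
    return num / den


if __name__ == "__main__":
    print("Q_do   =", q_do(1, 1, 0))
    print("P(y|z) =", p_y_given_z(1, 0))
    print("Q_c    =", q_c(1, 1, 0))
```

In this toy model, $Q_{do}$ for high salary among the low-skilled under do(X=1) coincides with P(y|z), about 0.1, whereas $Q_c$ is about 0.58: most units currently at low skill are uneducated "compliers" who would have become skilled, and better paid, had they been educated.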

Question-4 ($Q_{do}$ or $Q_c$)

Which query, $Q_{do}$ or $Q_c$, is normally asked when Z is affected by X?

Answer-4

$Q_{do}$ is rarely posed as a research question of interest, probably because it lacks an immediate causal interpretation. It serves primarily as an auxiliary mathematical object in the service of other research questions. One such research question is the unconditional causal effect of X on Y, denoted P(y|do(x)), which is fully analyzed using the do-calculus [4], namely, using $Q_{do}$. Another research question benefiting from $Q_{do}$ occurs in transportability problems [5, 6], where the target query is P(y|do(x)) (the causal effect in a new population), and has been fully analyzed in do-calculus, again, using $Q_{do}$. I have not seen $Q_{do}$ presented as a target query in its own right.

Question-5 (selection bias)

What about selection bias problems, where the selection mechanism is often outcome-dependent?

Answer-5

If we aim at estimating P(y|do(x)) from selection-biased data under S=1, we are asking for neither $Q_{do}$ nor $Q_c$. Rather, we are asking for P(y|do(x)) and we are allowed to use all means available, including the rules of do-calculus (which invoke P(y|do(x),z)), as long as we can recover P(y|do(x)) from selection-biased data [7].

To demonstrate, assume that variable Z in Figure 3 stands for "selection" into the data, and our task is to recover the causal effect P(y|do(x)). Applying Rule-2 (on the null set) we can write

$$P(y \mid do(x)) = P(y \mid x) = P(y \mid x, Z=1) \quad \text{using } Y \perp\!\!\!\perp Z \mid X$$

which establishes the recovery of the target effect from the biased data P(y|x,Z=1).

As another example, consider the following model (after [8]), X → Y ← L → S, where L is unobserved and S=1 represents selection. Since S is not separable from Y, P(y|do(x)) is not recoverable from the data P(x,y|S=1). (For intuition, imagine the confounder L being sex, in a study that excludes girls from participation. Surely, the average treatment effect is not recoverable from male-only data.) Assume moreover that only a few cases drop from the study, i.e. P(S=0) is small and estimable. We can then write

$$P(y \mid do(x)) = P(y \mid do(x), S=1)\, P(S=1 \mid do(x)) + P(y \mid do(x), S=0)\, P(S=0 \mid do(x))$$

and obtain a lower bound

$$P(y \mid do(x)) \geq P(y \mid do(x), S=1)\, P(S=1 \mid do(x))$$

Two points are worth noting: (1) the lower bound has the form of $Q_{do}$: P(y|do(x),z), and (2) the lower bound is estimable from the data available, giving P(y|x,S=1)P(S=1|do(x)).
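To see the bound at work, here is a minimal Python sketch. It assumes, as a reading of the model above, the structure X → Y ← L → S with binary variables, a randomized X, and invented probability tables; none of the numbers or function names come from the article. In this assumed structure S is not affected by X, so P(S=1|do(x)) reduces to P(S=1), and Rule-2 turns P(y|do(x),S=1) into P(y|x,S=1).

```python
# Hypothetical tables for the assumed structure X -> Y <- L -> S.
P_L = {0: 0.5, 1: 0.5}                       # latent L (e.g. sex)
P_Y1_GIVEN_XL = {(0, 0): 0.1, (0, 1): 0.4,   # P(Y=1 | x, l)
                 (1, 0): 0.5, (1, 1): 0.9}
P_S1_GIVEN_L = {0: 0.95, 1: 0.85}            # P(S=1 | l); few units drop out


def p_y1_do(x):
    """True effect P(Y=1 | do(x)) = sum_l P(Y=1 | x, l) P(l)."""
    return sum(P_Y1_GIVEN_XL[(x, l)] * P_L[l] for l in P_L)


def p_s1():
    """P(S=1); equals P(S=1 | do(x)) here because S is not a descendant of X."""
    return sum(P_S1_GIVEN_L[l] * P_L[l] for l in P_L)


def p_y1_given_x_s1(x):
    """P(Y=1 | x, S=1), the quantity available from the selected data.

    P(x) cancels because X is randomized, hence independent of L and S.
    """
    num = sum(P_L[l] * P_S1_GIVEN_L[l] * P_Y1_GIVEN_XL[(x, l)] for l in P_L)
    den = sum(P_L[l] * P_S1_GIVEN_L[l] for l in P_L)
    return num / den


def lower_bound(x):
    """P(Y=1 | x, S=1) * P(S=1), the estimable lower bound on P(Y=1 | do(x))."""
    return p_y1_given_x_s1(x) * p_s1()


if __name__ == "__main__":
    for x in (0, 1):
        print(x, "bound:", lower_bound(x), "truth:", p_y1_do(x))
```

The gap between the truth and the bound is the dropped term P(y|do(x),S=0)P(S=0|do(x)), so it can never exceed P(S=0), which is why the bound is informative when few cases drop out.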

This bounding method does not work for the graph XYS. Writing:

$$P(y \mid do(x)) > P(y \mid do(x), S=1)\, P(S=1 \mid do(x)),$$

we see that, even if we are given the last term, P(S=1|do(x)), we cannot estimate the first.

It is important to note that, if we set out to estimate this bound, our target of identification would be a $Q_{do}$-type expression P(y|do(x),S=1), where S is a descendant of X, and we could unleash the full power of do-calculus, ignoring the fact that we are only in possession of biased data, conditioned on S=1.

Conclusions

Rule-2 of do-calculus is valid for both pre-treatment and post-treatment variables. The rule may appear to violate traditional warnings against conditioning on post-treatment variables, but such warnings apply only to stronger claims, not the one made by Rule-2. The stronger claims are: (1) the identification of causal effects by adjustment and (2) the identification of unit-specific effects through counterfactual independence (i.e. "ignorability"). The assumptions needed for these two tasks are satisfied by the back-door criterion, and that is where the special handling of post-treatment covariates becomes necessary.

Funding statement: This research was supported in part by grants from NSF #IIS-1302448 and ONR #N00014-10-1-0933 and #N00014-13-1-0153.

Acknowledgment

I thank Elias Bareinboim, Sander Greenland, Karthika Mohan and many bloggers on http://www.mii.ucla.edu/causality/ for being part of these conversations.

References

1. Pearl J. Causality: models, reasoning, and inference, 2nd ed. New York: Cambridge University Press, 2009. doi:10.1017/CBO9780511803161

2. Shpitser I, VanderWeele T, Robins J. On the validity of covariate adjustment for estimating causal effects. In Proceedings of the twenty-sixth conference on uncertainty in artificial intelligence. Corvallis, OR: AUAI, 2010:527–36.

3. Rosenbaum P, Rubin D. The central role of propensity score in observational studies for causal effects. Biometrika 1983;70:41–55. doi:10.1093/biomet/70.1.41

4. Shpitser I, Pearl J. Complete identification methods for the causal hierarchy. J Mach Learn Res 2008;9:1941–79.

5. Bareinboim E, Pearl J. Transportability from multiple environments with limited experiments: Completeness results. In Welling M, Ghahramani Z, Cortes C, Lawrence N, editors. Advances of Neural Information Processing 27 (NIPS Proceedings). 2014:280–8. http://ftp.cs.ucla.edu/pub/stat_ser/r443.pdf

6. Pearl J, Bareinboim E. External validity: from do-calculus to transportability across populations. Stat Sci 2014;29:579–95. doi:10.1214/14-STS486

7. Bareinboim E, Tian J, Pearl J. Recovering from selection bias in causal and statistical inference. In Brodley CE, Stone P, editors. Proceedings of the Twenty-eighth AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2014. Best Paper Award. http://ftp.cs.ucla.edu/pub/stat_ser/r425.pdf doi:10.1609/aaai.v28i1.9074

8. Garcia FM. Definition and diagnosis of problematic attrition in randomized controlled experiments. Working paper, 2013. Available at SSRN: http://ssrn.com/abstract=2267120

9. Balke A, Pearl J. Probabilistic evaluation of counterfactual queries. In Proceedings of the twelfth national conference on artificial intelligence, vol. I. Menlo Park, CA: MIT Press, 1994:230–7.

Published Online: 2015-2-10
Published in Print: 2015-3-1

©2015 by De Gruyter
