
Decision-theoretic foundations for statistical causality: Response to Pearl

Philip Dawid
Published/Copyright: November 9, 2022

Abstract

I thank Judea Pearl for his discussion of my paper and respond to the points he raises. In particular, his attachment to unaugmented directed acyclic graphs has led to a misapprehension of my own proposals. I also discuss the possibilities for developing a non-manipulative understanding of causality.

MSC 2010: 62A01; 62C99

1 Introduction

I am grateful to Pearl for his discussion [1] of the decision-theoretic (DT) approach to causal inference, which I have been pursuing, pretty much single-handedly, for many years. I am however sorry that he has not addressed the specific purpose of the paper [2] to which it is ostensibly a response: namely, to develop deep foundations for statistical causality in general, including his own graphical approach as an important special case. Rather, he focuses on the syntax and semantics of graphical representations of causal problems.

Pearl’s discussion highlights some similarities and differences between our approaches. Such discussions are helpful in improving our understanding of statistical causality, and the ways in which we represent and manipulate it. They are not merely of theoretical interest, but carry important lessons for the everyday conduct of causal reasoning.

In Section 2, I explain that Pearl has misrepresented my approach to setting up a graphical representation of a causal problem and that we are in fact in close agreement, differing only in that I require regime indicator nodes to be explicitly represented, whereas he takes them as implicitly understood. In Section 3, I offer some suggestions towards understanding non-manipulative causality.

2 Where do we start?

One apparent difference between us concerns our starting point when modelling a real problem. Pearl talks of an observational “starting graph.” But he immediately clarifies that his starting graph cannot be just any directed acyclic graph (DAG) encoding the relevant observational conditional independencies, but must be one that is already imbued with some sort of causal meaning. I have elaborated elsewhere [3] on the difficulties and ambiguities of various attempts to ascribe causal meaning to a DAG, but I allow that Pearl at least has developed an understanding that is clear and unambiguous: specifically, that the probabilistic dependence of a variable on its graph parents is assumed unaffected by how (in which “regime”) the values of those parent variables came about – e.g. whether by intervention (“doing”) or by the unhindered action of Nature (“seeing”). Pearl’s starting graph is thus not observational at all, but rather a stripped-down version of what I term an “augmented DAG,” where there is associated, with each variable $V$ in the DAG, an additional non-stochastic parent, its regime indicator node $F_V$ – endowing the graph with causal semantics. In Pearl’s representation the regime indicator nodes are omitted, but are to be taken as understood. But only by having in mind, already from the start, how a graph including regime nodes should look – and so what causal information it carries – could we judge whether it is an appropriate representation.
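For readers unused to this notation, a brief gloss (following refs. [2,5]): the regime indicator $F_V$ is a non-stochastic decision variable taking values in $\{\emptyset\} \cup \mathcal{V}$, where $\mathcal{V}$ is the state space of $V$; $F_V = \emptyset$ denotes the idle (observational) regime, in which $V$ develops naturally, while $F_V = v$ denotes the interventional regime in which $V$ is forced to take the value $v$. Writing $\mathrm{pa}(V)$ for the domain parents of $V$ in the DAG,

$$P(V = v' \mid \mathrm{pa}(V), F_V = \emptyset) = p(v' \mid \mathrm{pa}(V)), \qquad P(V = v' \mid \mathrm{pa}(V), F_V = v) = \mathbf{1}\{v' = v\}.$$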

Pearl correctly points out, using a variant of my Figure 2, that if we start with two observationally equivalent graphs, and then tack on a regime indicator, we may end up with causally inequivalent graphs. The same point could be made even more simply with my Figure 1, having $F_T \rightarrow T \rightarrow Y$. Without the node $F_T$, we have the trivial observational graph $T \rightarrow Y$, with no independencies represented. With $F_T$, we have an augmented graph that encodes $Y \perp\!\!\!\perp F_T \mid T$, saying that the distribution of $Y$ given $T$ is the same, whether $T$ arises naturally or by intervention – consistent with $T$ causing $Y$ (and there being no observational confounding). If we reverse the arrow from $T$ to $Y$, we obtain the observational graph $T \leftarrow Y$, which, as before, has no independencies represented; but with $F_T$ added it becomes $F_T \rightarrow T \leftarrow Y$, representing $Y \perp\!\!\!\perp F_T$, which says that the distribution of $Y$ is unaffected by what happens (intervention or none) to produce $T$ – consistent with $T$ having no causal effect on $Y$.
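To spell out the first case (a sketch in the notation glossed above): the two regimes give

$$P(T = t', Y = y \mid F_T = \emptyset) = P(T = t')\, P(Y = y \mid T = t'),$$
$$P(T = t', Y = y \mid F_T = t) = \mathbf{1}\{t' = t\}\, P(Y = y \mid T = t'),$$

and it is the appearance of one and the same factor $P(Y = y \mid T = t')$ in both lines that the extended conditional independence $Y \perp\!\!\!\perp F_T \mid T$ expresses.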

Figure 2: Instrumental variable with regimes.

Pearl’s starting graph, in either case, would be indistinguishable from its associated observational graph, where $F_T$ is omitted. It is his insistence on using such an unaugmented DAG representation of the causal problem that has led to his confusion. He is simply mistaken in thinking that I propose adding regime indicator nodes to some (any?) pre-existing purely observational DAG. On the contrary, my starting point would always be an appropriate augmented DAG, already encoding my causal assumptions. So I would not let the irrelevant observational equivalence mislead me into using $F_T \rightarrow T \leftarrow Y$, when I meant $F_T \rightarrow T \rightarrow Y$. Consequently, although we are using different representations, both Pearl and I do essentially agree[1] as to what should be the starting point of a causal graphical representation of a problem: an explicitly or implicitly augmented DAG, describing exactly how probabilistic relationships between variables are taken to depend, or not depend, on which regime is operating.

We are all enamoured of our own self-constructed mirrors of reality and find difficulties in taking in other points of view. Pearl proclaims difficulties in interpreting and handling the extended conditional independence properties that are fundamental to my approach. But really it is not that hard. Thus my Figure 2, for convenience reproduced here, encodes the following extended conditional independence properties (echoing the description in ref. [2], and with the same numbering):

(2) $(Z, U) \perp\!\!\!\perp F_X$,

(3) $U \perp\!\!\!\perp Z \mid F_X$,

(4) $Y \perp\!\!\!\perp Z \mid (X, U, F_X)$,

(5) $Y \perp\!\!\!\perp F_X \mid (X, U)$.

Given a basic facility with the meaning of conditional independence [4], where $A \perp\!\!\!\perp B \mid C$ says that the conditional distribution of $A$, given both $B$ and $C$, depends only on $C$, these expressions can be seen as merely the symbolic representations of properties that can be expressed verbally in quite intuitive terms. In particular, the specifically causal content is carried by properties (2) and (5), which assert the stability of certain marginal and conditional probability distributions across both the observational and the interventional regimes: property (2) says that the joint distribution of $Z$ and $U$ is a stable component, the same whether or not $X$ is intervened on; while (5) requires that the conditional distribution of $Y$, given $X$ and $U$, be the same in either case. Without assuming such stability properties, which allow us to transfer information between regimes, we cannot even begin to conduct causal inference from observational data. I believe that it is important to make the underlying assumptions explicit, as is achieved by Figure 2 or the equivalent algebraic properties (2)–(5).
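In symbols (a sketch of the semantics; for stochastic variables this follows ref. [4], and for the non-stochastic regime indicator it is the natural extension):

$$A \perp\!\!\!\perp B \mid C \iff p(a \mid b, c) = p(a \mid c) \quad \text{for all } a, b, c,$$

while property (5), with $F_X$ non-stochastic, unpacks as the statement that $p(y \mid x, u;\, F_X = f)$ takes the same value for every regime $f$ – so the conditional distribution estimated observationally may be carried over, unchanged, to the interventional regime.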

In particular, in an augmented DAG in which every domain variable has an explicit associated regime indicator, the causal properties it represents are just that the conditional distribution of any variable, given its parents, is the same in both observational and interventional regimes. That is to say, the augmented graph explicitly encodes the causal assumptions that are left implicit (and so in danger of remaining unexamined) in the unaugmented Pearlian DAG representation. This I proffer as an important advantage of having the regime variables explicitly represented. Among further advantages, the explicit representation clarifies and simplifies the rules of Pearl’s “do-calculus” [5, Section 9.7], as well as allowing us to omit irrelevant or meaningless interventions, as in Figure 2. Again, we can immediately see that the augmented graphs $F_T \rightarrow T \rightarrow Y$ and $F_T \rightarrow T \leftarrow Y$ are inequivalent (even though, with $F_T$ removed, they are observationally equivalent): although they have identical skeletons, the “immorality” $F_T \rightarrow T \leftarrow Y$ in the latter is absent from the former [6].
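To illustrate how mechanically such properties can be read off a graph, here is a minimal sketch (my own illustration, not part of either paper) that checks properties (2)–(5), and the inequivalence just noted, by d-separation in the augmented DAGs, using the standard ancestral moral graph criterion; the node names and edge lists encode Figure 2 and the two small graphs above:

```python
import networkx as nx

def d_separated(dag, xs, ys, zs):
    """Test whether node sets xs and ys are d-separated given zs in a DAG,
    via the ancestral moral graph criterion (Lauritzen et al., 1990)."""
    relevant = set(xs) | set(ys) | set(zs)
    ancestral = set(relevant)
    for v in relevant:                      # close under ancestors
        ancestral |= nx.ancestors(dag, v)
    sub = dag.subgraph(ancestral)
    moral = nx.Graph(sub.to_undirected())   # drop arrow directions
    for v in sub.nodes:                     # "marry" co-parents of each node
        parents = list(sub.predecessors(v))
        moral.add_edges_from((p, q)
                             for i, p in enumerate(parents)
                             for q in parents[i + 1:])
    moral.remove_nodes_from(zs)             # condition on zs
    return not any(x in moral and y in moral and nx.has_path(moral, x, y)
                   for x in xs for y in ys)

# Augmented DAG of Figure 2: instrument Z, unobserved confounder U,
# treatment X with regime indicator F_X, outcome Y.
fig2 = nx.DiGraph([("F_X", "X"), ("Z", "X"), ("U", "X"),
                   ("X", "Y"), ("U", "Y")])

assert d_separated(fig2, {"Z", "U"}, {"F_X"}, set())        # property (2)
assert d_separated(fig2, {"U"}, {"Z"}, {"F_X"})             # property (3)
assert d_separated(fig2, {"Y"}, {"Z"}, {"X", "U", "F_X"})   # property (4)
assert d_separated(fig2, {"Y"}, {"F_X"}, {"X", "U"})        # property (5)

# The two augmented graphs of Section 2 are distinguishable:
chain = nx.DiGraph([("F_T", "T"), ("T", "Y")])      # F_T -> T -> Y
collider = nx.DiGraph([("F_T", "T"), ("Y", "T")])   # F_T -> T <- Y
assert not d_separated(chain, {"Y"}, {"F_T"}, set())  # Y depends on F_T
assert d_separated(chain, {"Y"}, {"F_T"}, {"T"})      # ... but not given T
assert d_separated(collider, {"Y"}, {"F_T"}, set())   # Y independent of F_T
```

Each assertion is simply an extended conditional independence read off the graph; in particular, the last three record that the chain and collider augmentations, though observationally equivalent once $F_T$ is removed, are causally distinguishable.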

In his footnote 3, Pearl points out, with apparent disdain, that my DT “assumptions must be checked, judgmentally, for every decision variable in the graph….” But of course they must! I hope that he too would want to check, judgmentally, the identical assumptions implicit in his unaugmented representation. Indeed, a main purpose of my paper [2] was to lay out just what considerations are required for this task.

3 Manipulability

My approach in ref. [2] was aimed at developing foundations for an “agency” understanding of causality: What happens when an external agent intervenes in a system? I believe this covers a lot of ground, but I do not insist that it is the only valid meaning of causality. In particular, it would be good to have a better understanding of the causal effects of non-manipulable variables. Pearl claims to have just this, in his concept of one variable “listening to” others. But if this is to be more than an obscure metaphor it needs fleshing out in a more fundamental way. Pearl’s rambling discussion around this in his Section 6 suggests that even he is not satisfied with it as a primitive notion. He ends up by expressing it in terms of functional dependence. I confess I am unable to appreciate how this clarifies the concept of “listening.” In any case, it takes us out of the territory of probabilistic causal representation, and into that of structural causal models (SCMs), where all uncertainty is relegated to mysterious “error variables,” and all domain variables have purely functional dependence on their parents (including error variables). I know that Pearl thinks that SCMs are the bee’s knees, but I remain unconvinced. As discussed in ref. [7], SCMs are totally unnecessary for Rung-1 associational and Rung-2 causal analysis, and when applied to Rung-3 counterfactual analysis are beset by unresolvable ambiguities.[2] I also question some of the assumptions implicitly made in applying SCMs and the closely related structural equation models (SEMs): for example that it is the error variables, and they alone, that are necessarily the same across parallel universes; and the assumption that the effect of an intervention is appropriately modelled by replacing the right-hand side of the associated equation by a constant. In any particular application, such assumptions should be carefully justified as correctly representing the way the actual world behaves, but I am not aware of any attention ever being paid to this basic requirement. I await with interest a fundamental re-examination of SCMs parallel to my own fundamental re-examination of the DT approach in the paper under discussion.

One approach to non-manipulable causality might be based on the idea of transferability/transportability/invariance [9,10] of probabilistic relationships, across regimes that now need not be interventional. For example, they may refer to different cities or hospitals. An example is a clinical test that has stable false-positive and false-negative rates, no matter who it is applied to. Another example is Mendelian inheritance, operating in the identical probabilistic way across all matings. Again, Newton’s laws tie together the motion of the moon and the behaviour of the tides with the fall of an apple. Such invariances across regimes can again be conveniently expressed and manipulated in terms of extended conditional independencies involving regime indicator variables.[3] Even then, I would not typically expect it to be appropriate to attach (explicitly or implicitly) a regime indicator to every domain variable in a problem.
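For instance (my formalization of the clinical-test example, not spelt out above): write $\Sigma$ for a regime indicator ranging over hospitals, $D$ for true disease status, and $Y$ for the test result. Stability of the error rates is then the extended conditional independence

$$Y \perp\!\!\!\perp \Sigma \mid D,$$

saying that the false-positive rate $P(Y = 1 \mid D = 0, \Sigma = \sigma)$ and the false-negative rate $P(Y = 0 \mid D = 1, \Sigma = \sigma)$ do not depend on $\sigma$ – so they can be estimated in one regime and applied in another, even though no intervention on $D$ is contemplated.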

As for Pearl’s dismissive description of the DT framework as “an exercise in prediction, rather than intervention,” this is both right and wrong. I have found it both philosophically and pragmatically liberating to reconfigure causal inference[4] as the task of predicting what would happen under a hypothetical[5] future intervention, on the basis of whatever (typically observational) data are available.

Conflict of interest: Prof. Philip Dawid is a member of the Editorial Board of the Journal of Causal Inference but was not involved in the review process of this article.

References

[1] Pearl J. Causation and decision: on Dawid’s “Decision-theoretic foundations for statistical causality.” J Causal Infer. 2022;10(1):221–6. doi:10.1515/jci-2022-0046.

[2] Dawid AP. Decision-theoretic foundations for statistical causality. J Causal Infer. 2021;9:39–77. doi:10.1515/jci-2020-0008.

[3] Dawid AP. Beware of the DAG! In: Guyon I, Janzing D, Schölkopf B, editors. Proceedings of the NIPS 2008 Workshop on Causality, volume 6 of Journal of Machine Learning Research Workshop and Conference Proceedings; 2010. p. 59–86. http://tinyurl.com/33va7tm.

[4] Dawid AP. Conditional independence in statistical theory (with Discussion). J R Statist Soc B. 1979;41:1–31. doi:10.1111/j.2517-6161.1979.tb01052.x.

[5] Dawid AP. Statistical causality from a decision-theoretic perspective. Annual Rev Statist Appl. 2015;2:273–303. doi:10.1146/annurev-statistics-010814-020105.

[6] Frydenberg M. The chain graph Markov property. Scandinavian J Statist. 1990;17:333–53.

[7] Dawid AP. The tale wags the DAG. In: Dechter R, Geffner H, Halpern JY, editors. Probabilistic and causal inference: the works of Judea Pearl. Association for Computing Machinery and Morgan & Claypool; 2022. p. 557–74. doi:10.1145/3501714.3501744.

[8] Dawid AP, Musio M. Effects of causes and causes of effects. Annual Rev Statist Appl. 2022;9:261–87. doi:10.1146/annurev-statistics-070121-06112.

[9] Bühlmann P. Invariance, causality and robustness. Statist Sci. 2020;35:404–26. doi:10.1214/19-STS721.

[10] Pearl J, Bareinboim E. Transportability of causal and statistical relations: a formal approach. In: Burgard W, Roth D, editors. Proceedings of the 25th AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press; 2011. p. 247–54. http://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/view/3769/3864. doi:10.1109/ICDMW.2011.169.

[11] Dawid AP. Counterfactuals, hypotheticals and potential responses: a philosophical examination of statistical causality. In: Russo F, Williamson J, editors. Causality and probability in the sciences, volume 5 of Texts in Philosophy. London: College Publications; 2007. p. 503–32.

Received: 2022-08-24
Accepted: 2022-10-03
Published Online: 2022-11-09

© 2022 Philip Dawid, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
