
Causation and decision: On Dawid’s “Decision theoretic foundation of statistical causality”

  • Judea Pearl
Published: September 16, 2022

Abstract

In a recent issue of this journal, Philip Dawid (2021) proposes a framework for causal inference that is based on statistical decision theory and that is, in many respects, compatible with the familiar framework of causal graphs (e.g., Directed Acyclic Graphs (DAGs)). This editorial compares the methodological features of the two frameworks as well as their epistemological basis.

MSC 2010: 62A01; 62C99

1 Introduction

I have followed the works of Professor Dawid since the early 1980s, when I discovered his seminal paper, “Conditional Independence in Statistical Theory” [1]. In that paper, Dawid boldly protests statistics’ stalemate over causality and declares: “Causal inference is one of the most important, most subtle, and most neglected of all the problems of statistics.” In the past four decades, a period that saw revolutionary progress in causal inference [2], Dawid has contributed substantially to this progress but has consistently demanded that our understanding of causality be grounded in the tradition of statistical decision theory (DT), both conceptually and notationally. This paper [3] is a culmination of Dawid’s efforts and shows vividly what portions of the causal revolution can be re-formulated in the statistical DT paradigm, and what the costs and benefits are of imposing this re-formulation.

2 The statistical DT paradigm

The main thrust of the DT paradigm is to view causal inference as a decision-aiding exercise and to avoid, whenever possible, any concept or assumption that is not absolutely necessary for that exercise, especially those expressed in a vocabulary alien to traditional statistics. The outcome of this stat-exclusive strategy can be seen in the way Dawid articulates the assumptions that are needed for a DT task to commence. Equations (2)–(5), for example, convey conditional independence (CI) relations among observed or observable variables but do not involve any counterfactual variables (e.g., Y_x) or do-operators, nor any notational device outside the vocabulary of traditional probability theory. Even the structure of the graph and the directionality of its arrows can be ignored once we accept this set of independencies. The only extra-statistical object is the “regime indicator variable,” F_X, which acts as an intervention instrument.
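To see what such regime-indicator independencies look like in practice, here is a minimal sketch (my own illustration, not the paper’s equations (2)–(5); the confounding-triangle graph and variable names are hypothetical) showing how they can be read off an augmented DAG by d-separation:

```python
# A minimal sketch: reading regime-indicator independencies off an
# augmented DAG via d-separation. Assumes networkx >= 2.8, where
# nx.d_separated(G, x, y, z) is available.
import networkx as nx

# Hypothetical augmented DAG: Z confounds X and Y, and the regime
# indicator F_X acts on X alone (F_X -> X).
G = nx.DiGraph([("F_X", "X"), ("Z", "X"), ("Z", "Y"), ("X", "Y")])

# "F_X independent of Z": the covariate distribution is regime-invariant.
print(nx.d_separated(G, {"F_X"}, {"Z"}, set()))       # True
# "F_X independent of Y given (X, Z)": once X and Z are fixed,
# knowing the regime adds nothing about Y.
print(nx.d_separated(G, {"F_X"}, {"Y"}, {"X", "Z"}))  # True
```

Both statements mirror the style of Dawid’s CI assumptions: once F_X is admitted as a variable, everything is expressible in ordinary probabilistic vocabulary.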

In summary, the constraints implied by the DT framework amount to translating all causal knowledge (usually encoded in a causal graph) to a set of independencies involving regime indicators, allowing no other notational device, and pursuing the analysis as if it were an exercise in prediction, rather than intervention.

3 What we gain and what we lose

Readers familiar with the ladder of causation [2] and its logical foundations [4] would recognize immediately that, by forbidding counterfactual expressions, the entirety of Rung-3 of the hierarchy (also labeled “counterfactual” or “imagination”) would become inaccessible to researchers adhering to the DT paradigm. This means that questions related to probabilities of causation [5,6], personalized decision making [7], causes of effects [8,9], and large segments of mediation analysis [10] would be excluded from the analysis.
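To make the exclusion concrete, consider the probability of necessity, a prototypical Rung-3 quantity:

PN = P(Y_{x′} = y′ | X = x, Y = y),

i.e., the probability that the outcome y would not have occurred had the exposure been x′ rather than the observed x. Its very definition requires the counterfactual variable Y_{x′}, which the DT vocabulary forbids, and even under exogeneity PN is only bounded, not identified, by the data [5]:

max{0, [P(y|x) − P(y|x′)] / P(y|x)} ≤ PN ≤ min{1, P(y′|x′) / P(y|x)}.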

On the other hand, once F_X adequately simulates the do-operator, it immediately lifts the model from Rung-1 of the ladder to the Rung-2 (intervention) level and endows Dawid’s DT framework with all the capabilities of the do-calculus and its associated applications, including covariate selection, identifiability results, transportability analysis, missing data, and more.[1]
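As a concrete instance of such a Rung-2 capability: if a covariate set Z satisfies the backdoor criterion relative to (X, Y), the do-calculus yields the familiar adjustment formula

P(y | do(x)) = Σ_z P(y | x, z) P(z),

whose left-hand side, in DT notation, is the distribution of Y under the interventional regime F_X = x, while the right-hand side involves only quantities estimable under the observational (idle) regime.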

Dawid’s paper, in fact, gives us a comprehensive account of those tasks that do not rely on Rung-3 information but can be accomplished entirely within Rung-2 (equivalently, within DT), including tasks that were first formulated and solved using the structural causal model (SCM) as a starting point. (The analysis of non-compliance [12] is a good example.)

It thus appears that, for the restricted set of applications limited to Rung-2 of the ladder, the DT paradigm offers a coherent, self-contained framework for causal inference and analysis. We will soon see, however, that, even in this restricted set of applications, DT cannot live up to its promise of avoiding concepts that are alien to traditional statistics. This will be shown both from basic principles and in the specific model of Figure 2.

4 Can the DT paradigm deliver on its promises?

From basic principles we know that, in order to guarantee that F_X adequately simulates an intervention on X, the starting graph (also called “observational” or “unaugmented”) must already carry some causal information, in addition to its CIs. The absence of such information would leave us on Rung-1 of the ladder of causation, unable to construct the CIs associated with F_X, hence unable to reason with interventions.

From this basic fact, we can conclude immediately that the arrows in the starting graph are endowed with causal information, that the author of that information chose to encode it in the form of causal symbols (nodes and arrows), and that the translation to the CI representation requires an understanding of those causal symbols.[2] In other words, the analyst must be versed in the calculus of symbols that are alien to statistics, contrary to the DT agenda.

More generally, this means that every DT researcher, even when restricted to decision-making tasks, must carry in mind a mental representation of causal information and must be endowed with the logic of translating this information to CI statements. How then is this information stored in the researcher’s mind?

Going to the specific model of Figure 2, the insufficiency of CI can be shown by assuming that Z is unobserved, so that the starting directed acyclic graph (DAG) consists of just three variables: X → Y ← U → X, and equation (2) would then read F_X ⊥⊥ U. Now reverse all arrows and let the starting DAG be X ← Y → U ← X. All CIs in the original (unextended) graph would remain the same as before (i.e., no CI exists) but, upon adding F_X to the graph, equation (2) would be violated. In order for F_X to adequately represent a regime indicator for X, equation (2) needs to be revised to read F_X ⊥⊥ Y.
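The reversal argument can be checked mechanically. The sketch below (my reconstruction of the two graphs as read above, not code from either paper) tests the relevant d-separations:

```python
# Checking the arrow-reversal argument with d-separation.
# Assumes networkx >= 2.8 (nx.d_separated).
import networkx as nx

# Original DAG X -> Y <- U -> X, augmented with the regime arrow F_X -> X.
orig = nx.DiGraph([("X", "Y"), ("U", "Y"), ("U", "X"), ("F_X", "X")])
print(nx.d_separated(orig, {"F_X"}, {"U"}, set()))  # True: F_X indep. of U holds

# Reversed DAG X <- Y -> U <- X, same augmentation F_X -> X.
rev = nx.DiGraph([("Y", "X"), ("Y", "U"), ("X", "U"), ("F_X", "X")])
print(nx.d_separated(rev, {"F_X"}, {"U"}, set()))   # False: equation (2) violated
print(nx.d_separated(rev, {"F_X"}, {"Y"}, set()))   # True: the revised F_X indep. of Y
```

Both unaugmented graphs display exactly the same (empty) set of CIs over {X, Y, U}; only causal knowledge can tell the analyst which augmented independencies to write down.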

We again conclude, as before, that the starting graph must be treated as a carrier of causal information, not merely statistical information, and that at least part of this causal information must be judgmental, provided by a domain expert, since data alone is insufficient to lift us from Rung-1 to Rung-2. This, again, makes it impossible for a DT analyst to avoid the language of nodes and arrows.

This brings us to the major criticism I have of the DT framework: its consistency and its ontological basis.

5 Is the DT paradigm worthy of pursuit?

I’ll start by questioning the very purpose of DT, which I understand to be: liberating analysts from dependence on foreign languages contaminated with non-statistical objects, such as causal arrows or counterfactual variables, and shielding them from the dangers that may loom from judgmental assumptions involving such objects.

I perfectly understand Dawid’s apprehension of these objects because I’ve been there with him in the early 1990s, when causality was in its embryonic stage, viewed with suspicion, and when statistical entities were universally judged to be more principled, better understood, and certainly more trustworthy and respectable. Naturally, as an active member of this culture of suspicion, I found it safer to introduce the do-operator and the backdoor criterion using an intervention variable called F_i, with three regimes [13], precisely the way it is defined in Dawid’s paper. I quickly abandoned this notation for reasons that Dawid now finds to be “obscure,” but which I have found to be unassailable: (1) If the causal information needed to define F_X comes from the unaugmented DAG, what is the point of cluttering the DAG with redundant F_X variables when we can go directly to the DAG and extract whatever information is needed using graphical algorithms? (2) If the causal information needed to define F_X comes from the unaugmented DAG, then the directionality of the arrows in that DAG comes not from mentally simulated interventions but from deeper, more reliable modes of judgment.[3] What are those modes? And why not harness them to guide all levels of causal reasoning, from Rung-1 up to Rung-3?

6 The fundamental question: How do people store causal knowledge?

This fundamental question is rarely asked by causal analysts or even by philosophers, because it is deemed to be psychological, rather than methodological. I disagree. The question of how scientists store scientific knowledge or how children store toy-world information is of fundamental importance in any framework, because whenever we rely on that knowledge, we must extract it from its very source, in its natural habitat, with minimum distortion, so as to preserve its veracity; the quality of our decisions depends on this veracity.

To drive the point home, imagine that you are given equations (2)–(5) instead of the graph, and you are asked to judge whether the four equations adequately represent what you know about the problem domain. I have been working with Dawid’s CI notation for 37 years, yet no matter how long I examine these equations, I remain unsure whether I have not left out a symbol or two. This does not happen to me with a DAG; I may doubt whether I have enough knowledge to determine each and every arrow, true, but I never doubt whether the DAG represents the knowledge that I do have (say, that roosters do not cause sunrise or that ice-cream sales do not cause people to drown).

The conclusion I draw is, first, that people do not store causal knowledge in the form of CI assertions and, second, that graphical relationships in the form of “who listens to whom” are a more promising model of how human knowledge is stored.

The anthropomorphic metaphor of “listening” may make some purists skeptical but, recall, we are seeking the most rudimentary primitives, or building blocks, with which knowledge is represented in our mind. Such building blocks must be metaphorical, and many of them anthropomorphic, since these are the chunks of expertise we acquire in childhood.[4]

It is not an accident that our everyday language for causation is replete with graphical metaphors (e.g., “causal pathways,” “mediates between X and Y”), nor is it a coincidence that the first formal representation of causal relations turned out to be “path diagrams” [16].

Still, to pacify the purists, we can replace “listening to” with “sources of variation.” In other words, when deciding whether an arrow X → Y is appropriate in the DAG, the analyst must ask herself: “What are the sources of the variations we may observe in Y?” If X qualifies as one of those (direct) sources, an arrow X → Y is introduced.

Admittedly, super-purists would not buy even the “sources of variation” metaphor as a legitimate basis for capturing human intuition about causation. For them, science has groomed a mathematical object that acts precisely as “listens to” and “sources of variation,” yet is decidedly respectable and fairly common; it is called a “function.” Thus, to decide whether an arrow X → Y is appropriate in the DAG, the analyst must ask whether the relationship between X and Y requires a non-trivial function y = f(x, u) for some experimental unit u. This criterion, though less likely to be understood by rank-and-file researchers, is a favorite in economics [17,18] and serves as the basis of structural equation models [19] and SCMs [11, p. 27].

It is the functional, quasi-deterministic nature of SCMs that allows us to derive all counterfactual relationships from any given SCM [20] and to handle systems with feedback [11, p. 215].[5]
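To see why functional determinism buys us Rung-3, here is a minimal sketch (the linear function and the numbers are invented for illustration) of the three-step counterfactual computation, abduction, action, prediction, applied to a one-equation SCM:

```python
# A toy SCM with one structural equation y = f(x, u).
# The counterfactual "Y had X been 0" is computed in three steps.

def f_Y(x, u):
    # Illustrative structural function; the arrow X -> Y asserts only
    # that some non-trivial f of this kind exists for each unit u.
    return x + u

x_obs, y_obs = 1, 5  # observed evidence for a particular unit

# 1. Abduction: infer the unit's background state u from the evidence
#    (for this additive f, u = y - x).
u = y_obs - x_obs

# 2. Action: replace the mechanism for X by the constant X = 0.
x_cf = 0

# 3. Prediction: re-evaluate Y under the modified model, same u.
print(f_Y(x_cf, u))  # 4 -- "Y would have been 4 had X been 0"
```

The same three steps fail if the model records only conditional probabilities, since the unit-specific u is then unrecoverable.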

7 The manipulability issue

My insistence on ascribing causal status to arrows in the unaugmented DAG, regardless of the manipulability of the variables involved, has invited a flood of criticism from manipulability-minded researchers, claiming that the do-calculus relies on the assumption that all variables are physically manipulable. There is nothing to support this misconception. While I have given causal semantics to the operator do(X = x) with X non-manipulable [21], this does not mean that we have the physical means of directly intervening on X.[6] To suppress such misunderstanding, it is trivial, yet totally unnecessary, to attach special labels to such variables in the graph. At the same time, I consider it necessary to be able to accommodate common statements such as “The moon causes tides” without seeking phantom interventions on the moon’s position.

Causal graphs permit us to do so without trepidation.

8 Conclusion

Dawid considers it advantageous to formulate and analyze interventional tasks within the confines of the DT paradigm, as opposed to causal Bayesian networks or structural equation models. This preference is stated explicitly in ref. [3]:

Since a Pearlian DAG is just an alternative representation of a particular kind of augmented DAG, its appropriateness must once again depend on the acceptability of the strong assumptions, described in Section 10.2, needed to justify augmentation of an observational DAG.

I hope my comments convince Dawid that the opposite is true: the observational (i.e., unaugmented) DAG comes before its augmentation. Its appropriateness comes directly from domain knowledge, independently of any augmentation one may wish to entertain. Moreover, the appropriateness of any augmentation depends not on CI assumptions but on causal assumptions embedded in the unaugmented DAG. Finally, the causal reading of the arrows in the unaugmented DAG is not based on interventional considerations, but on the more fundamental relation of “listens to,” which applies to both manipulable and non-manipulable variables.

  1. Funding information: This research was supported in part by grants from the National Science Foundation (No. IIS-2106908), the Office of Naval Research (No. N00014-17-12091), and the Toyota Research Institute of North America (No. PO000897).

  2. Conflict of interest: Author states no conflict of interest.

References

[1] Dawid AP. Conditional independence in statistical theory. J R Statist Soc B. 1979;41(1):1–31. 10.1111/j.2517-6161.1979.tb01052.x

[2] Pearl J, Mackenzie D. The book of why: the new science of cause and effect. New York: Basic Books; 2018.

[3] Dawid AP. Decision-theoretic foundations for statistical causality. J Causal Infer. 2021;9:39–77. 10.1515/jci-2020-0008

[4] Bareinboim E, Correa JD, Ibeling D, Icard T. On Pearl’s hierarchy and the foundations of causal inference. In: Geffner H, Dechter R, Halpern JY, editors. Probabilistic and causal inference: the works of Judea Pearl. Vol. 36. New York, NY, USA: Association for Computing Machinery; 2022. p. 640–6. 10.1145/3501714.3501743

[5] Tian J, Pearl J. Probabilities of causation: bounds and identification. Ann Math Artif Intell. 2000;28:287–313. 10.1023/A:1018912507879

[6] Dawid AP, Musio M, Murtas R. The probability of causation. Law Probab Risk. 2017;16:163–79. 10.1093/lpr/mgx012

[7] Mueller S, Pearl J. Personalized decision making - a conceptual introduction. Technical Report R-513, Department of Computer Science. Los Angeles, CA: University of California; 2022.

[8] Pearl J. Causes of effects and effects of causes. Sociol Meth Res. 2015;44:149–64. 10.1177/0049124114562614

[9] Mueller S, Li A, Pearl J. Causes of effects: learning individual responses from population data. In: De Raedt L, editor. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22. International Joint Conferences on Artificial Intelligence Organization; July 2022. p. 2712–8. 10.24963/ijcai.2022/376

[10] Pearl J. Interpretation and identification of causal mediation. Psychol Meth. 2014;19:459–81. 10.1037/a0036434

[11] Pearl J. Causality: models, reasoning, and inference. 2nd edition. New York: Cambridge University Press; 2009. 10.1017/CBO9780511803161

[12] Balke A, Pearl J. Bounds on treatment effects from studies with imperfect compliance. J Am Statist Assoc. 1997;92(439):1172–6. 10.1080/01621459.1997.10474074

[13] Pearl J. Comment: graphical models, causality, and intervention. Statist Sci. 1993;8(3):266–9. 10.1214/ss/1177010894

[14] Joffe MM, Yang WP, Feldman HI. Selective ignorability assumptions in causal inference. Int J Biostatist. 2010;6(2):Article 11. 10.2202/1557-4679.1199

[15] Pearl J. Myth, confusion, and science in causal analysis. Technical Report R-348. Los Angeles, CA: University of California; 2009.

[16] Wright S. The relative importance of heredity and environment in determining the piebald pattern of guinea-pigs. Proc Natl Acad Sci USA. 1920;6:320–32. 10.1073/pnas.6.6.320

[17] Haavelmo T. The statistical implications of a system of simultaneous equations. Econometrica. 1943;11:1–12. Reprinted in: Hendry DF, Morgan MS, editors. The foundations of econometric analysis. Cambridge: Cambridge University Press; 1995. p. 477–90. 10.2307/1905714

[18] Pearl J. Trygve Haavelmo and the emergence of causal calculus. Econometric Theory. 2015;31:152–79. 10.1017/S0266466614000231

[19] Pearl J. The causal foundations of structural equation modeling. In: Hoyle RH, editor. Handbook of structural equation modeling. New York: Guilford Press; 2012. p. 68–91. Second edition forthcoming, January 2023. 10.21236/ADA557445

[20] Pearl J. On the first law of causal inference. Causal Analysis in Theory and Practice (blog). 29 Nov 2014. http://causality.cs.ucla.edu/blog/index.php/2014/11/29/on-the-first-law-of-causal-inference/

[21] Pearl J. On the interpretation of do(x). J Causal Infer. 2019;7(1):20192002. 10.1515/jci-2019-2002

Received: 2022-07-22
Accepted: 2022-07-27
Published Online: 2022-09-16

© 2022 Judea Pearl, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
