Abstract
In a recent issue of this journal, Philip Dawid (2021) proposes a framework for causal inference that is based on statistical decision theory and that is, in many respects, compatible with the familiar framework of causal graphs (e.g., directed acyclic graphs (DAGs)). This editorial compares the methodological features of the two frameworks as well as their epistemological bases.
1 Introduction
I have followed the works of Professor Dawid since the early 1980s, when I discovered his seminal paper, “Conditional Independence in Statistical Theory” [1]. In that paper, Dawid boldly protests statistics’ stalemate over causality and declares: “Causal inference is one of the most important, most subtle, and most neglected of all the problems of statistics.” In the past four decades, a period that saw revolutionary progress in causal inference [2], Dawid has contributed substantially to this progress but has consistently demanded that our understanding of causality be grounded in the tradition of statistical decision theory (DT), both conceptually and notationally. This paper [3] is a culmination of Dawid’s efforts and shows vividly what portions of the causal revolution can be re-formulated in the statistical DT paradigm, and what the costs and benefits are of imposing this re-formulation.
2 The statistical DT paradigm
The main thrust of the DT paradigm is to view causal inference as a decision-aiding exercise and to avoid, whenever possible, any concept or assumption that is not absolutely necessary for that exercise, especially those expressed in a vocabulary alien to traditional statistics. The outcome of this stat-exclusive strategy can be seen in the way Dawid articulates the assumptions that are needed for a DT task to commence. Equations (2)–(5), for example, convey conditional independence (CI) relations among observed or observable variables but do not involve any counterfactual variables (e.g., potential outcomes such as Y_x).
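To give the flavor of such constraints, here is a hedged illustration in my own notation, not a reproduction of Dawid’s equations (2)–(5): writing F_X for a regime indicator that records whether X arose naturally (F_X = ∅) or was set by external intervention (F_X = x), a typical DT assumption is a conditional independence of the form

```latex
% Illustrative only: F_X is a regime indicator, not a random variable of
% the problem domain. The CI below asserts that the conditional
% distribution of Y given X is the same in the observational and
% interventional regimes ("ignorable assignment").
Y \mathrel{\perp\!\!\!\perp} F_X \mid X
```

Statements of this kind mention only observable variables and the regime indicator; no counterfactual subscripts appear.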
In summary, the constraints implied by the DT framework amount to translating all causal knowledge (usually encoded in a causal graph) to a set of independencies involving regime indicators, allowing no other notational device, and pursuing the analysis as if it were an exercise of prediction, rather than intervention.
3 What we gain and what we lose
Readers familiar with the ladder of causation [2] and its logical foundations [4] would recognize immediately that, by forbidding counterfactual expressions, the entirety of Rung-3 of the hierarchy (also labeled “counterfactual” or “imagination”) would become inaccessible to researchers adhering to the DT paradigm. This means that questions related to probabilities of causation [5,6], personalized decision making [7], causes of effects [8,9], and large segments of mediation analysis [10] would be excluded from the analysis.
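As one concrete instance of what is excluded, consider the probability of necessity studied in refs [5,6], a prototypical Rung-3 quantity. In counterfactual notation, with Y_x denoting the value Y would attain had X been x:

```latex
% Probability of necessity: the chance that the outcome would have been
% absent (y') had the exposure been absent (x'), among individuals who
% were in fact exposed (x) and experienced the outcome (y).
\mathrm{PN} = P\bigl(Y_{x'} = y' \mid X = x,\; Y = y\bigr)
```

Under exogeneity, Tian and Pearl [5] bound this quantity from below by the excess risk ratio, max{0, (P(y|x) − P(y|x'))/P(y|x)}. Note that the quantity itself cannot even be written down without counterfactual subscripts, which is precisely why it falls outside DT.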
On the other hand, once we restrict attention to interventional (Rung-2) questions, much of the causal revolution survives intact. Dawid’s paper gives us, in fact, a comprehensive account of those tasks that do not rely on Rung-3 information but can be accomplished entirely within Rung-2 (equivalently, within DT), including tasks that were first formulated and solved using the structural causal model (SCM) as a starting point. (The analysis of non-compliance [12] is a good example.)
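To make the non-compliance task concrete, the sketch below (my own illustration, not code from either paper) bounds the average treatment effect (ATE) from imperfect-compliance data by linear programming over the 16 canonical response types, in the spirit of ref. [12]. The variable names and the toy distribution are assumptions for illustration.

```python
# Sketch: ATE bounds under imperfect compliance (binary instrument Z,
# treatment X, outcome Y), via an LP over the 16 canonical response
# types, in the spirit of Balke and Pearl (1997). Illustrative only.
import numpy as np
from scipy.optimize import linprog

# Compliance types: x as a function of z (never/complier/defier/always-taker).
X_OF_Z = [lambda z: 0, lambda z: z, lambda z: 1 - z, lambda z: 1]
# Response types: y as a function of x (never/helped/hurt/always-recover).
Y_OF_X = [lambda x: 0, lambda x: x, lambda x: 1 - x, lambda x: 1]

def ate_bounds(p):
    """p[z][x][y] = P(X=x, Y=y | Z=z). Returns (low, high) bounds on the ATE."""
    idx = lambda i, j: 4 * i + j  # one LP variable q[i,j] per type pair
    A_eq, b_eq = [], []
    for z in (0, 1):
        for x in (0, 1):
            for y in (0, 1):
                row = np.zeros(16)
                for i in range(4):
                    for j in range(4):
                        if X_OF_Z[i](z) == x and Y_OF_X[j](x) == y:
                            row[idx(i, j)] = 1.0
                A_eq.append(row)
                b_eq.append(p[z][x][y])
    # ATE = P("helped" types) - P("hurt" types).
    c = np.zeros(16)
    for i in range(4):
        c[idx(i, 1)] = 1.0   # "helped" contributes +1
        c[idx(i, 2)] = -1.0  # "hurt" contributes -1
    lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 16, method="highs")
    hi = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 16, method="highs")
    return lo.fun, -hi.fun

# Toy data: perfect compliance (X = Z) and perfect effect (Y = X);
# the bounds then collapse to the point ATE = 1.
p = {z: {x: {y: 0.0 for y in (0, 1)} for x in (0, 1)} for z in (0, 1)}
p[0][0][0] = 1.0  # under Z=0: X=0, Y=0 with certainty
p[1][1][1] = 1.0  # under Z=1: X=1, Y=1 with certainty
print(ate_bounds(p))
```

With genuinely imperfect compliance the two optima separate, yielding an interval rather than a point estimate.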
It thus appears that, for the restricted set of applications limited to Rung-2 of the ladder, the DT paradigm offers a coherent, self-contained framework for causal inference and analysis. We will soon see, however, that, even in this restricted set of applications, DT cannot live up to its promise of avoiding concepts that are alien to traditional statistics. This will be shown both from basic principles and in the specific model of Figure 2.
4 Can the DT paradigm deliver on its promises?
From basic principles we know that, in order to guarantee that CI relations such as (2)–(5) hold across both the observational and the interventional regimes, those relations must be derived from the causal structure of the starting graph, not from its statistical properties alone.
From this basic fact, we can conclude immediately that the arrows in the starting graph are endowed with causal information, that the author of that information chose to encode it in the form of causal symbols (nodes and arrows), and that the translation to the CI representation requires an understanding of those causal symbols.[2] In other words, the analyst must be versed in the calculus of symbols that are alien to statistics, contrary to the DT agenda.
More generally, this means that every DT researcher, even when restricted to decision-making tasks, must carry in mind a mental representation of causal information and must be endowed with the logic of translating this information to CI statements. How then is this information stored in the researcher’s mind?
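The translation in question can be made tangible with a toy simulation (mine, not from either paper): in a chain X → Z → Y, the causal reading of the arrows licenses the CI statement “X is independent of Y given Z,” which shows up in data as a vanishing partial correlation. The coefficients below are arbitrary choices.

```python
# Sketch: a linear-Gaussian chain X -> Z -> Y. The causal arrows license
# the CI statement X _||_ Y | Z, visible as a near-zero partial correlation.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)   # Z listens to X
y = 0.7 * z + rng.normal(size=n)   # Y listens to Z (and only Z)

def partial_corr(a, b, c):
    """Correlation of a and b after regressing out c (least squares)."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return np.corrcoef(ra, rb)[0, 1]

marginal = np.corrcoef(x, y)[0, 1]  # clearly nonzero: X and Y are dependent
conditional = partial_corr(x, y, z)  # near zero: X _||_ Y | Z
print(marginal, conditional)
```

The point is that the CI statement is a consequence of the arrows; reading it off required knowing who listens to whom.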
Going to the specific model of Figure 2, the insufficiency of CI can be shown by constructing two models that agree on all CI relations among the observed variables yet disagree on the effects of interventions.
We again conclude, as before, that the starting graph must be treated as a carrier of causal information, not merely statistical information, and that at least part of this causal information must be judgmental, provided by a domain expert, since data alone is insufficient to lift us from Rung-1 to Rung-2. This, again, makes it impossible for a DT analyst to avoid the language of nodes and arrows.
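The point that data alone cannot lift us from Rung-1 to Rung-2 can be dramatized with a two-line pair of models (my illustration, not Figure 2 itself): X → Y with Y := X, versus Y → X with X := Y. Both produce the same joint distribution, yet they disagree on what an intervention on X would do.

```python
# Sketch: two SCMs with identical observational distributions but
# different interventional behavior. All details here are illustrative.
import random

random.seed(0)
N = 50_000

def model_a(do_x=None):
    x = random.randint(0, 1) if do_x is None else do_x  # X := coin flip
    y = x                                               # Y := X (Y listens to X)
    return x, y

def model_b(do_x=None):
    y = random.randint(0, 1)                            # Y := coin flip
    x = y if do_x is None else do_x                     # X := Y (X listens to Y)
    return x, y

# Observationally indistinguishable: both put mass 1/2 on (0,0) and (1,1).
obs_a = [model_a() for _ in range(N)]
obs_b = [model_b() for _ in range(N)]

# Under do(X=1) they diverge: P(Y=1) is 1.0 in model A, about 0.5 in model B.
int_a = sum(y for _, y in (model_a(do_x=1) for _ in range(N))) / N
int_b = sum(y for _, y in (model_b(do_x=1) for _ in range(N))) / N
print(int_a, int_b)
```

No amount of observational data can distinguish the two models; only judgmental causal knowledge (who listens to whom) can.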
This brings us to the major criticism I have of the DT framework: its consistency and its ontological basis.
5 Is the DT paradigm worthy of pursuit?
I’ll start by questioning the very purpose of DT, which I understand to be: Liberating analysts from dependence on foreign languages, contaminated with non-statistical objects such as causal arrows or counterfactual variables, and shielding them from the dangers that may loom from judgmental assumptions involving such objects.
I perfectly understand Dawid’s apprehension of these objects because I’ve been there with him in the early 1990s, when causality was in its embryonic stage, viewed with suspicion, and when statistical entities were universally judged to be more principled, more understood, and certainly more trustworthy and respectable. Naturally, as an active member of this culture of suspicion, I found it safer to introduce the new causal machinery in purely statistical vocabulary.
6 The fundamental question: How do people store causal knowledge?
This fundamental question is rarely asked by causal analysts or even by philosophers, because it is deemed to be psychological, rather than methodological. I disagree. The question of how scientists store scientific knowledge or how children store toy-world information is of fundamental importance in any framework, because whenever we rely on that knowledge, we must extract it from its very source, in its natural habitat, with minimum distortion, so as to preserve its veracity; the quality of our decisions depends on this veracity.
To drive the point home, imagine that you are given equations (2)–(5) instead of the graph, and you are asked to judge whether the four equations adequately represent what you know about the problem domain. I have been working with Dawid’s CI notation for 37 years, yet no matter how long I examine these equations, I remain unsure whether I have left out a symbol or two. This does not happen to me with a DAG; I may doubt whether I have enough knowledge to determine each and every arrow, true, but I never doubt whether the DAG represents the knowledge that I do have (say, that roosters do not cause sunrise or that ice-cream sales do not cause people to drown).
The conclusion I draw is, first, that people do not store causal knowledge in the form of CI assertions and, second, that graphical relationships in the form of “who listens to whom” are a more promising model of how human knowledge is stored.
The anthropomorphic metaphor of “listening” may make some purists skeptical but, recall, we are seeking the most rudimentary primitives, or building blocks, with which knowledge is represented in our mind. Such building blocks must be metaphorical, and many of them anthropomorphic, since these are the chunks of expertise we acquire in childhood.[4]
It is not an accident that our everyday language for causation is replete with graphical metaphors (e.g., “causal pathways,” “mediate between”).
Still, to pacify the purists, we can replace “listening to” with “sources of variation.” In other words, when deciding whether an arrow should point from X to Y, we ask whether the variations of Y are attributable, in part, to variations of X.
Admittedly, super-purists would not buy even the “sources of variation” metaphor as a legitimate basis for capturing human intuition about causation. For them, science has groomed a mathematical object that acts precisely as “listens to” and “sources of variation,” yet is decidedly respectable and fairly common; it is called a “function.” Thus, to decide whether an arrow should point from X to Y, we ask whether Y is deemed a function of X, among other variables.
It is the functional, quasi-deterministic nature of SCMs that allows us to derive all counterfactual relationships from any given SCM model [20] and to handle systems with feedback [11, p. 215].[5]
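That derivation follows the three-step routine of abduction, action, and prediction [11, Ch. 7]. A minimal sketch, with toy structural equations and observed values of my own choosing, looks like this:

```python
# Sketch: computing a counterfactual from a fully specified SCM via
# abduction-action-prediction (Pearl, Causality, Ch. 7). The structural
# equations and observed values below are toy assumptions.

def scm(u_x, u_y, do_x=None):
    """Structural equations: X := U_X;  Y := 2*X + U_Y."""
    x = u_x if do_x is None else do_x
    y = 2 * x + u_y
    return x, y

# Observed evidence for one individual: X = 1, Y = 5.
x_obs, y_obs = 1, 5

# 1) Abduction: infer the exogenous background from the evidence.
u_x = x_obs              # X := U_X       =>  U_X = 1
u_y = y_obs - 2 * x_obs  # Y := 2X + U_Y  =>  U_Y = 3

# 2) Action: replace the equation for X with X := 0.
# 3) Prediction: recompute Y in the modified model, same background.
_, y_cf = scm(u_x, u_y, do_x=0)
print(y_cf)  # the counterfactual Y_{X=0} for this individual: 3
```

It is the functional form of the equations that lets the background U_Y, abduced from the actual world, be carried unaltered into the hypothetical one.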
7 The manipulability issue
My insistence on ascribing causal status to arrows in the unaugmented DAG, regardless of the manipulability of the variables involved, has invited a flood of criticism from manipulability-minded researchers, claiming that the causal reading of an arrow is unwarranted unless the variable at its tail is manipulable. Yet scientists routinely make causal claims about non-manipulable variables, and causal graphs permit us to do so without trepidation.
8 Conclusion
Dawid considers it advantageous to formulate and analyze interventional tasks within the confines of the DT paradigm, as opposed to causal Bayesian networks or structural equation models. This preference is stated explicitly in ref. [3]:
Since a Pearlian DAG is just an alternative representation of a particular kind of augmented DAG, its appropriateness must once again depend on the acceptability of the strong assumptions, described in Section 10.2, needed to justify augmentation of an observational DAG.
I hope my comments convince Dawid that the opposite is true; the observational (i.e., unaugmented) DAG comes before its augmentation. Its appropriateness comes directly from domain knowledge independently of any augmentation one may wish to entertain. Moreover, the appropriateness of any augmentation depends not on CI assumptions but on causal assumptions embedded in the unaugmented DAG. Finally, the causal reading of the arrows in the unaugmented DAG is not based on interventional considerations, but on the more fundamental relation of “listens to,” which applies to both manipulable and non-manipulable variables.
Funding information: This research was supported in part by grants from the National Science Foundation (No. IIS-2106908), the Office of Naval Research (No. N00014-17-12091), and the Toyota Research Institute of North America (No. PO000897).
Conflict of interest: Author states no conflict of interest.
References
[1] Dawid AP. Conditional independence in statistical theory. J R Statist Soc B. 1979;41(1):1–31. 10.1111/j.2517-6161.1979.tb01052.x
[2] Pearl J, Mackenzie D. The book of why: the new science of cause and effect. New York: Basic Books; 2018.
[3] Dawid AP. Decision-theoretic foundations for statistical causality. J Causal Infer. 2021;9:39–77. 10.1515/jci-2020-0008
[4] Bareinboim E, Correa JD, Ibeling D, Icard T. On Pearl’s hierarchy and the foundations of causal inference. In: Geffner H, Dechter R, Halpern JY, editors. Probabilistic and causal inference: the works of Judea Pearl. vol. 36. New York, NY, USA: Association for Computing Machinery; 2022. p. 640–6. 10.1145/3501714.3501743
[5] Tian J, Pearl J. Probabilities of causation: bounds and identification. Ann Math Artif Intell. 2000;28:287–313. 10.1023/A:1018912507879
[6] Dawid AP, Musio M, Murtas R. The probability of causation. Law Probab Risk. 2017;16:163–79. 10.1093/lpr/mgx012
[7] Mueller S, Pearl J. Personalized decision making – a conceptual introduction. Technical Report R-513, Department of Computer Science. Los Angeles, CA: University of California; 2022.
[8] Pearl J. Causes of effects and effects of causes. J Sociol Meth Res. 2015;44:149–64. 10.1177/0049124114562614
[9] Mueller S, Li A, Pearl J. Causes of effects: learning individual responses from population data. In: De Raedt L, editor. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22. International Joint Conferences on Artificial Intelligence Organization; July 2022. p. 2712–8. 10.24963/ijcai.2022/376
[10] Pearl J. Interpretation and identification of causal mediation. Psychol Meth. 2014;19:459–81. 10.1037/a0036434
[11] Pearl J. Causality: models, reasoning, and inference. 2nd edition. New York: Cambridge University Press; 2009. 10.1017/CBO9780511803161
[12] Balke A, Pearl J. Bounds on treatment effects from studies with imperfect compliance. J Am Statist Assoc. September 1997;92(439):1172–6. 10.1080/01621459.1997.10474074
[13] Pearl J. Comment: graphical models, causality, and intervention. Statist Sci. 1993;8(3):266–9. 10.1214/ss/1177010894
[14] Joffe MM, Yang WP, Feldman HI. Selective ignorability assumptions in causal inference. Int J Biostatist. 2010;6(2):Article 11. 10.2202/1557-4679.1199
[15] Pearl J. Myth, confusion, and science in causal analysis. Technical Report R-348. Los Angeles, CA: University of California; 2009.
[16] Wright S. The relative importance of heredity and environment in determining the piebald pattern of guinea-pigs. In: Proceedings of the National Academy of Sciences of the United States of America. Vol. 6. 1920. p. 320–32. 10.1073/pnas.6.6.320
[17] Haavelmo T. The statistical implications of a system of simultaneous equations. Econometrica. 1943;11:1–12. 10.2307/1905714
[18] Pearl J. Trygve Haavelmo and the emergence of causal calculus. Econometric Theory. 2015;31:152–79. Reprinted in D.F. Hendry and M.S. Morgan (Eds.), The Foundations of Econometric Analysis, Cambridge University Press, 477–490, 1995. 10.1017/S0266466614000231
[19] Pearl J. The causal foundations of structural equation modeling. In: Hoyle RH, editor. Handbook of structural equation modeling. New York: Guilford Press; 2012. p. 68–91. Second edition forthcoming, January 2023. 10.21236/ADA557445
[20] Pearl J. On the first law of causal inference. Causal Analysis in Theory and Practice, 29 Nov. 2014, http://causality.cs.ucla.edu/blog/index.php/2014/11/29/on-the-first-law-of-causal-inference/.
[21] Pearl J. On the interpretation of do(x). J Causal Infer. 2019;7(1):20192002. 10.1515/jci-2019-2002
© 2022 Judea Pearl, published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.