
Sufficient Causes: On Oxygen, Matches, and Fires

Judea Pearl
Published/Copyright: September 26, 2019

Abstract

We demonstrate how counterfactuals can be used to compute the probability that one event was/is a sufficient cause of another, and how counterfactuals emerge organically from basic scientific knowledge, rather than from manipulative experiments. We contrast this demonstration with the potential outcome framework and address the distinction between causes and enablers.

1 Introduction

This note illustrates the use of structural models in counterfactual reasoning. In particular, it demonstrates the computation of a quantity denoted PS – the probability of sufficiency, which plays an important role in commonsense reasoning, as well as in legal and medical applications [1], [2], [3], [4].

To motivate the analysis, we will use the classical example of Oxygen, Matches, and Fire which The Book of Why [5, p. 289] describes as follows:

“A fire broke out after someone struck a match, and the question is ‘What caused the fire, striking the match or the presence of oxygen in the room?’ Note that both factors are equally necessary, since the fire would not have occurred absent one of them. So, from a purely logical point of view, the two factors are equally responsible for the fire. Why, then, do we consider lighting the match a more reasonable explanation of the fire than the presence of oxygen?”

The intuitive explanation invokes the notion of prevalence, or anticipation:

“The person who lit the match ought to have anticipated the presence of oxygen, whereas nobody is generally expected to pump all the oxygen out of the house in anticipation of a match-striking ceremony.”

This intuition can also be captured by the notion of sufficiency; striking a match is more likely to be sufficient for the fire than the presence of oxygen. The language of counterfactuals permits us to make this distinction precise as follows: For any two variables, X and Y, define a quantity PS as "the probability that event X=1 would be sufficient to produce outcome Y=1." Using the parenthetical counterfactual notation Y(X=1) to denote "the value that Y would attain had X been 1," PS can be written as [2, Definition 9.2.1]:

(1) $PS = P[Y(X=1)=1 \mid X=0, Y=0]$.

In words, PS asks us to imagine a situation where X=0 and Y=0 and to test how likely it is for Y to turn into Y=1 if X were to change (counterfactually) from X=0 to X=1. Eq. (1) thus quantifies the capacity of X to produce an outcome Y=1 in situations where the outcome is absent. The reason that we must quantify this hypothetical event with probabilities is that both X and Y are random variables, subject to the whims of unknown factors, some creating situations in which X produces Y, and some creating other situations where X does not produce Y. Eq. (1) quantifies production over all situations, weighted by their likelihood.

We will now compute PS for each of Oxygen and Match and compare their magnitudes. We start by specifying a structural causal model (SCM) for the variables

F = Fire, M = Match, OX = Oxygen

assuming that F responds to M and OX through the logical ‘AND’ function

(2) $F = \begin{cases} 1 & \text{if } OX = 1 \text{ and } M = 1 \\ 0 & \text{otherwise} \end{cases}$

Additionally, we assume that, prior to observing the fire, the probabilities for M and OX were:

(3) $P(OX=1) = p_{ox}$, $P(M=1) = p_m$

with $p_{ox} \gg p_m$, since match-lighting is a rare event and the presence of oxygen is common.
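As a concrete illustration, Eqs. (2)-(3) can be encoded in a few lines of Python; the numeric values chosen for $p_{ox}$ and $p_m$ below are illustrative assumptions only, not values given in the paper.

```python
import itertools

# A minimal encoding of Eqs. (2)-(3). The numeric priors are illustrative
# assumptions (oxygen almost always present, match-striking rare).
p_ox, p_m = 0.99, 0.01

def fire(ox, m):
    """Structural equation (2): F = 1 iff OX = 1 and M = 1."""
    return int(ox == 1 and m == 1)

def prior(ox, m):
    """Eq. (3), with independent exogenous factors U1, U2 driving OX and M."""
    return (p_ox if ox else 1 - p_ox) * (p_m if m else 1 - p_m)

# Prior probability of fire: sum over the four exogenous worlds.
p_fire = sum(prior(ox, m) * fire(ox, m)
             for ox, m in itertools.product([0, 1], repeat=2))
print(f"P(F=1) = {p_fire:.4f}")   # = p_ox * p_m = 0.0099
```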

We are now set to derive the probability of sufficiency (Eq. (1)) for both Oxygen and Match using a three-step procedure developed in [2, p. 206]. But before presenting this derivation, it is important that we step back and understand the significance of this exercise. Note that we are about to derive a counterfactual expression, Eq. (1), from a model that is totally void of such expressions. Instead, the model depicts the science behind the fire story in the form of a Boolean function, Eq. (2), and two probabilities, Eq. (3), that can be estimated from the data. This stands in sharp contrast to conventional methods of estimating counterfactual quantities in the potential outcome framework, which invariably start with counterfactual assumptions justified by drawing analogies to treatment assignments, or "well-defined" manipulations, in controlled randomized experiments [6], [7], [8], [9], [10], [11].

There is nothing resembling treatments or experimental manipulations in the function of Eq. (2). One can, of course, envision a variety of experiments on the process described in Eq. (2), but those would be conducted to interrogate the process, not to define it. The process itself is specified independently of any envisioned manipulations. See [5, pp. 144–150] for a discussion of how experiments interrogate Nature, rather than define it.

This difference between causal models and manipulation-based models is essential for understanding the significance of the exercise described in this note. We will assess counterfactual quantities (Eq. (1)) directly from Nature (Eq. (2)) without asking an investigator to translate Nature into a set of counterfactual statements prior to commencing the analysis. This we now demonstrate using the three-step procedure derived in [2, p. 206]: 1. Abduction, 2. Action, and 3. Prediction.

Metaphorically, these steps call for: 1. Updating history in light of the available evidence, 2. Bending the course of history (minimally) to comply with the antecedent, and 3. Predicting the outcome based on the updated past and modified model.
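Before turning to the formal derivation, the following minimal sketch (the function names and numeric priors are illustrative assumptions, not part of the paper) carries out the three steps for the fire example by enumerating the four exogenous worlds (U1, U2) and conditioning on the full evidence {X=0, F=0} of Eq. (1).

```python
import itertools

# A sketch of abduction-action-prediction for the fire SCM, by enumeration
# over the exogenous worlds (U1, U2). Priors below are assumed for illustration.
P_OX, P_M = 0.99, 0.01                     # assumed priors of Eq. (3)

def fire(ox, m):                           # structural equation (2)
    return int(ox == 1 and m == 1)

def prior(u1, u2):                         # independent exogenous factors
    return (P_OX if u1 else 1 - P_OX) * (P_M if u2 else 1 - P_M)

def prob_sufficiency(antecedent):          # antecedent in {"M", "OX"}
    worlds = list(itertools.product([0, 1], repeat=2))   # (u1, u2) -> (OX, M)

    def evidence(u1, u2):                  # Eq. (1): X = 0 and F = 0
        x = u2 if antecedent == "M" else u1
        return x == 0 and fire(u1, u2) == 0

    # 1. Abduction: update the prior over (U1, U2) given the evidence.
    z = sum(prior(*w) for w in worlds if evidence(*w))
    posterior = {w: prior(*w) / z for w in worlds if evidence(*w)}

    # 2. Action: force the antecedent to 1;  3. Prediction: evaluate F.
    total = 0.0
    for (u1, u2), p in posterior.items():
        ox, m = (u1, 1) if antecedent == "M" else (1, u2)
        total += p * fire(ox, m)
    return total

print(f"PS(M)  = {prob_sufficiency('M'):.4f}")    # high, close to P_OX
print(f"PS(OX) = {prob_sufficiency('OX'):.4f}")   # small, at most P_M
```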

2 Formal derivation

Problem: Compute PS(M) and PS(OX), where (from (1)):

$PS(OX) = P[F(OX=1)=1 \mid OX=0, F=0]$
$PS(M) = P[F(M=1)=1 \mid M=0, F=0]$

Assumptions:

$P(OX=1) = p_{ox}$, $P(M=1) = p_m$.

The model is given by the graph G below, where U1 and U2 represent unobserved factors which affect OX and M, respectively. For simplicity, we will assume these factors to be independent, as shown in Fig. 1.

Figure 1: A graph G representing the structural model of Eq. (2), driven by two unobserved factors, U1 and U2.

We shall now derive PS(OX) and PS(M) by applying the three-step algorithm to the model of Fig. 1.

  1. Abduction: We need to update the prior probabilities $p_{ox}$ and $p_m$ in light of the evidence F=0. This amounts to computing the updated probabilities $p'_{ox}$ and $p'_m$ in the model $G'$, in which F is known to be False (F=0), i.e., the situation prior to the fire, as shown in Fig. 2.

    Derivation:

    $p'_{ox} = P(OX=1 \mid F=0) = P(OX=1, F=0)/P(F=0) = P(OX=1, M=0)/[1 - P(OX=1, M=1)] = p_{ox}(1 - p_m)/(1 - p_{ox} p_m)$

    $p'_m = P(M=1 \mid F=0) = P(M=1, F=0)/P(F=0) = P(M=1, OX=0)/[1 - P(OX=1, M=1)] = p_m(1 - p_{ox})/(1 - p_{ox} p_m)$

    For $p_m \ll 1$ and $p_{ox} \approx 1$ we obtain:

    (4) $p'_{ox} \approx p_{ox}$ and $p'_m \approx p_m$
    Figure 2: A graph $G'$ representing the model of Eq. (2) in the absence of fire.

    The reason is clear: the updated priors are simply the old priors re-normalized, after excluding the event F=1, which is very rare. An identical result therefore holds when U1 and U2 are dependent, so long as the event F=1 remains rare (see Appendix A).

  2. Action: To compute PS(M), we take the updated model of Fig. 2 and simulate the action do(M=1). This results in the graph $G_{M=1}$ of Fig. 3:

    Figure 3: A graph $G_{M=1}$ representing the simulated action do(M=1) on the updated model of Fig. 2, yielding $P(F=1) = p'_{ox}$.

    Similarly, to compute PS(OX), we simulate the action do(OX=1), leading to the graph $G_{OX=1}$ of Fig. 4.

    Figure 4: A graph $G_{OX=1}$ representing the simulated action do(OX=1) on the updated model of Fig. 2, yielding $P(F=1) = p'_m$.

  3. Prediction: To complete the derivation of PS(M), we now compute P(F=1) in $G_{M=1}$, yielding:

    $PS(M) = P(F=1)\ \text{in}\ G_{M=1} = p'_{ox} \approx p_{ox}$

    Likewise, to compute PS(OX), we compute P(F=1) in $G_{OX=1}$, giving:

    $PS(OX) = P(F=1)\ \text{in}\ G_{OX=1} = p'_m \approx p_m$

    Thus, we have

    (5) $PS(M) \approx p_{ox}$ and $PS(OX) \approx p_m$

    and $PS(M) \gg PS(OX)$, as expected. (A numerical spot-check of Eqs. (4) and (5) is sketched right after this derivation.)
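As referenced above, the following brief spot-check plugs numbers into the closed forms derived in the Abduction step and reads off the PS values of Eq. (5); the values $p_{ox} = 0.99$ and $p_m = 0.01$ are assumed for illustration and are not given in the paper.

```python
# Illustrative spot-check of Eqs. (4)-(5) with assumed priors.
p_ox, p_m = 0.99, 0.01

p_ox_post = p_ox * (1 - p_m) / (1 - p_ox * p_m)   # p'_ox = P(OX=1 | F=0)
p_m_post = p_m * (1 - p_ox) / (1 - p_ox * p_m)    # p'_m  = P(M=1  | F=0)

ps_match = p_ox_post     # P(F=1) in G_{M=1}, Fig. 3
ps_oxygen = p_m_post     # P(F=1) in G_{OX=1}, Fig. 4

print(f"p'_ox = {p_ox_post:.4f}, p'_m = {p_m_post:.6f}")
print(f"PS(M) = {ps_match:.4f}  >>  PS(OX) = {ps_oxygen:.6f}")
```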

3 Conclusions and related works

The primary purposes of this note have been: (1) To demonstrate that counterfactuals are derivable algorithmically from common scientific knowledge, and are not needed as inputs for causal analysis. (2) To empower researchers with methods of estimating counterfactuals directly from a functional description of their problems. We have demonstrated these two capabilities by computing PS, the probability of sufficiency, in the context of the classical Oxygen-Match-Fire example, which is pivotal for understanding causal explanations. Using this computation, we obtained a formal confirmation of the intuition that lighting the match, not the presence of oxygen, is the more plausible cause of the fire.

A brief historical overview of this problem and previous works towards its solution should help the reader appreciate its context and importance.

The most common conception of causation – that the effect E would not have occurred in the absence of the cause C – goes back to Hume (1748) [12], and captures the notion of "necessary causation." The probabilistic version of necessary causation (PN) is behind many judicial standards. In tort law, for example, damage should be paid if and only if it is more probable than not that the damage would not have occurred but for the defendant's action.

But causation has two faces, necessary and sufficient. The distinction between the two was first articulated by John Stuart Mill (1843) [13], and received semi-formal explications in the 1960s, first using conditional probabilities [14] and then using logical implications [15]. Both explications suffer from basic semantical difficulties, since probabilities and classical logic are too crude to capture the logic of counterfactual conditionals ([16]; [2, pp. 249–256, 313–316]). The popular "Sufficient Component" model of Kenneth Rothman [17] is essentially equivalent to Mackie's "INUS condition" and inherits the semantical difficulties noted in [16]. Nevertheless, the graphical schematics of Rothman's "causal pies" were found very effective in teaching epidemiologists how to represent interacting causes as Boolean functions in disjunctive form. Additionally, counterfactual interpretations of Rothman's model (VanderWeele and Hernán [18]) have resolved some of its semantical difficulties. In particular, these interpretations restrict variables from entering the sufficient cause model unless they are parents of the outcome variable in the causal diagram, as depicted in Fig. 1.

Robins and Greenland [3] gave a counterfactual definition for the probability of necessary causation, taking counterfactuals as primitives and assuming that one is in possession of a joint probability function over counterfactual events. Pearl [19] gave definitions for the probabilities of necessary or sufficient causation (or both) based on structural model semantics which, as we have seen in this note, leads to effective procedures for computing counterfactuals from a given causal theory [20], [21]. Additionally, this semantics can be characterized by a complete set of axioms [22], [23], which can be used as inference rules in the analysis.

Pearl [19] and Tian and Pearl [24] have derived tight bounds on PS and PN when both observational and experimental data are available. A tool kit for solving counterfactual parameters is given in [25, pp. 116–126].

Our derivation of PS also bears on a recent debate concerning the role of non-manipulable variables in causal inference, specifically, whether variables such as sex or race can be considered “causes” [26], [27]. In our example, oxygen is practically non-manipulable, and yet, the structural model of Fig. 1 treats oxygen and match on equal footing, with oxygen serving as an enabler of fire (see Appendix B). The model further allows for the estimation of the counterfactuals PS(OX) and PS(M) by the same three-step procedure, regardless of how manipulable they are. Such counterfactuals are considered “not well-defined” in the orthodox school of potential outcome, an untenable stance that would prohibit our question “what caused the fire” from being asked, let alone being answered.

Appendix A The importance of the abductive step, from interventions to counterfactuals

The infinitesimal probability of no oxygen, $(1 - p_{ox}) \ll 1$, led to the approximate equalities

$p'_{ox} \approx p_{ox}$ and $p'_m \approx p_m$

which may give readers the impression that the abduction step is superfluous, and that we could have gotten Eq. (5) directly, by computing the causal effects P[Y(OX=1)=1] and P[Y(M=1)=1] instead of Eq. (1). Indeed, intervening to secure oxygen in the house will have a very low probability, $p_m$, of resulting in fire, and intervening to light a match will result in fire with high probability, $p_{ox}$. To appreciate the importance of the abduction step, let us compute PS for a hypothetical scenario in which M and OX are determined by two independent fair coins, resulting in $p_{ox} = p_m = 1/2$.

The causal effects in this case would compute to

(6) $P[Y(OX=1)=1] = P[Y(M=1)=1] = 1/2$

because once we assure the presence of oxygen, fire will occur 50% of the time, namely whenever a match is struck. Conversely, once a match is struck, fire will occur 50% of the time, namely whenever oxygen is present.

However, the probability of actually producing fire in situations where fire is initially absent is in fact lower than 1/2. Going through the abduction exercise, we get

$p'_m = P(M=1 \mid F=0) = P(U_2=1 \mid U_2=0 \text{ or } U_1=0) = 1/3$
$p'_{ox} = P(OX=1 \mid F=0) = P(U_1=1 \mid U_2=0 \text{ or } U_1=0) = 1/3$

and, accordingly, the probabilities of sufficiency become:

$PS(M) = PS(OX) = 1/3$,

lower than the causal effects in (6).
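The fair-coin numbers can be reproduced by direct enumeration; the sketch below (with illustrative helper names, not from the paper) computes the causal effect of do(M=1) and PS(M) under the independent fair-coin prior.

```python
import itertools

# Enumeration check of the fair-coin scenario: causal effects equal 1/2,
# while PS equals 1/3.
def fire(ox, m):
    return int(ox == 1 and m == 1)

worlds = list(itertools.product([0, 1], repeat=2))     # (u1, u2) = (OX, M)
prior = {w: 0.25 for w in worlds}                      # two independent fair coins

# Causal effect of do(M=1): P(F=1) = P(OX=1) = 1/2 (and symmetrically for OX).
ce_match = sum(p * fire(u1, 1) for (u1, u2), p in prior.items())

# PS(M): abduction on the evidence F=0, then do(M=1), then predict F.
z = sum(p for (u1, u2), p in prior.items() if fire(u1, u2) == 0)
posterior = {w: p / z for w, p in prior.items() if fire(*w) == 0}
ps_match = sum(p * fire(u1, 1) for (u1, u2), p in posterior.items())

print(f"causal effect P[F(M=1)=1] = {ce_match:.3f}")   # 0.500
print(f"PS(M)                     = {ps_match:.3f}")   # 0.333
```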

A much wider difference between $p_m$ and $p'_m$ will obtain if we let U1 affect U2 in a significant way. For example, let U1 be a fair coin and let U2 track U1. The marginal probabilities of OX and M will remain the same, $p_m = p_{ox} = 1/2$, and the causal effects, likewise, will be the same as in Eq. (6). However, the posterior probabilities will be vastly different, yielding $p'_m = p'_{ox} = 0$, because both M=1 and OX=1 must be false in any situation where F=0. Accordingly, the probabilities of sufficiency must both vanish,

PS(M)=PS(OX)=0

as we can see from Figs. 3 and 4, using $p'_m = p'_{ox} = 0$. Indeed, prior to the fire, either U1 or U2 must be absent, but since they track each other, both must be absent, so lighting a match will not trigger a fire.
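The perfectly dependent case can be checked the same way; the degenerate joint prior below, concentrated on U1 = U2, is an assumed stand-in for "U2 tracks U1."

```python
# Same enumeration with U2 tracking U1 perfectly (assumed degenerate joint
# prior): the causal effect stays 1/2, but PS drops to 0.
def fire(ox, m):
    return int(ox == 1 and m == 1)

prior = {(0, 0): 0.5, (1, 1): 0.5}        # U2 = U1, each value with prob 1/2

ce_match = sum(p * fire(u1, 1) for (u1, u2), p in prior.items())          # 0.5
z = sum(p for (u1, u2), p in prior.items() if fire(u1, u2) == 0)          # 0.5
ps_match = sum(p / z * fire(u1, 1) for (u1, u2), p in prior.items()
               if fire(u1, u2) == 0)                                      # 0.0
print(ce_match, ps_match)    # 0.5 0.0
```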

What we see in this example is a profound difference between the information we obtain from interventional studies and that obtained from counterfactual analysis. Interventional studies tell us that striking a match raises the probability of fire from zero to 50%, while counterfactual analysis tells us that, knowing that currently the fire is off, had we struck a match it could not possibly have triggered a fire. Symmetrically, the same holds for the hypothetical action: "Had we secured the presence of oxygen." This retrospective information cannot be obtained from interventional studies, however elaborate.

The extra information that enables us to compute PS requires the specification of the functional relationships between the variables involved (as in Eq. (2)), as well as the distribution of the unobserved error terms U1 and U2. These two specifications elevate the SCM to the top level (rung 3) of the Ladder of Causation [5], which supports counterfactuals. Lacking either of these two, as in the potential outcomes framework or in Causal Bayesian Networks [2, Sec. 1.3.1], we may still be able to evaluate causal effects, but not counterfactuals. Using Causal Bayesian Networks, for example, one can estimate the effects of all possible actions, including compound actions and actions conditioned on observed covariates; yet none of these can capture the retrospective aspect of counterfactuals and infer "What if we had done things differently?"

This theoretical separation between interventions and counterfactuals has not been accepted by all analysts. It is absent, for example, from the taxonomy used in [11], from the potential outcome framework [9], as well as from most work on Reinforcement Learning (RL) [28].

The temptation in RL is to argue: If we can conclude (from interventional studies) that action A1 tends to bring about a reward R and action A2 tends to inhibit that reward, why can’t we assert counterfactually, after acting A2 and failing to achieve R, that “had we acted A1 we would have gotten R?” This line of reasoning may work in the deterministic case, that is, when the reward R is a deterministic function of the actions A1 and A2, but not when it is averaged over a population or over unmeasured factors.

For an extreme yet simple example that exposes the fallacy of drawing counterfactuals from interventional studies, consider a guessing game in which a player wins a dollar upon guessing the outcome of a fair coin and loses a dollar otherwise. The action "guess heads" clearly has no effect on the expected outcome, nor does the action "guess tails"; both result in a 50% chance of winning. Yet upon winning a dollar, a player can safely assert: "Had I acted differently I would have lost" ([2, p. 295]; [29]). The extent to which experimental and observational studies can inform counterfactual probabilities is delineated in [24] and [2, p. 294].
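A small simulation sketch (illustrative, not from the paper) makes the same point: the two guessing strategies are interventionally indistinguishable, yet once the coin outcome is fixed, the counterfactual outcome of switching one's guess is fully determined.

```python
import random

# Both guessing strategies win half the time (no interventional difference),
# yet in every round, switching the guess while holding the coin fixed flips
# win <-> loss: the counterfactual is deterministic given the exogenous coin.
random.seed(0)
coins = [random.choice(["H", "T"]) for _ in range(100_000)]

p_win_heads = sum(c == "H" for c in coins) / len(coins)   # always guess heads
p_win_tails = sum(c == "T" for c in coins) / len(coins)   # always guess tails
print(f"P(win | guess H) = {p_win_heads:.3f}, P(win | guess T) = {p_win_tails:.3f}")

# Counterfactual check: for a fixed coin, the two guesses never tie.
assert all((c == "H") != (c == "T") for c in coins)
print("Had the winner guessed differently, they would have lost: true in every round")
```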

Appendix B Causes vs. enablers

Epidemiologists reading this article will note that the analysis of PS may confer causal power onto variables that are merely "effect modifiers" but not genuine "causes." Indeed, in ordinary epidemiological conversations oxygen would be classified as an effect modifier, not as a cause of fire. So would variables such as humidity, atmospheric pressure, and wind velocity: they are perceived to be assisting or hindering the fire, not causing it. From a chemical viewpoint, however, the opposite is true; fire is a process of oxidation, hence oxygen is an active agent in the process, while match striking merely creates a local rise in temperature, which is an enabling condition, not an active cause of fire. If we further look at the logical function defining the process, Eq. (2), we find total symmetry. Moreover, examining Rothman's "pie diagrams," which many epidemiologists consider a faithful depiction of their conceptual framework, we find each of Match and Oxygen labeled a "sufficient cause component" in a 2-component pie

{Oxygen,Match}.

What then governs the distinction between "cause" and "effect modifier" or "enabler" in epidemiology? Is it the manipulability of the former, or the higher PS measure that the former earns from prevalence considerations? I believe both considerations contribute to the distinction and, certainly, we should not refrain from calling a nonmanipulable effect modifier "a cause," if its PS value justifies the name.

Effect modifiers, contrary to the opinions of some epidemiologists [26], do have well-defined causal effects, defined by the do-operator and the model in which they are embedded. The same goes for notions such as confounding and mediation. Whatever property the model bestows upon a manipulated variable it also bestows upon an effect modifier, since the two are not marked differently in the model. The interpretation of such causal effects may not translate into policies that directly manipulate these modifiers, yet they enter the evaluation of policies that control the presence of these modifiers so as to regulate their consequences [27].

Lastly, it is interesting to note that the capacity of an event X=1 to produce an outcome Y=1 can be uncovered directly from the structural equation model. We can proclaim X=1 a “producer” of Y=1 iff there exists a context C such that

$Y(X=0, C) = 0$ and $Y(X=1, C) = 1$.

For example, each of M=1 and OX=1 is a producer of F=1 in the model of Eq. (2), because OX=1 serves as a fire-enabling context for M=1, and M=1 serves as a fire-enabling context for OX=1. Events M=0 and OX=0 cannot be producers of F=1 since no enabling contexts exist.

One may be tempted to surmise that the property of production coincides with the presence of an event as a component in Rothman’s “sufficient component model.” But this is not the case. Consider the 3-pie model:

{A=0, B=1, C=1}, {A=1, B=1}, {A=1, B=0, C=0}

Event A=0 appears in the first pie and, yet, it is not a producer of Y=1 because no context exists which would make Y switch from 0 to 1 as A switches from 1 to 0. The same is true for B=0, which appears in the third pie. All other events, however, are producers of Y=1. For example, C=0 is a producer of Y=1 because the context {A=1, B=0} will see Y switch from Y=0 to Y=1 as C changes from C=1 to C=0.
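The producer test can also be mechanized by brute force. The sketch below (variable names and layout are assumptions made for illustration) enumerates all contexts of the 3-pie model and reproduces the classification above.

```python
import itertools

# Mechanized "producer" test for the 3-pie model: X=x produces Y=1 iff some
# context C gives Y(X=1-x, C)=0 and Y(X=x, C)=1.
VARS = ["A", "B", "C"]

def y(A, B, C):
    """Disjunctive form of the 3-pie model."""
    return int((A == 0 and B == 1 and C == 1) or
               (A == 1 and B == 1) or
               (A == 1 and B == 0 and C == 0))

def is_producer(var, val):
    others = [v for v in VARS if v != var]
    for ctx in itertools.product([0, 1], repeat=len(others)):
        world = dict(zip(others, ctx))
        with_val = y(**{**world, var: val})        # Y with X set to val
        without = y(**{**world, var: 1 - val})     # Y with X set to the other value
        if without == 0 and with_val == 1:
            return True
    return False

for var, val in itertools.product(VARS, [0, 1]):
    print(f"{var}={val} producer of Y=1: {is_producer(var, val)}")
# A=1, B=1, C=0, and C=1 come out as producers; A=0 and B=0 do not.
```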

References

1. Tian J, Pearl J. A general identification condition for causal effects. In: Proceedings of the Eighteenth National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press/The MIT Press; 2002. p. 567–73.

2. Pearl J. Causality: Models, Reasoning, and Inference. 2nd ed. New York: Cambridge University Press; 2009. doi:10.1017/CBO9780511803161.

3. Robins JM, Greenland S. The probability of causation under a stochastic model for individual risk. Biometrics. 1989;45:1125–38. doi:10.2307/2531765.

4. Pearl J. Causes of effects and effects of causes. Sociol Methods Res. 2015;44:149–64. doi:10.1177/0049124114562614.

5. Pearl J, Mackenzie D. The Book of Why: The New Science of Cause and Effect. New York: Basic Books; 2018.

6. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66:688–701. doi:10.1037/h0037350.

7. Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period – applications to control of the healthy worker survivor effect. Math Model. 1986;7:1393–512. doi:10.1016/0270-0255(86)90088-6.

8. Angrist JD, Pischke J-S. Mastering 'Metrics: The Path from Cause to Effect. Princeton: Princeton University Press; 2014.

9. Imbens GW, Rubin DB. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge, MA: Cambridge University Press; 2015. doi:10.1017/CBO9781139025751.

10. Morgan S, Winship C. Counterfactuals and Causal Inference: Methods and Principles for Social Research (Analytical Methods for Social Research). 2nd ed. New York, NY: Cambridge University Press; 2015. doi:10.1017/CBO9781107587991.

11. Hernán MA, Robins JM. Causal Inference. Boca Raton: Chapman & Hall/CRC; 2019. Forthcoming.

12. Hume D. An Enquiry Concerning Human Understanding. 1748. Reprinted: LaSalle, IL: Open Court Press; 1958. doi:10.1093/oseo/instance.00032980.

13. Mill JS. System of Logic. Vol. 1. London: John W. Parker; 1843.

14. Good IJ. A causal calculus (I). Br J Philos Sci. 1961;11:305–18. doi:10.1093/bjps/XI.44.305.

15. Mackie JL. Causes and conditions. Am Philos Q. 1965;2(4):261–4. Reprinted in: Sosa E, Tooley M, editors. Causation. London: Oxford University Press; 1993.

16. Kim J. Causes and events: Mackie on causation. J Philos. 1971;68:426–71. doi:10.2307/2025175. Reprinted in: Sosa E, Tooley M, editors. Causation. London: Oxford University Press; 1993.

17. Rothman KJ. Causes. Am J Epidemiol. 1976;104:587–92. doi:10.1093/oxfordjournals.aje.a112335.

18. VanderWeele TJ, Hernán MA. From counterfactuals to sufficient component causes and vice versa. Eur J Epidemiol. 2006;21:855–8. doi:10.1007/s10654-006-9075-0.

19. Pearl J. Probabilities of causation: Three counterfactual interpretations and their identification. Synthese. 1999;121:93–149. doi:10.1023/A:1005233831499.

20. Balke A, Pearl J. Probabilistic evaluation of counterfactual queries. In: Proceedings of the Twelfth National Conference on Artificial Intelligence, vol. I. Menlo Park, CA: MIT Press; 1994. p. 230–7. doi:10.1145/3501714.3501733.

21. Balke A, Pearl J. Counterfactuals and policy analysis in structural models. In: Besnard P, Hanks S, editors. Uncertainty in Artificial Intelligence 11. San Francisco: Morgan Kaufmann; 1995. p. 11–8.

22. Galles D, Pearl J. An axiomatic characterization of causal counterfactuals. Found Sci. 1998;3:151–82. doi:10.1023/A:1009602825894.

23. Halpern JY. Axiomatizing causal reasoning. In: Cooper G, Moral S, editors. Uncertainty in Artificial Intelligence. San Francisco, CA: Morgan Kaufmann; 1998. p. 202–10. Also: J Artif Intell Res. 2000;12:317–37.

24. Tian J, Pearl J. Probabilities of causation: Bounds and identification. Ann Math Artif Intell. 2000;28:287–313. doi:10.1023/A:1018912507879.

25. Pearl J, Glymour M, Jewell NP. Causal Inference in Statistics: A Primer. New York: Wiley; 2016.

26. Hernán MA, Taubman SL. Does obesity shorten life? The importance of well-defined interventions to answer causal questions. Int J Obes. 2008;32:S8. doi:10.1038/ijo.2008.82.

27. Pearl J. Does obesity shorten life? Or is it the soda? On non-manipulable causes. J Causal Infer. 2018;6. doi:10.1515/jci-2018-2001. Published online 2018-08-24.

28. Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: A survey. J Artif Intell Res. 1996;4:237–85. doi:10.1613/jair.301. arXiv:cs/9605103.

29. Pearl J. The curse of free-will and paradox of inevitable regret. J Causal Infer. 2013;1:255–7. doi:10.21236/ADA557449.

Published Online: 2019-09-26
Published in Print: 2019-09-25

© 2019 Walter de Gruyter GmbH, Berlin/Boston
