Some Considerations on the Back Door Theorem and Conditional Randomization

Julieta Molina; Lucio Pantazis; Mariela Sued

doi:10.1515/em-2013-0018

Published/Copyright: November 7, 2014

From the journal Epidemiologic Methods Volume 3 Issue 1

Abstract

In this work, we propose a different “surgically modified model” for the construction of counterfactual variables under non-parametric structural equation models. This approach allows the simultaneous representation of counterfactual responses and observed treatment assignment, at least when the intervention is done in one node. Using the new proposal, the d-separation criterion is used to verify conditions related to ignorability or conditional ignorability, and a new proof of the back door theorem is provided under this framework.

Keywords: causal inference; graphical methods; back door theorem

1 Introduction

The main objective of this short report is to discuss the relationship between the back door theorem (Pearl 2000, 79) and the conditional randomization (or exchangeability) assumption. We will relate these two concepts through the d-separation rule, constructing both counterfactual variables and observed treatments in the same graph.

In recent years, several authors have addressed this problem. The twin directed acyclic graphs (DAGs), presented by Balke and Pearl (1994), allow the simultaneous construction of observed and counterfactual variables.

Recently, Richardson and Robins (2013) presented a graphical theory based on single world intervention graphs (SWIGs), unifying causal directed graphs and potential outcomes. The present study, less ambitious, can be considered as a complementary work focusing in particular on the back door theorem, one of the most popular criteria for identifying the distribution of counterfactual variables. We can present our results in a simple way, accessible to those who may be nonexperts in the mathematical technicalities, but still familiar with the field and with non-parametric structural equation models (NPSEM) (Pearl 2000, ch. 7). In this setting, Pearl has proposed a modified NPSEM where potential outcomes are defined by replacing the equations related to the treatment nodes by the constants corresponding to the desired intervention. Therefore, the observed treatment assignment and the counterfactual variables do not occur together, either in the model for the observed data or in the modified model.

We propose here a new modified NPSEM model containing both treatments and counterfactual variables. Unlike the case of the twin graph, variables in this new model factorize according to the back door theorem graph, namely the DAG, where arrows emerging from nodes associated with the intervention are removed. We use this fact to prove that, for univariate treatment, the graphical assumptions of Pearl’s back door theorem (Pearl 2000, 79) imply conditional exchangeability. In this way, we establish a new proof of Pearl’s back door theorem and thus of identifiability of the mean of the counterfactual variables, a proof which can be understood by both proponents of the counterfactual and graphical approaches to causality.

It should be said that even if Richardson and Robins’ graphs differ from the graphs in this paper, our perspectives are rather similar. One difference is that, although they assume an underlying NPSEM, they do not assume independence of the disturbances (as we do). Instead, they prefer to work with the Finest Fully Randomized Causally Interpretable Structured Tree Graphs model (FFRCISTG), introduced in Robins (1986), but in the sense of Definition 2 (p. 23) presented in Richardson and Robins (2013).

In this work, we start by assuming an NPSEM with independent errors (NPSEM-IE). Although NPSEM-IE are less general than FFRCISTG models, we decided to present our proposal in this setting because these models are commonly used. However, in Section 4 we show that all the results presented in this work remain valid if FFRCISTG models are assumed instead.

This work is organized as follows. In Section 2 we present a simple example with a three-node DAG, explaining the main idea to construct jointly the observed treatment assignment and counterfactual variables. We check in this example that the assumptions of the back door theorem imply conditional randomization. In Section 3 we generalize these results, first for the case of an intervention on one node, and then for many nodes. In Section 4 we discuss our results in the framework of FFRCISTG models.

To conclude this introduction, we would like to establish a subtle distinction that is frequently omitted. Given a DAG $G$, we use $\mathbf{V}=\{\mathbf{V}_1,\mathbf{V}_2,\dots,\mathbf{V}_n\}$ (in bold) to denote the nodes of the graph, whereas the random variables associated with a given node $\mathbf{V}_i$ are denoted by $V_i$ or by some perturbation of $V_i$, like $V_{i,t}$ or $V_i^t$, as will be explained later on.

2 Toy example – main idea

In the potential outcome framework (Rubin 1974), the identifiability of the average treatment effect is guaranteed under the assumption of conditional randomization (or ignorability). It states that there exists a vector $L$ of observed variables such that $Y_a$ and $A$ are independent given $L$, for $a=t,c$, where $A$ is a binary treatment variable taking values in $\{t,c\}$ while $Y_a$ denotes the potential outcome under treatment level $a$. More precisely, $Y_a$ is the outcome variable that would have been observed in a hypothetical world in which all individuals received treatment level $a$. Denoting by $Y$ the observed outcome and assuming that it satisfies $Y=Y_t I_{\{A=t\}}+Y_c I_{\{A=c\}}$, we get that the average treatment effect ($\mathrm{ATE}=E[Y_t]-E[Y_c]$) is identified from the distribution of the observed data $(L,A,Y)$ by the formula $\mathrm{ATE}=E\{E[Y\mid A=t,L]\}-E\{E[Y\mid A=c,L]\}$.
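The identification formula above can be illustrated with a small simulation. The following is only a sketch under assumed ingredients: the data-generating probabilities, the function names `simulate` and `standardized_mean`, and the sample size are all hypothetical and are not taken from the paper. Treatment is drawn using $L$ only, so conditional randomization holds by construction, and the true ATE equals 0.3.

```python
import random

def simulate(n=200_000, seed=0):
    """Draw (L, A, Y) from a hypothetical model where A depends on L only,
    so that Y_t and Y_c are independent of A given L by construction."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        l = int(rng.random() < 0.4)
        a = "t" if rng.random() < (0.8 if l else 0.2) else "c"
        noise = rng.random()
        y_t = int(noise < 0.5 + 0.3 * l)   # potential outcome under t
        y_c = int(noise < 0.2 + 0.3 * l)   # potential outcome under c
        y = y_t if a == "t" else y_c       # Y = Y_t I{A=t} + Y_c I{A=c}
        data.append((l, a, y))
    return data

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def standardized_mean(data, level):
    """Estimate E{E[Y | A=level, L]} by averaging over the distribution of L."""
    total = 0.0
    for l in (0, 1):
        p_l = mean(row[0] == l for row in data)
        e_y = mean(y for (li, a, y) in data if li == l and a == level)
        total += e_y * p_l
    return total

data = simulate()
ate_standardized = standardized_mean(data, "t") - standardized_mean(data, "c")
# Unadjusted contrast E[Y | A=t] - E[Y | A=c], shown for comparison only.
ate_naive = (mean(y for (_, a, y) in data if a == "t")
             - mean(y for (_, a, y) in data if a == "c"))
TRUE_ATE = 0.3  # E[Y_t] - E[Y_c] = (0.5 + 0.3*0.4) - (0.2 + 0.3*0.4)
```

In this hypothetical setting the standardized contrast should be close to `TRUE_ATE`, while the unadjusted contrast is biased because $L$ affects both treatment and outcome.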

The d-separation criterion (Pearl 2000, 18) is a graphical tool designed to check independence and conditional independence between coordinates (or sub-vectors) of a random vector whose distribution satisfies the Markov factorization with respect to a given DAG. Then, one is tempted to use such a tool to decide whether conditional ignorability can be assumed for the problem under consideration, studying the DAG associated with it.
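As a rough illustration of how such a check can be automated, the sketch below tests d-separation with the moralization criterion (restrict to the ancestral set, moralize, delete the conditioning nodes, test connectivity). This is an equivalent formulation chosen here for brevity, not the path-based definition given by Pearl. The function names and the two small example graphs are assumptions made for this example, and the three node sets are assumed to be disjoint.

```python
from itertools import combinations

def ancestors_of(parents, nodes):
    """Return the given nodes together with all their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(parents, xs, ys, zs):
    """Check whether zs d-separates xs from ys in the DAG described by
    `parents`, a dict mapping each node to the list of its parents."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors_of(parents, xs | ys | zs)
    # Moralize the ancestral subgraph: connect every node with its parents
    # and connect every pair of parents of a common child.
    adj = {v: set() for v in keep}
    for v in keep:
        pa = list(parents.get(v, ()))
        for p in pa:
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(pa, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Remove the conditioning nodes and look for an undirected path.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        if v not in seen:
            seen.add(v)
            stack.extend(w for w in adj[v] if w not in zs and w not in seen)
    return True

# Hypothetical examples: a fork A <- L -> Y and a collider A -> C <- Y.
fork = {"L": [], "A": ["L"], "Y": ["L"]}
collider = {"A": [], "Y": [], "C": ["A", "Y"]}
```

For the fork, `d_separated(fork, {"A"}, {"Y"}, {"L"})` holds while conditioning on the empty set does not separate the two nodes; for the collider the situation is reversed.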

For those who are familiar with DAGs, the back door theorem is a famous result used to identify the distribution of the counterfactual variables, and its assumptions give rise to the same formula presented under conditional exchangeability for identifying the average treatment effect. So, we asked ourselves whether the graphical conditions required by the back door theorem allow one to prove conditional exchangeability using the d-separation criterion. To answer this question, we need to construct both the counterfactual and the treatment variables in the same DAG, in particular in the DAG involved in the back door theorem. To do so, we propose a simple modification to the approach presented by Pearl to define counterfactual variables. In the coming example, we outline the basic idea of our construction, which is generalized in the following section.

Assume that the causal diagram associated with the problem of interest is given by the DAG G (Figure 1).

Figure 1: The original DAG $G$

In terms of NPSEM (Pearl 2000, ch. 7 or 2009) this means that there exists a set of functions $F=\{f_L,f_A,f_Y\}$ and jointly independent disturbances $U=\{U_L,U_A,U_Y\}$, which give rise to the factual variables according to the following recursive system:

[1] $L=f_L(U_L),\quad A=f_A(L,U_A),\quad Y=f_Y(L,A,U_Y).$

We use $M=(F,U)$ to denote the model which defines the factual variables. To emulate the intervention $do(a)$, Pearl (2000, ch. 7 or 2009) considers a model $M_a$ where the function $f_A$ is replaced by the constant $a$, while the disturbances remain unchanged: $M_a=(F_a,U)$, with $F_a=\{f_{a,L},f_{a,A},f_{a,Y}\}$, where

[2] $f_{a,L}=f_L,\quad f_{a,A}=a,\quad f_{a,Y}=f_Y.$

The variables obtained by iterating the functions in model $M_a$ using the same vector of disturbances $U=\{U_L,U_A,U_Y\}$ are denoted with the subindex $a$: $L_a$, $A_a$ and $Y_a$. In this way, the counterfactual response of interest at level $a$ is given by $Y_a$.

Our proposal to represent counterfactual variables consists in the use of a new system of functions, in which the value $a$ is inserted in lieu of the variable corresponding to the node $\mathbf{A}$ every time the latter is required by the recursion. To do so, we change the functions related to each node having $\mathbf{A}$ as a parent. In the present example, $M^a=(F^a,U)$, with $F^a=\{f_L^a,f_A^a,f_Y^a\}$, where

[3] $f_L^a=f_L,\quad f_A^a=f_A,\quad f_Y^a(\ell,u)=f_Y(\ell,a,u).$

Note that this new set of functions is compatible with the DAG $G_{\underline{\mathbf{A}}}$, where the arrows emerging from $\mathbf{A}$ are removed (Figure 2).

Figure 2: $G_{\underline{\mathbf{A}}}$, constructed by removing from $G$ the arrows emerging from $\mathbf{A}$

The variables constructed by iterating the functions in $F^a$ and using the same vector of disturbances $\{U_L,U_A,U_Y\}$ are denoted by the superindex $a$: $L^a$, $A^a$ and $Y^a$. Then, we get that the distribution of $(L^a,A^a,Y^a)$ is compatible with $G_{\underline{\mathbf{A}}}$.

The following Lemma and Corollary summarize the main results of this section.

Lemma 1

1. $L_a=L^a=L$, $A^a=A$ and $Y_a=Y^a$.

2. $\mathbf{A}$ and $\mathbf{Y}$ are d-separated by $\mathbf{L}$ in $G_{\underline{\mathbf{A}}}$ and so, since the distribution of $(L^a,A^a,Y^a)$ is compatible with $G_{\underline{\mathbf{A}}}$, we get that $A^a$ is independent of $Y^a$ given $L^a$.

Corollary 2 For the causal DAG given in Figure 1, we get that $A$ is independent of $Y_a$ given $L$. Thus, conditional randomization holds.
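The first part of Lemma 1 can be checked numerically on a concrete instance of the toy example. The sketch below is illustrative only: the particular functions `f_L`, `f_A`, `f_Y`, the uniform disturbances and the helper names are assumptions, not part of the paper. It builds Pearl's modified model and the new modified model from the same disturbances and compares the resulting variables draw by draw.

```python
import random

# Hypothetical structural functions for the DAG L -> A, L -> Y, A -> Y.
def f_L(u_l):
    return int(u_l < 0.4)

def f_A(l, u_a):
    return int(u_a < (0.8 if l else 0.2))

def f_Y(l, a, u_y):
    return int(u_y < 0.2 + 0.3 * l + 0.3 * a)

def factual(u):
    """Model M: the recursive system for the observed variables."""
    u_l, u_a, u_y = u
    l = f_L(u_l)
    a = f_A(l, u_a)
    return l, a, f_Y(l, a, u_y)

def pearl_model(u, a):
    """Model M_a: the equation for A is replaced by the constant a."""
    u_l, _, u_y = u
    l = f_L(u_l)
    return l, a, f_Y(l, a, u_y)

def new_model(u, a):
    """Model M^a: f_A is kept, and a is plugged into f_Y in place of A."""
    u_l, u_a, u_y = u
    l = f_L(u_l)
    return l, f_A(l, u_a), f_Y(l, a, u_y)

rng = random.Random(1)
draws = [(rng.random(), rng.random(), rng.random()) for _ in range(10_000)]

def lemma1_on(u, a):
    l, a_obs, _ = factual(u)
    l_sub, _, y_sub = pearl_model(u, a)      # L_a, A_a, Y_a
    l_sup, a_sup, y_sup = new_model(u, a)    # L^a, A^a, Y^a
    return l_sub == l_sup == l and a_sup == a_obs and y_sub == y_sup

lemma1_holds = all(lemma1_on(u, a) for u in draws for a in (0, 1))
```

Note that in `new_model` the observed treatment and the counterfactual response are produced by the same run, which is the feature the construction is designed to provide.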

3 Intervention with constant regimes

3.1 Interventions on a single node

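Consider a causal DAG $G$ with nodes $\mathbf{V}_1,\dots,\mathbf{V}_n$, labeled in a way compatible with $G$. Recall that, in graph terminology, we say that $\mathbf{V}_i$ is a parent of $\mathbf{V}_j$ if an arrow points from $\mathbf{V}_i$ to $\mathbf{V}_j$. We use $PA_G(\mathbf{V}_j)$ to denote the set of parents of $\mathbf{V}_j$ in $G$. If there is a directed path from $\mathbf{V}_i$ to $\mathbf{V}_k$, we say that $\mathbf{V}_i$ is an ancestor of $\mathbf{V}_k$, and use $An_G(\mathbf{V}_k)$ to denote the set of ancestors of $\mathbf{V}_k$ in $G$.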

Consider a collection of independent random variables $U=\{U_1,\dots,U_n\}$. Let $\mathcal{V}_i$ denote the common support of any random variable associated with the node $\mathbf{V}_i$ and let $\mathcal{U}_i$ denote the support of $U_i$. A set of functions $F=\{f_i:i\ge 1\}$ is said to be compatible with $G$ if, for each $i=1,\dots,n$, we get that

[4] $f_i:\prod_{\mathbf{V}_j\in PA_G(\mathbf{V}_i)}\mathcal{V}_j\times\mathcal{U}_i\to\mathcal{V}_i.$

Given a set $F=\{f_i:i\ge 1\}$ of functions compatible with $G$, and independent $U=\{U_1,\dots,U_n\}$, factual variables are defined by the recurrence

$V_i=f_i(PA_i,U_i),$

where $PA_i$ are the random variables (already defined by the recurrence) associated with the nodes in $PA_G(\mathbf{V}_i)$. Note that, by construction, the distribution of $(V_1,\dots,V_n)$ is compatible with $G$, meaning that it satisfies the Markovian factorization induced by $G$. We use $M=(F,U)$ to denote the model that gives rise to factual variables.

In order to represent an intervention at level $a$ for a given node $\mathbf{A}$, Pearl (2000, ch. 7 or 2009) defined the “surgically modified model” $M_a=(F_a,U)$, considering $F_a=\{f_{a,i}:i\ge 1\}$, where $f_{a,i}=f_i$ if $\mathbf{V}_i\ne\mathbf{A}$ and, for $\mathbf{V}_j=\mathbf{A}$, $f_{a,j}=a$. Counterfactual variables are defined by this new set of functions and the same disturbances $\{U_1,\dots,U_n\}$, by the recurrence

$V_{a,i}=f_{a,i}(PA_{a,i},U_i),$

where $PA_{a,i}$ are the random variables (already defined by the recurrence) associated with the nodes in $PA_G(\mathbf{V}_i)$.

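Before presenting our proposal for constructing counterfactual variables, recall that given a DAG $G$ and a node $\mathbf{A}$ in $G$, $G_{\underline{\mathbf{A}}}$ is the graph obtained by removing from $G$ all arrows emerging from $\mathbf{A}$. We will now introduce a new set of functions $F^a=\{f_i^a:i\ge 1\}$, compatible with $G_{\underline{\mathbf{A}}}$, which will allow the simultaneous definition of both the observed assignment random variable $A$ associated with the node $\mathbf{A}$ and the counterfactual responses. To achieve this, if $\mathbf{A}\notin PA_G(\mathbf{V}_i)$, we get that $PA_{G_{\underline{\mathbf{A}}}}(\mathbf{V}_i)=PA_G(\mathbf{V}_i)$ and define $f_i^a$ to be equal to $f_i$. When $\mathbf{A}\in PA_G(\mathbf{V}_i)$, $f_i^a$ is obtained by fixing the value $a$ in the original function $f_i$. To be more precise, if $\mathbf{A}\in PA_G(\mathbf{V}_i)$, without loss of generality, it holds that

[5] $f_i:\prod_{\mathbf{V}_j\in PA_G(\mathbf{V}_i)\setminus\{\mathbf{A}\}}\mathcal{V}_j\times\mathcal{A}\times\mathcal{U}_i\to\mathcal{V}_i,$

where $\mathcal{A}$ denotes the set of possible values which can be assumed by the variables associated with node $\mathbf{A}$. Since $PA_{G_{\underline{\mathbf{A}}}}(\mathbf{V}_i)=PA_G(\mathbf{V}_i)\setminus\{\mathbf{A}\}$ and $F^a$ should be compatible with $G_{\underline{\mathbf{A}}}$, we need $f_i^a$ to satisfy the following condition:

[6] $f_i^a:\prod_{\mathbf{V}_j\in PA_G(\mathbf{V}_i)\setminus\{\mathbf{A}\}}\mathcal{V}_j\times\mathcal{U}_i\to\mathcal{V}_i.$

Then, for $\bar v_i\in\prod_{\mathbf{V}_j\in PA_G(\mathbf{V}_i)\setminus\{\mathbf{A}\}}\mathcal{V}_j$ and $u\in\mathcal{U}_i$, we define

$f_i^a(\bar v_i,u)=f_i(\bar v_i,a,u).$

Let $V_i^a$ denote the variables obtained by the recurrence based on these new functions:

$V_i^a=f_i^a(PA_i^a,U_i),$

where $PA_i^a$ are the random variables (already defined by the recurrence) associated with the nodes in $PA_{G_{\underline{\mathbf{A}}}}(\mathbf{V}_i)$. Note that the distribution of $(V_1^a,\dots,V_n^a)$ is compatible with $G_{\underline{\mathbf{A}}}$. Let $M^a=(F^a,U)$.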

The following Lemma explains how the variables defined under models $M$, $M_a$ and $M^a$ are related.

Lemma 3 The random variables associated with both modified models $M_a=(F_a,U)$ and $M^a=(F^a,U)$ are the same, with the exception of those associated with node $\mathbf{A}$:

$V_{a,i}=V_i^a$ if $\mathbf{V}_i\ne\mathbf{A}$.

Variables associated with the node $\mathbf{A}$ defined by $M=(F,U)$ and $M^a=(F^a,U)$, respectively, are equal:

$A=A^a.$

Moreover, if $\mathbf{V}_i$ is not a descendant of $\mathbf{A}$, we get that

$V_i=V_{a,i}=V_i^a.$

Finally, under the assumption that the $U_i$ are mutually independent, the joint distribution of the vector $(V_1^a,\dots,V_n^a)$ factors according to $G_{\underline{\mathbf{A}}}$ (i.e. the variables are Markov with respect to $G_{\underline{\mathbf{A}}}$).
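The general single-node construction can be sketched in code by representing each $f_i$ as a function of a dictionary of parent values and a disturbance. Everything concrete below is an assumption made for illustration: the four-node DAG (with a mediator `M`), the specific functions, and the helper names `run_model`, `pearl_surgery` and `new_surgery`. The check at the end compares the three models on common disturbances, as in the first three statements of Lemma 3.

```python
import random

def run_model(order, parents, funcs, u):
    """Solve V_i = f_i(PA_i, U_i) along a topological order of the nodes."""
    values = {}
    for v in order:
        pa_values = {p: values[p] for p in parents[v]}
        values[v] = funcs[v](pa_values, u[v])
    return values

def pearl_surgery(funcs, node, a):
    """M_a: replace the function attached to `node` by the constant a."""
    new_funcs = dict(funcs)
    new_funcs[node] = lambda pa, u: a
    return new_funcs

def new_surgery(parents, funcs, node, a):
    """M^a: remove `node` from every parent set and fix its value to a
    inside the functions of its children; f_node itself is untouched."""
    new_parents = {v: [p for p in pa if p != node] for v, pa in parents.items()}
    new_funcs = {}
    for v, f in funcs.items():
        if node in parents[v]:
            new_funcs[v] = lambda pa, u, f=f: f({**pa, node: a}, u)
        else:
            new_funcs[v] = f
    return new_parents, new_funcs

# Hypothetical DAG: L -> A, A -> M, and L, A, M -> Y.
ORDER = ["L", "A", "M", "Y"]
PARENTS = {"L": [], "A": ["L"], "M": ["A"], "Y": ["L", "A", "M"]}
FUNCS = {
    "L": lambda pa, u: int(u < 0.5),
    "A": lambda pa, u: int(u < 0.2 + 0.6 * pa["L"]),
    "M": lambda pa, u: int(u < 0.3 + 0.4 * pa["A"]),
    "Y": lambda pa, u: pa["L"] + pa["A"] + 2 * pa["M"] + int(u < 0.5),
}

def check_lemma3(n_draws=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(n_draws):
        u = {v: rng.random() for v in ORDER}
        factual = run_model(ORDER, PARENTS, FUNCS, u)
        for a in (0, 1):
            v_pearl = run_model(ORDER, PARENTS, pearl_surgery(FUNCS, "A", a), u)
            pa_new, f_new = new_surgery(PARENTS, FUNCS, "A", a)
            v_new = run_model(ORDER, pa_new, f_new, u)
            same_off_a = all(v_pearl[v] == v_new[v] for v in ORDER if v != "A")
            keeps_a = v_new["A"] == factual["A"]
            non_desc = factual["L"] == v_pearl["L"] == v_new["L"]
            if not (same_off_a and keeps_a and non_desc):
                return False
    return True

lemma3_holds = check_lemma3()
```

The partial application in `new_surgery` mirrors the definition $f_i^a(\bar v_i,u)=f_i(\bar v_i,a,u)$: children of the intervened node receive the constant, while the node's own equation still produces the observed assignment.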

To conclude this section, we state the back door theorem, which was originally presented in Pearl (1993) and can be found, as most of the results presented in this work, in Pearl (2009). A new proof of this result is provided.

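Theorem 4 (The Back Door Criterion) Consider a set of nodes $\mathbf{L}\subset\{\mathbf{V}_1,\dots,\mathbf{V}_n\}$ such that $\mathbf{L}\cap\mathbf{A}=\emptyset$. Assume that the following conditions hold:

1. No element of $\mathbf{L}$ is a descendant of $\mathbf{A}$ in $G$,

2. $\mathbf{L}$ blocks all back door paths from $\mathbf{A}$ to $\mathbf{Y}$ in $G$.

Then, $Y_a$ is independent of $A$ given $L$ and so

$P(Y_a=y)=\sum_{\ell}P(Y=y\mid A=a,L=\ell)\,P(L=\ell).$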

Proof: To prove that conditional ignorability holds, meaning that $Y_a$ is independent of $A$ given $L$, we note that under the assumptions of Theorem 4, considering the results presented in Lemma 3, we get that

If no element of $\mathbf{L}$ is a descendant of $\mathbf{A}$ in $G$, then $L=L_a=L^a$.
If $\mathbf{L}$ blocks all back door paths from $\mathbf{A}$ to $\mathbf{Y}$ in $G$, then $\mathbf{A}$ and $\mathbf{Y}$ are d-separated by $\mathbf{L}$ in $G_{\underline{\mathbf{A}}}$, and so $A^a$ and $Y^a$ are independent given $L^a$.

Finally, resorting again to the results stated in Lemma 3, we also know that $A^a=A$ and $Y^a=Y_a$. So, if $\mathbf{L}$ satisfies both conditions 1 and 2, we can conclude that $Y_a$ is independent of $A$ given $L$. This means that conditional ignorability holds, as we meant to prove. Thus, under positivity and since $Y_a=Y$ on the event $\{A=a\}$, the distribution of the counterfactual variables can be identified by the formula

$P(Y_a=y)=\sum_{\ell}P(Y=y\mid A=a,L=\ell)\,P(L=\ell).$

□
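The conclusion of Theorem 4 can be verified exactly on a small discrete instance of the three-node example, where $\mathbf{L}$ satisfies both conditions. The sketch below is an illustration under assumed ingredients only: the finite disturbance distributions, the structural functions and all helper names are hypothetical. Probabilities are computed by enumerating the disturbances with exact fractions, so no sampling error is involved.

```python
from fractions import Fraction
from itertools import product

# Hypothetical independent discrete disturbances with exact probabilities.
P_UL = {0: Fraction(3, 5), 1: Fraction(2, 5)}
P_UA = {k: Fraction(1, 4) for k in range(4)}
P_UY = {k: Fraction(1, 10) for k in range(10)}

def f_L(u_l):
    return u_l

def f_A(l, u_a):
    return int(u_a < (3 if l else 1))

def f_Y(l, a, u_y):
    return int(u_y < 2 + 3 * l + 3 * a)

# Random variables written as functions of the disturbance vector u.
def L(u): return f_L(u[0])
def A(u): return f_A(L(u), u[1])
def Y(u): return f_Y(L(u), A(u), u[2])
def Y_cf(u, a): return f_Y(L(u), a, u[2])   # counterfactual response at level a

def prob(event):
    """Exact probability of an event defined on the disturbances."""
    total = Fraction(0)
    for (ul, pl), (ua, pa), (uy, py) in product(
            P_UL.items(), P_UA.items(), P_UY.items()):
        if event((ul, ua, uy)):
            total += pl * pa * py
    return total

a = 1
counterfactual = prob(lambda u: Y_cf(u, a) == 1)
backdoor = sum(
    prob(lambda u: Y(u) == 1 and A(u) == a and L(u) == l)
    / prob(lambda u: A(u) == a and L(u) == l)
    * prob(lambda u: L(u) == l)
    for l in (0, 1)
)
unadjusted = prob(lambda u: Y(u) == 1 and A(u) == a) / prob(lambda u: A(u) == a)

# Conditional ignorability: P(Y^a=y, A=x | L=l) = P(Y^a=y | L=l) P(A=x | L=l).
cond_indep = all(
    prob(lambda u: Y_cf(u, a) == y and A(u) == x and L(u) == l)
    * prob(lambda u: L(u) == l)
    == prob(lambda u: Y_cf(u, a) == y and L(u) == l)
    * prob(lambda u: A(u) == x and L(u) == l)
    for y in (0, 1) for x in (0, 1) for l in (0, 1)
)
```

In this instance the back door formula reproduces the counterfactual probability exactly, whereas the unadjusted conditional probability does not.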

3.2 Interventions on multiple nodes

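Assume now that we wish to intervene on a set of nodes $\mathbf{A}_{\mathrm{set}}=\{\mathbf{A}_1,\dots,\mathbf{A}_k\}$. Consider $a_i\in\mathcal{A}_i$, where $\mathcal{A}_i$ denotes the support of the variables associated with node $\mathbf{A}_i$, and let $a=(a_1,\dots,a_k)$. Following the new surgically modified model, we will change the functions related to those nodes whose parents include some $\mathbf{A}_j$.

As in the one-node case, given a DAG $G$, let $M=(F,U)$ denote the model (compatible with $G$) for the factual variables $(V_1,\dots,V_n)$. Let $(V_{a,1},\dots,V_{a,n})$ denote the vector of variables determined by the model $M_a=(F_a,U)$ proposed by Pearl, with $F_a=\{f_{a,i}:i\ge 1\}$, where $f_{a,i}=f_i$ if $\mathbf{V}_i$ does not belong to the set $\mathbf{A}_{\mathrm{set}}$, and when $\mathbf{V}_j=\mathbf{A}_i$ for some $i$, $f_{a,j}=a_i$.

We will now generalize the construction presented for single-node interventions to this new scenario. To do so, we consider $M^a=(F^a,U)$, for $F^a=\{f_i^a:i\ge 1\}$, compatible with $G_{\underline{\mathbf{A}_{\mathrm{set}}}}$, the graph obtained by removing from $G$ all arrows emerging from the set $\mathbf{A}_{\mathrm{set}}$. Note that the set of parents of a given node $\mathbf{V}_i$ in $G_{\underline{\mathbf{A}_{\mathrm{set}}}}$ is obtained by eliminating from the set of parents of $\mathbf{V}_i$ in the original DAG $G$ all nodes in $\mathbf{A}_{\mathrm{set}}^i=\mathbf{A}_{\mathrm{set}}\cap PA_G(\mathbf{V}_i)$; namely, we have that $PA_{G_{\underline{\mathbf{A}_{\mathrm{set}}}}}(\mathbf{V}_i)=PA_G(\mathbf{V}_i)\setminus\mathbf{A}_{\mathrm{set}}^i$. Therefore, the definition of $f_i^a$ depends on whether the set $\mathbf{A}_{\mathrm{set}}^i$ is empty or not. Now, if $\mathbf{A}_{\mathrm{set}}^i=\emptyset$, we get that $PA_{G_{\underline{\mathbf{A}_{\mathrm{set}}}}}(\mathbf{V}_i)=PA_G(\mathbf{V}_i)$ and we define $f_i^a=f_i$. When $\mathbf{A}_{\mathrm{set}}^i\ne\emptyset$, we can assume that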

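[7] $f_i:\prod_{\mathbf{V}_j\in PA_G(\mathbf{V}_i)\setminus\mathbf{A}_{\mathrm{set}}^i}\mathcal{V}_j\times\prod_{\mathbf{A}_j\in\mathbf{A}_{\mathrm{set}}^i}\mathcal{A}_j\times\mathcal{U}_i\longrightarrow\mathcal{V}_i,$

and consider

[8] $f_i^a:\prod_{\mathbf{V}_j\in PA_G(\mathbf{V}_i)\setminus\mathbf{A}_{\mathrm{set}}^i}\mathcal{V}_j\times\mathcal{U}_i\longrightarrow\mathcal{V}_i,$

where, for $\bar v_i\in\prod_{\mathbf{V}_j\in PA_G(\mathbf{V}_i)\setminus\mathbf{A}_{\mathrm{set}}^i}\mathcal{V}_j$ and $u\in\mathcal{U}_i$, we define

$f_i^a(\bar v_i,u)=f_i(\bar v_i,a^i,u),$

including in $a^i$ all the coordinates of the vector $a=(a_1,\dots,a_k)$ corresponding to the set $\mathbf{A}_{\mathrm{set}}^i$: $a^i=(a_j:\mathbf{A}_j\in\mathbf{A}_{\mathrm{set}}^i)$. In other words, when $PA_G(\mathbf{V}_i)\cap\mathbf{A}_{\mathrm{set}}\ne\emptyset$, each time the value of the variable related to the node $\mathbf{A}_j$ is required by the original function $f_i$ (meaning that $\mathbf{A}_j\in\mathbf{A}_{\mathrm{set}}^i$), we construct the function $f_i^a$ by fixing in $f_i$ the value $a_j$.

Let $(V_1^a,\dots,V_n^a)$ denote the vector of variables obtained by the recurrence based on these new functions ($F^a$) and disturbances $U$. Once more, we get that the distribution of $(V_1^a,\dots,V_n^a)$ is compatible with $G_{\underline{\mathbf{A}_{\mathrm{set}}}}$. The results are presented in what follows.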

Lemma 5 Let $A=(A_1,\dots,A_k)$ and $A^a=(A_1^a,\dots,A_k^a)$ denote the random variables related to the nodes $\mathbf{A}_1,\dots,\mathbf{A}_k$, according to models $M$ and $M^a$, respectively. If $\mathbf{W}\cap\mathbf{A}_{\mathrm{set}}=\emptyset$, then the following version of the consistency assumption holds:

$\{A^a=a,\ W^a=w\}=\{A=a,\ W=w\}.$

The random variables associated with both modified models $M_a=(F_a,U)$ and $M^a=(F^a,U)$ are the same, with the exception of those associated with nodes in $\mathbf{A}_{\mathrm{set}}$:

$V_{a,i}=V_i^a$ if $\mathbf{V}_i\notin\mathbf{A}_{\mathrm{set}}$.

Under the assumption that the $U_i$ are mutually independent, the joint distribution of the vector $(V_1^a,\dots,V_n^a)$ factors according to $G_{\underline{\mathbf{A}_{\mathrm{set}}}}$ (i.e. the variables are Markov with respect to $G_{\underline{\mathbf{A}_{\mathrm{set}}}}$).
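The consistency statement of Lemma 5 can be explored numerically. The sketch below is an assumption-laden illustration, not the authors' code: the five-node DAG with two treatment nodes `A1` and `A2`, the structural functions and the helper names are all hypothetical. For every draw of the disturbances it checks that the event $\{A^a=a\}$ coincides with $\{A=a\}$ and that, on this event, all remaining variables agree under $M$ and $M^a$.

```python
import random
from itertools import product

def run_model(order, parents, funcs, u):
    """Solve V_i = f_i(PA_i, U_i) along a topological order of the nodes."""
    values = {}
    for v in order:
        values[v] = funcs[v]({p: values[p] for p in parents[v]}, u[v])
    return values

def new_surgery(parents, funcs, a):
    """M^a for a dict `a` mapping each intervened node to its fixed value."""
    new_parents = {v: [p for p in pa if p not in a] for v, pa in parents.items()}
    new_funcs = {}
    for v, f in funcs.items():
        fixed = {p: a[p] for p in parents[v] if p in a}   # the sub-vector a^i
        new_funcs[v] = lambda pa, u, f=f, fixed=fixed: f({**pa, **fixed}, u)
    return new_parents, new_funcs

# Hypothetical DAG with two intervened nodes A1 and A2.
ORDER = ["L", "A1", "M", "A2", "Y"]
PARENTS = {"L": [], "A1": ["L"], "M": ["A1"], "A2": ["L", "M"],
           "Y": ["L", "A1", "M", "A2"]}
FUNCS = {
    "L": lambda pa, u: int(u < 0.5),
    "A1": lambda pa, u: int(u < 0.3 + 0.4 * pa["L"]),
    "M": lambda pa, u: int(u < 0.2 + 0.5 * pa["A1"]),
    "A2": lambda pa, u: int(u < 0.1 + 0.3 * pa["L"] + 0.4 * pa["M"]),
    "Y": lambda pa, u: pa["L"] + pa["A1"] + pa["M"] + pa["A2"] + int(u < 0.5),
}
TREATED = ("A1", "A2")
OTHERS = [v for v in ORDER if v not in TREATED]

def check_consistency(n_draws=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(n_draws):
        u = {v: rng.random() for v in ORDER}
        factual = run_model(ORDER, PARENTS, FUNCS, u)
        for values in product((0, 1), repeat=len(TREATED)):
            a = dict(zip(TREATED, values))
            pa_new, f_new = new_surgery(PARENTS, FUNCS, a)
            v_new = run_model(ORDER, pa_new, f_new, u)
            hit_new = all(v_new[t] == a[t] for t in TREATED)    # {A^a = a}
            hit_obs = all(factual[t] == a[t] for t in TREATED)  # {A = a}
            if hit_new != hit_obs:
                return False
            if hit_obs and any(v_new[w] != factual[w] for w in OTHERS):
                return False
    return True

lemma5_consistency = check_consistency()
```

Here `fixed` plays the role of $a^i$: only the coordinates of $a$ that correspond to intervened parents of a node are plugged into that node's function.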

Finally, we include a new proof of the back door theorem, using the independences deduced from its assumptions and Lemma 5.

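Theorem 6 (Back Door Criterion: Many Nodes) Consider a set of nodes $\mathbf{L}\subset\{\mathbf{V}_1,\dots,\mathbf{V}_n\}$ such that $\mathbf{L}\cap\mathbf{A}_{\mathrm{set}}=\emptyset$. Assume that the following conditions hold:

1. No element of $\mathbf{L}$ is a descendant of $\mathbf{A}_{\mathrm{set}}$,

2. $\mathbf{L}$ blocks all back door paths from $\mathbf{A}_{\mathrm{set}}$ to $\mathbf{Y}$ in $G$.

Then,

$P(Y_a=y)=\sum_{\ell}P(Y=y\mid A=a,L=\ell)\,P(L=\ell),$

with $a=(a_1,\dots,a_k)$.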

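Proof: Under the present assumptions we get that

If no element of $\mathbf{L}$ is a descendant of $\mathbf{A}_{\mathrm{set}}$, then $L=L_a=L^a$.
If $\mathbf{L}$ blocks all back door paths from $\mathbf{A}_{\mathrm{set}}$ to $\mathbf{Y}$ in $G$, then $\mathbf{A}_{\mathrm{set}}$ and $\mathbf{Y}$ are d-separated by $\mathbf{L}$ in $G_{\underline{\mathbf{A}_{\mathrm{set}}}}$, and so $A^a$ and $Y^a$ are independent given $L^a$.

Finally, if $\mathbf{L}$ satisfies the previous conditions, by Lemma 5, we get that $\{A^a=a,L^a=\ell\}=\{A=a,L=\ell\}$ for any $\ell$, $\{Y^a=y,A^a=a,L^a=\ell\}=\{Y=y,A=a,L=\ell\}$ for any $(\ell,y)$ and so (under positivity),

$P(Y_a=y)=P(Y^a=y)=\sum_{\ell}P(Y^a=y\mid L^a=\ell)\,P(L^a=\ell)=\sum_{\ell}P(Y^a=y\mid L^a=\ell,A^a=a)\,P(L^a=\ell)=\sum_{\ell}P(Y=y\mid A=a,L=\ell)\,P(L=\ell).$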

□

4 FFRCISTG models

In the previous results, we used the rules of d-separation to detect independence or conditional independence between variables of a random vector. To do so, given a graph G, all we required from the joint distribution of our vector was compatibility with G. When variables are constructed following a NPSEM-IE, the Markov factorization induced by G holds automatically, and that is why our results are valid when the errors are independent.

However, the Markov factorization remains true under weaker conditions. For instance, let $v=(v_1,\dots,v_n)\in\prod_{j=1}^{n}\mathcal{V}_j$ and call $v_{pa_G(\mathbf{V}_i)}$ the subvector of $v$ containing the coordinates related to the nodes in the set $PA_G(\mathbf{V}_i)$, namely $v_{pa_G(\mathbf{V}_i)}=(v_j:\mathbf{V}_j\in PA_G(\mathbf{V}_i))$. If

[9] $\{f_i(v_{pa_G(\mathbf{V}_i)},U_i):\mathbf{V}_i\in G\}$ are independent, for all $v\in\prod_{j=1}^{n}\mathcal{V}_j$,

then the distribution of the vector whose variables are constructed with $M=(F,U)$ is compatible with the graph $G$. This condition essentially defines the FFRCISTG models (Richardson and Robins 2013).

It is worth noting that if $M=(F,U)$ satisfies condition [9] relative to $G$, the intervened model $M^a=(F^a,U)$, defined in Section 3.2, also satisfies condition [9] relative to $G_{\underline{\mathbf{A}_{\mathrm{set}}}}$, since

$\{f_i^a(v_{pa_{G_{\underline{\mathbf{A}_{\mathrm{set}}}}}(\mathbf{V}_i)},U_i):\mathbf{V}_i\in G_{\underline{\mathbf{A}_{\mathrm{set}}}}\}=\{f_i(v^a_{pa_G(\mathbf{V}_i)},U_i):\mathbf{V}_i\in G\},$

where $v^a_{pa_G(\mathbf{V}_i)}$ denotes the vector that results from replacing $v_j$ with $a_j$ for $\{j:\mathbf{A}_j\in\mathbf{A}_{\mathrm{set}}\}$. Then, the distribution of the variables constructed using the model $M^a$, requiring that the errors satisfy only condition [9], factors according to the graph $G_{\underline{\mathbf{A}_{\mathrm{set}}}}$, allowing the use of d-separation rules, and thus extending our results to this new model.

Acknowledgments

We are grateful to the referees and, especially, to the editor for their comments, which have greatly contributed to improving the original manuscript.

This research was partially supported by grants from the University of Buenos Aires, the National Council for Science and Technology of Argentina (CONICET) and the National Agency for Scientific Promotion (ANPCyT).

References

Balke, A., and Pearl, J. (1994). Probabilistic evaluation of counterfactual queries. UCLA Cognitive Systems Laboratory, Technical Report (R-213-A). In: Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA, Volume I, 230–237.

Pearl, J. (1993). Comment: Graphical models, causality and intervention. Statistical Science, 8:266–269.

Pearl, J. (2000). Causality. Cambridge: Cambridge University Press.

Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys, 3:96–146. URL http://ftp.cs.ucla.edu/pub/stat_ser/r350.pdf. doi:10.1214/09-SS057

Richardson, T., and Robins, J. (2013). Single World Intervention Graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality. Working Paper Number 128, Center for Statistics and the Social Sciences, University of Washington.

Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period – Application to control of the healthy worker survivor effect. Mathematical Modelling, 7:1393–1512. doi:10.1016/0270-0255(86)90088-6

Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66:688–701. doi:10.1037/h0037350

Published Online: 2014-11-7

Published in Print: 2014-12-1
