
Spillover detection for donor selection in synthetic control models

  • Michael O’Riordan and Ciarán M. Gilligan-Lee
Published/Copyright: October 8, 2025

Abstract

Synthetic control (SC) models are widely used to estimate causal effects in settings with observational time-series data. To identify the causal effect on a target unit, SC requires the existence of additional units that are not impacted by the intervention. Given one of these potential donor units, how can we decide whether it is in fact a valid donor – that is, one not subject to spillover effects from the intervention? Such a decision typically requires appealing to strong a priori domain knowledge specifying the units, which becomes infeasible in situations with large pools of potential donors. In this article, we introduce a practical, theoretically grounded donor selection procedure, aiming to weaken this domain knowledge requirement. Our main result is a theorem that yields the assumptions required to identify donor values at post-intervention time points using only pre-intervention data. We show how this theorem – and the assumptions underpinning it – can be turned into a practical method for detecting potential spillover effects and excluding invalid donors when constructing SCs. Importantly, we employ sensitivity analysis to formally bound the bias in our SC causal estimate in situations where an excluded donor was indeed valid, or where a selected donor was invalid. Using ideas from the proximal causal inference and instrumental variables literature, we show that the excluded donors can nevertheless be leveraged to further debias causal effect estimates. Finally, we illustrate our donor selection procedure on both simulated and real-world datasets.

MSC 2020: 62D20; 62A01; 68T37; 60A99

1 Introduction

The ability to estimate causal effects is of fundamental importance in many domains, including medicine, economics, and industry [1–10]. In the absence of experimental data from randomized controlled trials or A/B tests, practitioners are faced with observational (non-randomized) data, and must rely on the assumptions, tools, and techniques of causal inference to estimate the impact of interventions. Synthetic control (SC) models, first introduced more than 20 years ago by Abadie and Gardeazabal [11], are widely used to estimate treatment effects in settings with observational time-series (panel) data [e.g. 12–15] and have been described by Athey and Imbens [16] as “arguably the most important innovation in the policy evaluation literature in the last 15 years.”

To estimate the impact of an intervention on a target unit, SC requires time-series data for the target unit as well as time-series data for additional units, often called donors. The target and donors should be influenced by common latent factors, and so donors are typically selected based on correlations between the target and donor time-series. Crucially, selected donors must not be impacted by the intervention. That is, there must be no spillover effects from the intervention to the donors. The SC method uses pre-intervention data to construct a SC unit that matches the pre-intervention target as closely as possible. The post-intervention evolution of this SC unit estimates the evolution of the target unit in a counterfactual world where the intervention did not occur, all else being equal. Therefore, the causal impact of the intervention can be estimated by comparing the observed, factual, post-intervention target to the counterfactual one.

Given time-series data for a potential donor, how can we determine that it is not impacted by the intervention, and hence constitutes a valid donor that can be used in the construction of SCs? Such a decision typically requires appealing to strong a priori domain knowledge about the nature of the intervention and donors. Usually, we must already know (or assume) that the entire pool of potential donors is not subject to spillover effects from the intervention. However, in many real-world applications, the pool of potential donors can be very large, such as when estimating the impact of a new feature on a large online platform [17, Section 5], and domain knowledge alone is unlikely to be adequate for donor selection due to the scale of the problem.

In this work, we aim to relax the domain knowledge requirements by introducing a practical, theoretically grounded donor selection procedure. This procedure can be used to augment partial knowledge about invalid donors, thereby allowing us to rely on a weaker form of domain knowledge than usually required to select valid donors. We also relate the failure modes of our selection procedure to recent advances in the sensitivity analysis frameworks for SC [18] and negative control [19], giving formal bounds on the bias for a given selection of donors, and further reducing the burden on domain knowledge to select a perfect set of valid donors.

Our main result is a theorem – based on techniques from proximal causal inference – that provides the assumptions required to identify and forecast values of specific donors at post-intervention time points using pre-intervention donor data only. This is in contrast to the standard SC identifiability, which additionally uses post-intervention donor values to predict the post-intervention counterfactual for the target unit. The main assumptions required for a given donor’s post-intervention values to be identified are that the donor is not impacted by the intervention, and the distribution of the underlying data generating mechanism in the past is representative of future data points. Therefore, if we use pre-intervention data to predict post-intervention data and obtain the wrong answer, either the underlying latent distribution has changed, the donor has been impacted by spillover effects from the intervention, or both.

We use this result to detect potential spillover effects and exclude donors when constructing SCs. Importantly, we formally bound the potential bias introduced by this selection procedure due to false positives (excluding donors not impacted by spillover effects) and false negatives (including donors impacted by spillover effects) using sensitivity analysis. While the excluded donors are not used in constructing the SC, we show how they can still be leveraged to further debias causal effect estimates in situations where the donors are noisy proxies of the latent dynamics [20].

We conclude the article by providing an empirical demonstration of our donor selection procedure on both simulated and real-world datasets.

In summary, our main contributions are as follows:

  • We prove a theorem that yields the assumptions required to identify donor values at post-intervention time points using only pre-intervention data.

  • We introduce a practical donor selection procedure, based on this theorem, that detects potential spillover effects and excludes invalid donors when constructing SCs.

  • We use sensitivity analysis to formally bound the bias in our SC causal estimate for a given selection of donors in situations where an excluded donor was indeed valid, or where a selected donor was invalid.

  • We provide a two-stage method that uses the excluded donors to further debias causal effect estimates.

  • We illustrate the performance of our donor selection procedure on both simulated and real-world datasets.

2 Related work

2.1 Identifiability of SCs

Historically, SC identifiability relied on assuming that the data generating process can be modelled as a latent linear factor model. With this assumption, the counterfactual is identified as a linear combination of valid donors. For instance, see the literature [11–13], and extensions of these approaches utilizing Bayesian structural time-series by Brodersen et al. [14].

More recently, Shi et al. [21] argued that linearity emerges in a non-parametric manner if the target and donor units are in fact aggregations of “smaller” units (e.g. country-level data are aggregates of individual-level behaviours). The need for this “aggregate unit” assumption was subsequently removed by Shi et al. [20] and Zeitler et al. [18], who leveraged proximal causal inference to prove that the counterfactual can be non-parametrically identified as a (potentially non-linear) function of valid donors.

Zeitler et al. [18] introduced invariant causal mechanisms as a necessary condition for SC non-parametric identifiability. We define this condition in Section 3.1, and make use of it throughout the rest of the article. How does this relate to previous works on SC? The original SC formulation of Abadie et al. [12] assumes that the data generating process follows a latent linear factor model with time-independent factor loadings. Time-independence of the factor loadings is a special case of the more general invariant causal mechanism assumption employed here. Shi et al. [20] do not explicitly assume that causal mechanisms are invariant, however they do assume the existence of a time-independent confounding bridge function mapping donor outcomes to target outcomes. Zeitler et al. [18] showed that the time-independence of this confounding bridge function can in fact be understood as a consequence of invariant causal mechanisms.

2.2 Proximal causal inference

Proximal causal inference was initially investigated by Kuroki and Pearl [22] and Miao et al. [23], and has been further developed by Tchetgen et al. [24]. It has been used, for instance, in long-term causal effect estimation by Imbens et al. [25], and formulated in terms of the graphical causal inference framework by Shpitser et al. [26]. In the context of SC models, Shi et al. [20] and Zeitler et al. [18] employed proximal causal inference to prove non-parametric identifiability.

Recently, Liu et al. [27] proposed a novel SC model leveraging special donors called “surrogates,” which can include potentially invalid donors. These surrogates differ from the usual donors in that they are correlates of the causal effect instead of the target outcome. For example, in the simple case where the causal effect is a linear function of some latent variable $w^t$ such that $\tau^t = \gamma w^t + \varepsilon_\tau^t$, then an observed variable $s^t = \rho w^t + \varepsilon_s^t$ is a valid surrogate. Assuming the existence of such surrogates (as well as the requisite domain knowledge to identify them), Liu et al. [27] demonstrated SC estimation based on post-intervention data alone.

2.3 Sensitivity analysis

Later in the article, we discuss scenarios where our donor selection procedure fails, and relate these to recent works on sensitivity analysis in SC [18] and negative control [19]. Sensitivity analysis in the causal inference literature has mainly been focused on investigating omitted variable bias in propensity-based models. This line of work originated in the studies of Rosenbaum and Rubin [28] and Imbens [29], with modern formulations provided by previous studies [30–32].


In the context of SC models, sensitivity analysis has been investigated by Zeitler et al. [18] and Nazaret et al. [33]. The work by Zeitler et al. [18] explored a formal framework for sensitivity to violations of identifiability of the full SC model, in particular, sensitivity to the existence of relevant latent variables with no observed proxies. Nazaret et al. [33], on the other hand, explored mis-specifications to the standard linearity assumption in SC models. Both of these works assumed valid donors (i.e. no donors impacted by spillover effects from the intervention).

Miao [19] investigated sensitivity analysis in the context of negative control (widely used in epidemiological research, negative control variables can be viewed as specific proxy types within the proximal causal inference framework [24,34]). In particular, Miao [19] introduced positive control outcomes (which are reminiscent of invalid donors in SC models), and discussed treating the (unknown) spillover effects as sensitivity parameters to evaluate the plausibility of causal estimates.

3 Methods

3.1 SC structural causal model (SCM)

In Definition 3.1  (first provided by Zeitler et al. [18]), we formally define SC models in the SCM framework of Pearl [35].

Figure 1

(a) SC DAG. Grey nodes are observed variables, white latent. The intervention $I$ is applied at time point $t$, and is taken to be 0 for all time points before $t$, and 1 for all time points from $t$ onwards. The noise terms for the target $Y$, donors $X$, and latents $U$ have been suppressed for ease of exposition. Note that autocorrelation in the target time-series between $y^{t-1}$ and $y^t$, or between $x_i^{t-1}$ and $x_i^t$ in the time-series for donor $x_i$, arises due to the causal links between the latents at different time points. As discussed in Appendix A, the model and results can be readily generalized to the case where the evolution depends on $T$ time points. (b) Donor forecast. At time point $t-1$, the donors $x_1^{t-1}, \ldots, x_N^{t-1}$ are proxies for the latents $u_1^{t-1}, \ldots, u_M^{t-1}$, which allows us to write $x_i^t$ as a function of $x_1^{t-1}, \ldots, x_N^{t-1}$ and the noise terms at time $t$. The noise terms at time $t-1$ have been suppressed for ease of exposition.

Definition 3.1

SC structural causal models (SCSCMs) consist of a set of $M$ latent variables $U$ and their distributions, a set of $N$ observed variables $X$ representing the donor units, a set of observed variables $Y$, $I$ representing the target unit, and the intervention, and a set of deterministic functions mapping parents to their children in the causal structure in Figure 1a, represented as a directed acyclic graph (DAG), each indexed by a specific time point $t$, such that

  • $u_j^t = m_j^t(u_j^{t-1}, \varepsilon_{u_j}^t)$

  • $x_i^t = f_i^t(u_1^t, \ldots, u_M^t, \varepsilon_{x_i}^t)$

  • $y^t = g^t(u_1^t, \ldots, u_M^t, I^t, \varepsilon_y^t)$

where $\varepsilon_{u_j}^t \sim P(\varepsilon_{u_j}^t)$, $\varepsilon_{x_i}^t \sim P(\varepsilon_{x_i}^t)$, and $\varepsilon_y^t \sim P(\varepsilon_y^t)$ are independent, exogenous error terms.

For simplicity, we sometimes follow [36] and suppress the functional dependence on the exogenous error terms for the latents $u_1, \ldots, u_M$. We also often drop the indices on the donors and latents as follows: $x^t \equiv x_1^t, \ldots, x_N^t$, $u^t \equiv u_1^t, \ldots, u_M^t$, $\varepsilon_x^t \equiv \varepsilon_{x_1}^t, \ldots, \varepsilon_{x_N}^t$, and $\varepsilon_u^t \equiv \varepsilon_{u_1}^t, \ldots, \varepsilon_{u_M}^t$.

This definition of SCSCMs generalizes the standard latent linear factor model formulation of SCs [12]. In particular, notice that the target $y^t$ and donors $x^t$ can be arbitrary functions of the latents $u^t$. We sometimes also use $y^t$ and $x^t$ to denote the values these variables take; however, the difference will be clear from the context. Note that autocorrelation in the target and donor time-series arises due to the causal links between the latents at different time points. As discussed in Appendix A, the definition of SCSCMs and our results can be readily generalized to the case where evolution depends on $T$ time points.

We finish this section by defining the proxy variable completeness condition (Definition 3.2), and invariant causal mechanisms (Definition 3.3), which are necessary for SC non-parametric identifiability, and form the basis for our Theorem 3.1.

The following completeness condition formally defines when a set of donors can be considered proxies for a set of latent variables [18,24].

Definition 3.2

(Completeness condition) For any square-integrable function $f$, if $E(f(x_1^t, \ldots, x_N^t) \mid u_1^t, \ldots, u_M^t) = 0$, then $f(x_1^t, \ldots, x_N^t) = 0$ for any $t$.

At a given time point, the completeness condition characterizes how much “information” the donors have about the latent variables, in the sense that any variation in the $u$’s is captured by variation in the $x$’s. For the rest of the article, we assume that the set of donors $x_1^t, \ldots, x_N^t$ can be treated as proxies for the latents $u_1^t, \ldots, u_M^t$.

A causal mechanism is the deterministic function that uniquely specifies a variable from its parents in the causal graph and is equivalent to the conditional distribution of that variable given its (latent and observed) parents. We assume that causal mechanisms are invariant (Definition 3.3).

Definition 3.3

(Invariant causal mechanism) A causal mechanism is invariant if it does not depend on time point t .

In practice, this assumption means that we drop the time index from the causal mechanisms in the structural equations (Definition 3.1) such that $u_j^t = m_j(u_j^{t-1}, \varepsilon_{u_j}^t)$, $x_i^t = f_i(u_1^t, \ldots, u_M^t, \varepsilon_{x_i}^t)$, and $y^t = g(u_1^t, \ldots, u_M^t, I^t, \varepsilon_y^t)$.
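To make the structural equations concrete, the following is a minimal simulation sketch of an SCSCM with invariant mechanisms. Linearity of the mechanisms, the AR(1) latent dynamics, and all parameter values are illustrative assumptions of this sketch only; Definition 3.1 allows arbitrary functions.

import numpy as np

rng = np.random.default_rng(0)
M, N, T, t_star = 3, 5, 100, 80       # latents, donors, time points, intervention time (illustrative)

A = rng.uniform(0.5, 0.9, size=M)     # AR(1) coefficients for the latent mechanisms m_j
F = rng.normal(size=(N, M))           # donor loadings for the mechanisms f_i
g = rng.normal(size=M)                # target loadings for the mechanism g
tau = 2.0                             # effect of the intervention on the target

u = np.zeros((T, M))
x = np.zeros((T, N))
y = np.zeros(T)
for t in range(1, T):
    I_t = float(t >= t_star)
    u[t] = A * u[t - 1] + rng.normal(size=M)              # u_j^t = m_j(u_j^{t-1}, eps_{u_j}^t)
    x[t] = F @ u[t] + rng.normal(scale=0.5, size=N)       # x_i^t = f_i(u^t, eps_{x_i}^t)
    y[t] = g @ u[t] + tau * I_t + rng.normal(scale=0.5)   # y^t  = g(u^t, I^t, eps_y^t)

An invalid donor could be simulated by additionally adding a spillover term (e.g. a constant shift) to some columns of x for t >= t_star.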

The SCM formulation allows us to define (strong) interventions via the do-operator, disconnecting the intervened variable from its parents in the causal graph and assigning to it a specific value [35]. Thus, we quantify the impact of an intervention I on the target y at time t as

(1) $\tau = \underbrace{E(y^t \mid \mathrm{do}(I^t = 1), I^t = 1)}_{\text{Observed}} - \underbrace{E(y^t \mid \mathrm{do}(I^t = 0), I^t = 1)}_{\text{Counterfactual}}$

The first term is the observed, factual, post-intervention target, whereas the second term is the unobserved, counterfactual one. For SCSCMs, Zeitler et al. [18] showed that if causal mechanisms are invariant, and if the donors are proxies for the latents, then the counterfactual is identified via a unique function $h$ of the (valid) donors such that $E(y^t \mid \mathrm{do}(I^t = 0), I^t = 1) = E(h(x^t, I^t = 0))$.

3.2 Spillover detection for selecting valid donors

The power of SC lies in the fact that the counterfactual in equation (1) can be identified even when there are post-intervention shifts in the exogenous errors $P(\varepsilon_u^t)$ for the latents $u$. However, to identify the counterfactual we still need to specify donor units $x$ that are valid. A donor is valid if it adheres to the DAG depicted in Figure 1a, and remains a proxy for the latents (Definition 3.2) at all time points. In particular, it must not be impacted by spillover effects from the intervention. Given an SCSCM from Definition 3.1, spillover effects manifest as post-intervention shifts in the donor errors $P(\varepsilon_x^t)$.

Usually, one must appeal to strong a priori domain knowledge in order to select valid donors. Can we leverage data from the donor pool itself, augmenting incomplete domain knowledge, to gain confidence that a potential donor is valid? Ideally, for a given donor candidate $x_i$, we would like to be able to use pre-intervention data to test for shifts in $P(\varepsilon_{x_i}^t)$ that would rule out $x_i$ as a valid donor. Such a procedure would allow us to select donors based on a weaker form of domain knowledge than that required to know exactly which donors adhere to the DAG. This issue of selecting valid donors is distinct from the usual considerations of selecting donors based on pre-treatment fit [37], and is in fact a prerequisite.

In Section 3.2.1, we provide a theorem showing that the assumptions necessary for SC non-parametric identifiability – the proxy variable completeness condition (Definition 3.2), and invariant causal mechanisms (Definition 3.3) – also facilitate forecasting donor values given additional assumptions. We use this theorem to introduce a practical, theoretically grounded donor selection procedure based on detecting potential spillover effects to identify invalid donors.

3.2.1 Donor selection procedure: Theory

Theorem 3.1

If causal mechanisms are invariant, and the donors $x_1^{t-1}, \ldots, x_N^{t-1}$ are proxies for the latents $u_1^{t-1}, \ldots, u_M^{t-1}$, then, for each donor $x_i$, there exists a unique function $h_i$ such that for all time points $t$ we have:

(2) $E(x_i^t) = E(h_i(x_1^{t-1}, \ldots, x_N^{t-1}, P(\varepsilon_{x_i}^t, \varepsilon_u^t)))$

The proof of this theorem can be found in Appendix A. Intuitively, because $x_i^t$ is a function of the latents at time point $t-1$, and because the donors at $t-1$ are proxies for those latents, we can swap the latents for the donors and write $x_i^t$ as a function of $x_1^{t-1}, \ldots, x_N^{t-1}$ and the exogenous error terms at time $t$ (Figure 1b).

We now use Theorem 3.1 to identify and exclude potentially invalid donors. Under the conditions that (a) donor $x_i$ is valid, and (b) the exogenous error distributions $P(\varepsilon_u^t)$ for the latents have not shifted at time $t$ relative to their pre-intervention values, Theorem 3.1 implies that we can forecast post-intervention values for $x_i$ based on pre-intervention donor data alone. Conversely, failing to forecast $x_i^t$ implies the violation of condition (a), (b), or both. This observation forms the basis of our spillover detection method.

To use this theoretical result to obtain a practical method for flagging invalid donors, we assume that forecast errors are due to the donor unit error terms (due to being impacted by the intervention) and not the latent error terms. Crucially, we assume that failing to forecast $x_i^t$ means that condition (a) is violated – that $P(\varepsilon_{x_i}^t)$ has shifted[1] relative to pre-intervention values – and $x_i$ is not a valid donor.

In general, without additional domain knowledge, we cannot be sure that the unpredictability of $x_i$ is due to spillover effects and not changes in $u_j$. However, in Section 3.4, we employ sensitivity analysis to formally bound the bias in our SC causal estimate under violations of this assumption – i.e. false positives, when we exclude a donor $x_i$ even though it was the latent $u_j$ that changed, and false negatives, when we include a donor $x_i$ even though it was impacted by spillover effects. In particular, when our spillover detection incorrectly flags a donor $x_i$ (a false positive), excluding this donor will only introduce bias if the remaining selected donors do not satisfy the completeness condition (i.e. there is omitted variable bias). This sensitivity analysis is the bridge that brings us from the theory of Theorem 3.1 to our practical donor selection method in Algorithm 1.

As we demonstrate in Section 4, violations of condition (b) – shifts in the latents – are only an issue for the selection method if they occur at the same time point as the donor forecast. If the latents shift later in the post-intervention period, this does not bias donor selection, as these later post-intervention data are not used in our selection procedure. Lags between the intervention and spillover effects can be dealt with by forecasting on coarsened, time-averaged donor data[2]. Furthermore, time averaging can reduce false negatives in cases with very noisy donors (Figure 3). However, the longer time windows also increase the risk of false positives due to potential shifts in the latent distributions.
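As a concrete illustration of this coarsening step, here is a minimal sketch that averages donor data in non-overlapping buckets without mixing pre- and post-intervention observations. The bucket size of five follows Figure 3; the array layout and function name are assumptions of this sketch.

import numpy as np

def coarsen_donors(donors: np.ndarray, t_star: int, bucket: int = 5):
    """Time-average donor data (shape (T, N)) in buckets of `bucket` time points.

    Pre- and post-intervention data are bucketed separately, so no bucket mixes
    the two regimes. Returns the coarsened array and the index of the first
    post-intervention bucket."""
    def bucketize(block: np.ndarray, drop: str) -> np.ndarray:
        r = len(block) % bucket
        if r:  # drop the remainder so buckets align with the intervention time
            block = block[r:] if drop == "leading" else block[:-r]
        return block.reshape(-1, bucket, block.shape[1]).mean(axis=1)

    pre = bucketize(donors[:t_star], drop="leading")   # last pre bucket ends at t_star - 1
    post = bucketize(donors[t_star:], drop="trailing")
    return np.vstack([pre, post]), len(pre)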

Figure 2

Bias $\hat{\tau} - \tau$ and 95% Monte Carlo confidence intervals for 2000 simulated datasets. The data generating process is described in Appendix B. To construct SCs, we must identify the subset of donors that are valid. The horizontal axis shows the procedure used to identify this potentially valid set. For comparison, the case labelled All shows the expected bias of 1.6 when we assume that all donors are valid, and the case labelled Valid shows the bias when we have perfect knowledge about which donors are valid. Our S1 and S2 donor selection procedures are described in Section 3.2.1 and Algorithm 1. The standard deviation of the donor noise term, $\varepsilon_x^t \sim \mathcal{N}(0, \sigma)$, increases from left to right. As the noise increases, the spillover detection procedure is more likely to return false negatives, which increases the bias due to invalid donors. Note that even with optimal selection of valid donors, the estimates can still be biased due to donor noise (we address this residual bias in Figure 4). In the high noise case, despite being biased, our procedure still reduces bias relative to the case where we assume that all donors are valid. In this setup, 80% of the donors are invalid, and so our procedure has lower bias than the All case on average when fewer than 80% of the selected donors are invalid.

For the rest of the article, we restrict our attention to linear SC models, both because of their ubiquity, and also, more practically, to be able to discuss concrete sensitivity analysis bounds [18,19]. We make no distinction, however, between linear SCs as a consequence of the target and donors being linear functions of the latents, and linearity emerging non-parametrically due to the “aggregate unit” assumption of Shi et al. [21]. For demonstration purposes, we also focus on simple linear models for estimating h i as part of the donor selection procedure, although this is not a strict requirement (even when restricting to linear SC models, and linear sensitivity analysis). In principle, we can swap the linear model in this procedure for any flexible machine-learning model, using, e.g. conformal inference for constructing calibrated prediction intervals, perhaps improving performance (see [20,38], for applications of conformal inference in the context of SC models).

3.2.2 Donor selection procedure: Practical method

In Algorithm 1, we present pseudocode for our spillover detection procedure. The normalization step is important for ensuring that the procedure remains invariant to the scale of the donors.

Algorithm 1 Spillover detection for candidate donor $x_i$
Inputs:
– Training data and labels $\{x_1^{t'-1}, \ldots, x_N^{t'-1}; x_i^{t'}\}$ for all pre-intervention time points $1 < t' < t$
– Test data and label $\{x_1^{t-1}, \ldots, x_N^{t-1}; x_i^{t}\}$
– Posterior predictive interval (PPI) bound $\phi$ (e.g. 80%)
Outputs:
– Procedure S1: Prediction absolute error $|x_i^t - \hat{x}_i^t|$
– Procedure S2: 0 if $x_i^t$ is inside the $\phi$ PPI, else 1
1: Normalize the data and labels
2: Regress $x_i^{t'}$ on $x_1^{t'-1}, \ldots, x_N^{t'-1}$ for all $t' < t$ to obtain $\hat{h}_i$
3: Predict $\hat{x}_i^t, [\hat{x}_{i,-}^t, \hat{x}_{i,+}^t] = \hat{h}_i(x_1^{t-1}, \ldots, x_N^{t-1}; \phi)$
4: Set $A = |x_i^t - \hat{x}_i^t|$
5: Set $B = 0$ if $\hat{x}_{i,-}^t < x_i^t < \hat{x}_{i,+}^t$, else $B = 1$
6: Output $A$ for selection procedure S1
7: Output $B$ for selection procedure S2

We assume the following linear model for the forecast:

(3) $x_i^{t'} \sim \mathcal{N}\!\left(\rho_i + \sum_k \theta_{ik} x_k^{t'-1},\ \sigma_{x_i}\right),$

where $1 < t' < t$, and the index $k$ runs from 1 to $N$. Note that the coefficients $\rho_i$ and $\theta_{ik}$ are the same for different time points, encoding the fact that $h_i$ should be independent of time. In Section 4.1, we demonstrate the following two approaches, S1 and S2, for selecting donors based on Algorithm 1, using the regression model in equation (3):

  • S1: Select donors with the smallest difference between their actual and predicted values at the time of intervention, i.e. $\min_{x_i} \left| x_i^t - \rho_i - \sum_k \theta_{ik} x_k^{t-1} \right|$.

  • S2: Select donors with values $x_i^t$ falling within some specified posterior predictive intervals (e.g. 80%).

Strategy S1 allows the analyst to specify the number of donors to be selected for inclusion in the SC model. For example, if the analyst requires ten donors for the SC model, then strategy S1 can be used to select the ten donors with the smallest forecast errors. Alternatively, as discussed in Appendix C, the analyst can select more than ten donors using this strategy and then subsequently employ regularization to enforce sparsity in the donor weights. With selection strategy S2, the analyst does not specify the number of donors to be selected but instead has greater control of the false positive rate. For example, assuming the posterior predictive intervals (PPIs) are well calibrated, then selecting donors within the 95% PPI bound limits the exclusion rate of valid donors to 5%. As discussed in Section 3.2.1, conformal inference could also be used for constructing calibrated prediction intervals.
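To illustrate how Algorithm 1 and the model in equation (3) fit together, here is a minimal Python sketch. The use of scikit-learn's BayesianRidge for the regression and a Gaussian approximation of the posterior predictive interval are assumptions of this sketch, not the paper's implementation; the function name and data layout are likewise illustrative.

import numpy as np
from scipy import stats
from sklearn.linear_model import BayesianRidge

def spillover_detect(donors: np.ndarray, i: int, phi: float = 0.80):
    """Algorithm 1 sketch for candidate donor i.

    donors: array of shape (T, N); rows 0, ..., T-2 are pre-intervention time points
            and the last row is the intervention time point (the test point).
    Returns (A, B): A is the absolute forecast error (procedure S1); B is 1 if the
    observed value falls outside the phi posterior predictive interval (procedure S2)."""
    # Step 1: normalize so the procedure is invariant to the scale of the donors
    mu, sd = donors.mean(axis=0), donors.std(axis=0) + 1e-12
    Xn = (donors - mu) / sd

    # Step 2: regress x_i^{t'} on x_1^{t'-1}, ..., x_N^{t'-1} over pre-intervention times
    features, labels = Xn[:-2], Xn[1:-1, i]
    model = BayesianRidge().fit(features, labels)

    # Step 3: forecast donor i at the intervention time from the last pre-intervention row
    pred, pred_sd = model.predict(Xn[-2][None, :], return_std=True)
    z = stats.norm.ppf(0.5 + phi / 2)
    lo, hi = pred[0] - z * pred_sd[0], pred[0] + z * pred_sd[0]

    # Steps 4-7: forecast error for S1, interval check for S2
    A = abs(Xn[-1, i] - pred[0])
    B = 0 if lo < Xn[-1, i] < hi else 1
    return A, B

Running this for every candidate donor, S1 ranks candidates by A and keeps the required number with the smallest errors, while S2 keeps every candidate with B = 0.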

3.3 Using excluded donors to debias linear SC models

While the donors excluded by Algorithm 1 are not used in constructing the SC counterfactual, they can still be used to debias SC causal effect estimates, as we now show. The experiments in Section 4 use the following linear model for constructing SCs based on pre-intervention time points:

(4) $y^{t} \sim \mathcal{N}\!\left(\alpha + \sum_i \beta_i x_i^{t},\ \sigma_y\right).$

As discussed by Shi et al. [20], in situations where the donors are noisy (imperfect) proxies of the latents, the SC model in equation (4) might fail to estimate consistent donor weights $\beta_i$, unless we assume perfect pre-treatment fit. Using proximal causal inference [20,24], we can leverage proxies not used in the construction of the SC to debias the estimates even with imperfect pre-treatment fit. In the proximal causal inference literature, this debiasing is typically accomplished with two-stage least squares estimation or the generalized method of moments. For the experiment in Figure 4, we opt to jointly model the target and donors as

(5) $\begin{pmatrix} y^t \\ x_1^t \\ \vdots \\ x_N^t \end{pmatrix} \sim \mathcal{N}\!\left(\begin{pmatrix} \alpha + \sum_i \beta_i x_i^t \\ \gamma_1 + \sum_l \lambda_{1l} z_l^t \\ \vdots \\ \gamma_N + \sum_l \lambda_{Nl} z_l^t \end{pmatrix},\ \Sigma\right),$

where the $z_l$ are the (potentially invalid) donors excluded by our selection procedure. Such a model can be readily estimated using a probabilistic programming language.
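For example, a minimal PyMC sketch of equation (5), restricted to pre-intervention data. The priors, the LKJ covariance parameterization, and the function and variable names are illustrative assumptions of this sketch rather than the specification used in the experiments.

import numpy as np
import pymc as pm

def fit_joint_sc(y: np.ndarray, X: np.ndarray, Z: np.ndarray):
    """y: (T,) target; X: (T, N) selected donors; Z: (T, L) excluded donors,
    all restricted to pre-intervention time points."""
    N, L = X.shape[1], Z.shape[1]
    with pm.Model():
        alpha = pm.Normal("alpha", 0.0, 1.0)
        beta = pm.Normal("beta", 0.0, 1.0, shape=N)       # donor weights for the target row
        gamma = pm.Normal("gamma", 0.0, 1.0, shape=N)
        lam = pm.Normal("lam", 0.0, 1.0, shape=(N, L))    # loadings of selected donors on excluded donors
        chol, _, _ = pm.LKJCholeskyCov("chol", n=N + 1, eta=2.0,
                                       sd_dist=pm.Exponential.dist(1.0), compute_corr=True)
        mu_y = alpha + pm.math.dot(X, beta)               # mean of y^t
        mu_x = gamma + pm.math.dot(Z, lam.T)              # means of x_1^t, ..., x_N^t
        mu = pm.math.concatenate([mu_y[:, None], mu_x], axis=1)
        pm.MvNormal("obs", mu=mu, chol=chol, observed=np.column_stack([y, X]))
        return pm.sample()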

Figure 3

The left panel shows the bias in the high noise case of Figure 2. Our selection procedures S1 and S2 are severely biased due to invalid donors. The right panel shows the case where we time average the donor data, in buckets of five time points, before passing it through the spillover detection. Note that we do not average pre- and post-intervention data in the same bucket, and we use the original, non-averaged data in the SC model. The averaging reduces false negatives such that the performance is on par with the optimal selection of valid donors. We address the residual bias in Figure 4.

How is it possible that we can use excluded donors $z_l$, potentially impacted by spillover effects from the intervention, when constructing debiased SC estimates? Note that the excluded donors satisfy the proxy existence condition of Shi et al. [20], such that $z_l^t \perp \{y^t, x_i^t\} \mid u^t$ for all pre-intervention times $t$. The key point allowing us to leverage potentially invalid donors is that we only ever use pre-intervention data from $z_l$, and so the SC estimates are unaffected by their post-intervention dynamics.

To build intuition about the above procedure, consider the following model:

$E(Y \mid U) = \rho U, \quad E(X \mid U) = \theta U, \quad E(Y \mid U) = \beta E(X \mid U).$

The variable $Y$ is analogous to the target variable, $U$ the latent, and $X$ the donor (i.e. a noisy proxy of the latent $U$). We would like to estimate the donor weight $\beta$. However, because $U$ is latent, we cannot estimate the expectation $E(X \mid U)$. If we regress $Y$ on $X$ we will not obtain a consistent estimate of $\beta$ because the errors are correlated with $X$.

Instead, if we have an additional proxy variable $Z$, we can rewrite the model as

$E(Y \mid Z) = \rho E(U \mid Z), \quad E(X \mid Z) = \theta E(U \mid Z), \quad E(Y \mid Z) = \beta E(X \mid Z).$

In this case, $Z$ is observed and so we can estimate the expectation $E(X \mid Z)$ on the right hand side, and then obtain a consistent estimate for the donor weight $\beta$. One approach for estimation (often used for estimating instrumental variables) is a two-stage estimator where we first regress $X$ on $Z$ to estimate $\hat{X}$ (sometimes referred to as a proximal control variable), and then regress $Y$ on $\hat{X}$. The multivariate model in equation (5) effectively combines this two-stage process into a single model (similar models can be used for estimating instrumental variables, see, e.g., [39], Section 23.4).
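A minimal sketch of this two-stage estimator with ordinary least squares (illustrative only; the experiments in this article use the joint model of equation (5) instead, and the function name and array layout are assumptions):

import numpy as np

def two_stage_donor_weights(y: np.ndarray, X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """First stage: regress the selected donors X (T, N) on the excluded donors Z (T, L).
    Second stage: regress the target y (T,) on the first-stage fitted values X_hat.
    Returns the second-stage coefficients (intercept first, then donor weights)."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])              # add an intercept column
    X_hat = Z1 @ np.linalg.lstsq(Z1, X, rcond=None)[0]       # first-stage fitted values (proximal control variables)
    X1 = np.column_stack([np.ones(len(X_hat)), X_hat])
    return np.linalg.lstsq(X1, y, rcond=None)[0]             # second-stage donor weights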

Figure 4

The left panel shows the bias in the medium noise case of Figure 2. The estimated causal effects are biased even with optimal selection of valid donors. This is because the donors are noisy (imperfect) proxies of the latents, and so conditioning fails to fully close backdoor paths. As described in Section 3.3, we can debias these estimates using proximal causal inference to leverage donors excluded by the selection procedure. The right panel shows that the Valid and S1 selection procedures now give unbiased estimates of the causal effect, although there is still some bias with procedure S2 due to invalid donors. As in Figure 3, time averaging would reduce the bias in S2.

3.4 Sensitivity analysis

In this section, we discuss the false positive (excluding donors not impacted by spillover effects) and false negative (including donors impacted by spillover effects) failure modes of our donor selection procedure. We note that these failures can happen for multiple reasons, including shifts in the latents at the same time as the intervention (as discussed in Section 3.2.1; see also Section 4.1 for an example), as well as misspecification and/or poor fit of the forecast model. Zeitler et al. [18] provided a general framework for sensitivity analysis in non-parametric SC models, bounding the bias when there are relevant latent variables with no observed donors as proxies (omitted variable bias). They also give a formula for calculating these bounds in the case of linear SC models, which we make use of in Sections 3.4.1 and 3.4.2. In the context of negative control, Miao [19] introduced positive control outcome variables (which are similar to invalid donors) and discussed treating the spillover effect as a sensitivity parameter to investigate the plausibility of causal effect estimates. We make use of this approach in Section 3.4.3.

3.4.1 Relevant latents with no observed donors

First, we discuss omitted variable bias due to the absence of observed donors. When there are latent confounding variables with no observed donors to act as proxies, we cannot close all backdoor paths and so our estimated causal effect will be biased. For linear SC models, this bias can be bounded using Eq. (3) from the study of Zeitler et al. [18]. In particular, let $x_i$ be the observed donors selected to construct the SC, and $\beta_{x_i}$ be the corresponding donor weights. The potential omitted variable bias due to unobserved donors is then

(6) $\text{OV Bias} \le N \times \max_{x_i}(\beta_{x_i}) \times \max_{x_i}\left( E(x_i^{\text{pre}}) - E(x_i^{\text{post}}) \right)$

Note that for this to be a valid bound, we must assume that the observed donors are at least as important as the unobserved ones (in the sense that the maximum weight and post-intervention shift for the observed donors is larger than that of the unobserved donors, [18]). This is similar to assumptions used in propensity-based sensitivity analysis, where to obtain valid bounds one requires that unobserved confounders are at most as important to estimation as the observed confounders [30].

3.4.2 Relevant latents with no selected donors

A false positive result from the spillover detection procedure potentially introduces a related form of omitted variable bias. In particular, if all donors acting as proxies for a relevant latent variable are excluded by the selection procedure, then the resulting SC causal estimate will be biased, in a similar manner to Section 3.4.1. However, in this case, we can bound the bias using weaker assumptions than necessary for equation (6), because the excluded donors are actually observed. Let $x_i$ be the donors selected to construct the SC, and $z_l$ be the donors excluded by the spillover detection procedure. The potential omitted variable bias due to false positives is then

(7) $\text{FP Bias} \le N \times \max_{x_i}(\beta_{x_i}) \times \max_{z_l}\left( E(z_l^{\text{pre}}) - E(z_l^{\text{post}}) \right).$

3.4.3 Selected donors impacted by the intervention

Finally, we discuss the potential bias introduced by false negatives. The approach taken here is similar to Section 6 of Miao [19]. Let $x_i$ be the donors selected to construct the SC, and $\tau_{x_i}$ be the corresponding spillover effects from the intervention. The potential bias due to false negatives is then

(8) $\text{FN Bias} \le N \times \max_{x_i}(\beta_{x_i}) \times \max_{x_i}(\tau_{x_i}).$

We do not know $\tau_{x_i}$, but (following [19]) we can treat it as a sensitivity parameter to gauge the plausibility of our causal effect estimate. For example, if we have domain knowledge bounding $\tau_{x_i}$, then this can be used in equation (8) to bound the bias. We can also judge how large the spillover effect would have to be in order for the estimated causal effect to change sign.
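The bounds in equations (6)-(8) are straightforward to compute from the fitted donor weights and the observed pre- and post-intervention means; below is a minimal sketch. Taking absolute values of the weights and mean shifts, as well as the function name and array layout, are assumptions of this sketch.

import numpy as np

def sensitivity_bounds(beta, X_pre, X_post, Z_pre, Z_post, tau_max):
    """beta: (N,) fitted weights of the selected donors;
    X_pre, X_post: selected donor values before/after the intervention, shapes (T_pre, N) and (T_post, N);
    Z_pre, Z_post: the same for the excluded donors;
    tau_max: assumed upper bound on the spillover effect for any selected donor."""
    N = len(beta)
    max_weight = np.max(np.abs(beta))
    x_shift = np.max(np.abs(X_pre.mean(axis=0) - X_post.mean(axis=0)))
    z_shift = np.max(np.abs(Z_pre.mean(axis=0) - Z_post.mean(axis=0)))
    return {
        "OV Bias": N * max_weight * x_shift,  # equation (6)
        "FP Bias": N * max_weight * z_shift,  # equation (7)
        "FN Bias": N * max_weight * tau_max,  # equation (8)
    }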

4 Experiments

In this section, we illustrate the performance of our donor selection procedure in different scenarios using simulated data. We also demonstrate the procedure on real-world data by applying it to semi-synthetic variants of the German Reunification [13] and California Tobacco Control [12] datasets.

4.1 Simulated data

We construct 2000 simulated datasets according to the data generating process described in Appendix B, with large pools of potential donors, most of which are impacted by spillover effects from the intervention. Each dataset consists of a target time-series, 10 latents, and a pool of 1,000 potential donors. In each pool, a random 80% of the donors are invalid, impacted by a spillover effect of 2.

For a given dataset, we must first attempt to identify the subset of valid donors. We refer to the set of donors returned by a selection procedure as the potentially valid donors (PVDs). Next, in order to simplify comparisons between different selection procedures, we estimate the SC using a sample of ten donors from the PVDs. This sampling step ensures that the resulting SC models all have the same number of parameters, regardless of the number of PVDs identified by the different selection procedures[3]. In Appendix C, we compare this approach to one where we estimate SCs using the full set of PVDs, and employ regularization to enforce sparsity in the donor weights. For the data generating process considered here, the expected bias is the same with both approaches. We use the sampling approach to focus the comparisons on the selection of valid donors, rather than also having to consider possible differences due to selecting donors based on pre-treatment fit.

In Figure 2, we show the bias after applying our donor selection procedures S1[4] and S2. For comparison, the case labelled All shows the bias if we assume that all the donors are valid, and the case labelled Valid shows the optimal scenario if we had perfect knowledge about which donors are valid. Our selection procedures are very close to optimal in both the low and medium donor noise cases. However, the performance degrades as the donor noise becomes comparable in magnitude to the spillover effect. In Figure 3, we address this issue and show how time averaging improves performance when the donors are very noisy. In Figure 4, we leverage excluded donors to further debias effect estimates in situations with noisy donors, as discussed in Section 3.3. Finally, in Figure 5, we show how contemporaneous shifts in the latent distributions can bias our selection procedure, and that these latent shifts are not an issue if they occur later in the post-intervention period.

Figure 5

The left panel shows the bias when the error term for one of the latents shifts at the same time as the intervention. The error term for U 1 shifts from ε u 1 N ( 0,1 ) pre-intervention, to ε u 1 N ( 0.5,1 ) post-intervention. Our selection procedure incorrectly flags this as a spillover effect (false positive), thereby excluding donors that depend on U 1 . The right panel shows where this latent shift occurs just after the intervention. In this case our donor selection procedure recovers an unbiased estimate of the causal effect.

4.2 Semi-synthetic data

In this section, we further validate our donor selection procedure using real-world data. In particular, we consider semi-synthetic variants of the 1990 reunification of West and East Germany [13], and the 25-cent tobacco tax increase in California in 1988 [12]. For each dataset, we introduce a semi-synthetic unit to the pool of potential donors, constructed to be a noisy proxy of the target as $x_{\mathrm{syn}}^t \sim \mathcal{N}(y^t, \sigma)$. Being predictive of the target, these invalid donors receive large weights in the SC models and bias the effect estimates towards zero. This also highlights the critical distinction between selecting valid donors, and the usual considerations of selecting donors based on pre-treatment fit.
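
As a concrete illustration, the following short sketch (our own, not from the article) shows one way to construct such a semi-synthetic invalid donor and append it to the donor pool; the function name, array layout, and noise level `sigma_proxy` are illustrative assumptions.

```python
import numpy as np

def add_semisynthetic_donor(y, donor_pool, sigma_proxy=0.1, seed=0):
    """Append a semi-synthetic invalid donor x_syn^t ~ N(y^t, sigma) to the pool.

    y: (T,) target series; donor_pool: (T, N) donor series.
    sigma_proxy is an illustrative noise level, not a value from the article.
    """
    rng = np.random.default_rng(seed)
    x_syn = rng.normal(loc=y, scale=sigma_proxy)   # noisy proxy of the target
    return np.column_stack([donor_pool, x_syn])    # augmented (T, N + 1) pool
```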

We assume SC models of the form $y^t \sim \mathcal{N}\left(\sum_i \beta_i x_i^t, \sigma_y\right)$, with $\beta_i \geq 0$, $\sum_i \beta_i = 1$, $\beta_i \sim \mathrm{Dirichlet}(0.4)$, and $\sigma_y \sim \mathcal{N}^{+}(0,1)$. The restriction to positive donor weights summing to 1 was used in the original models of Abadie et al. [12,13]. We further impose a $\mathrm{Dirichlet}(0.4)$ prior on $\beta_i$, which tends to regularize the weights towards sparse distributions, and we standardize the target and donors such that each time-series has zero mean and unit standard deviation in the pre-intervention period. We construct 95% credible intervals by taking the 2.5% and 97.5% percentiles of samples from the posterior predictive distribution. We note that these uncertainty intervals are calculated after the donors have been selected, and so do not account for the additional uncertainty introduced by the selection procedure. Although beyond the scope of the current article (which focuses on identification of invalid donors), a promising future direction for this work would be to explore how uncertainty due to the donor selection procedure can be addressed using Bayesian selection-adjusted inference [40,41], or frequentist post-selection inference (see [42] for a recent review).
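
To make the model specification concrete, here is a minimal PyMC sketch of this SC model, assuming standardized pre-intervention arrays `X_pre` (time points × donors) and `y_pre`, and post-intervention donor values `X_post`; the function name and post-processing details are our own assumptions rather than the article's implementation.

```python
import numpy as np
import pymc as pm

def fit_sc_counterfactual(X_pre, y_pre, X_post, alpha=0.4, seed=0):
    """Bayesian SC with simplex-constrained weights and a Dirichlet(alpha) prior.

    X_pre: (T_pre, N) standardized donors, y_pre: (T_pre,) standardized target,
    X_post: (T_post, N) post-intervention donor values.
    """
    n_donors = X_pre.shape[1]
    with pm.Model():
        beta = pm.Dirichlet("beta", a=alpha * np.ones(n_donors))  # beta_i >= 0, sum_i beta_i = 1
        sigma_y = pm.HalfNormal("sigma_y", sigma=1.0)             # sigma_y ~ N+(0, 1)
        pm.Normal("y_obs", mu=pm.math.dot(X_pre, beta), sigma=sigma_y, observed=y_pre)
        idata = pm.sample(1000, tune=1000, target_accept=0.9, random_seed=seed)

    # Posterior predictive draws of the counterfactual target, post-intervention.
    post = idata.posterior.stack(sample=("chain", "draw"))
    beta_s = post["beta"].values        # (N, S) posterior draws of the weights
    sigma_s = post["sigma_y"].values    # (S,)  posterior draws of the noise scale
    rng = np.random.default_rng(seed)
    y_cf = X_post @ beta_s + sigma_s * rng.standard_normal((X_post.shape[0], beta_s.shape[1]))
    lo, hi = np.percentile(y_cf, [2.5, 97.5], axis=1)  # 95% credible band
    return y_cf, (lo, hi)
```

The counterfactual draws are obtained by pushing posterior samples of the weights and noise scale through the post-intervention donor values, and the band corresponds to the 2.5% and 97.5% percentiles described above.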

In Figure 6(a), we show the estimates for the effect of German reunification on West Germany's per-capita GDP, and in Figure 6(b), the estimates for the effect of California's tobacco tax increase on per-capita pack sales. The top panels show the effect estimates after applying our donor selection procedure. In this case, the semi-synthetic, invalid donors are correctly flagged and excluded from the SC models, resulting in causal effect estimates that are consistent with previous studies [12,13]. The SC models in the bottom panels include the semi-synthetic, invalid donors, which results in causal effect estimates much closer to zero than the original findings of previous studies [12,13].

Figure 6

(a) Estimated causal effect of German reunification. The intervention time is indicated by the vertical dotted line, and the shaded areas show the 95% posterior predictive intervals. The pool of 16 potential donors includes a semi-synthetic unit that is a noisy proxy of West Germany. In the top panel, we show the effect estimated after selecting ten donors using our S1 selection procedure. The invalid donor is correctly flagged and excluded from the SC model. In this case, the results are very similar to Figure 3 from Abadie et al. [13], which is consistent with their findings about robustness to donor selection. In the bottom panel, we add the semi-synthetic unit to the SC model. This invalid donor receives a large weight, which biases the estimate towards zero. (b) Estimated causal effect of the California tobacco tax. The pool of 39 potential donors includes a semi-synthetic unit that is a noisy proxy of California. In the top panel, we show the effect estimated after selecting 30 donors using our S1 selection procedure. The invalid donor is correctly excluded from the SC model, and the resulting effect estimate is very similar to Figure 3 from Abadie et al. [12]. Again, this is consistent with their findings about robustness to donor selection. In the bottom panel, we add the semi-synthetic unit to the SC model, which biases the estimate towards zero. These results highlight the critical distinction between selecting valid donors, and the usual considerations of selecting donors based on pre-treatment fit.

5 Conclusion

In this article, we presented a practical, theoretically grounded donor selection procedure for SC models, aimed at weakening the domain knowledge requirements for selecting valid donors. This procedure augments partial knowledge about invalid donors, thereby reducing the burden on the practitioner to explicitly know that a (potentially very large) pool of donors is not impacted by spillover effects from the intervention. Working in the structural causal model framework, we utilized techniques from proximal causal inference to show that the assumptions necessary for SC identifiability also facilitate forecasting post-intervention donor values from pre-intervention data. We used this result to detect potential spillover effects, and exclude invalid donors when constructing SCs. Furthermore, in the context of recent works on sensitivity analysis, we discussed bounding the bias due to false positive and false negative selection errors. We concluded by providing an empirical demonstration of our selection procedure on both simulated and real-world datasets.

  1. Funding information: The authors were funded by Spotify.

  2. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  3. Conflict of interest: The authors state no conflict of interest.

  4. Data availability statement: The methodology to generate the simulated datasets is described in full in Appendix B. The sources for the real datasets are cited in the text.

Appendix A Proof of Theorem 3.1

Theorem

If causal mechanisms are invariant, and the donors $x_1^{t-1},\ldots,x_N^{t-1}$ are proxies for the latents $u_1^{t-1},\ldots,u_M^{t-1}$, then, for each donor $x_i$, there exists a unique function $h_i$ such that for all time points $t$ we have:

(A1) $E(x_i^t) = E\big(h_i\big(x_1^{t-1},\ldots,x_N^{t-1}, P(\varepsilon_{x_i}^t, \varepsilon_u^t)\big)\big)$

Proof

Define $x^{t-1} \equiv (x_1^{t-1},\ldots,x_N^{t-1})$, $u^{t-1} \equiv (u_1^{t-1},\ldots,u_M^{t-1})$, and $\varepsilon_i^t \equiv (\varepsilon_{x_i}^t, \varepsilon_u^t)$ for ease of exposition. First, note that we can write the causal mechanism for $x_i^t$ as

(A2) $P^t(x_i^t \mid u^t, \varepsilon_{x_i}^t) = P^t(x_i^t \mid u^{t-1}, \varepsilon_u^t, \varepsilon_{x_i}^t) = P^t(x_i^t \mid u^{t-1}, \varepsilon_i^t),$

where the first equality follows from the fact that $u^t$ is a deterministic function of $u^{t-1}$ and $\varepsilon_u^t$. Now, using the proxy variable completeness condition (Definition 3.2), we can relate the causal mechanism for $x_i^t$ to $x^{t-1}$ via a function[A1] $H_i^t$ as

(A3) $P^t(x_i^t \mid u^{t-1}, \varepsilon_i^t) = \int H_i^t(x_i^t, x^{t-1}, \varepsilon_i^t)\, P^t(x^{t-1} \mid u^{t-1}, \varepsilon_i^t)\, \mathrm{d}x^{t-1} = \int H_i^t(x_i^t, x^{t-1}, \varepsilon_i^t)\, P^t(x^{t-1} \mid u^{t-1})\, \mathrm{d}x^{t-1},$

where the second equality uses the fact that $x^{t-1}$ does not depend on the time-$t$ noise terms $\varepsilon_i^t$.

This implies that

$$
\begin{aligned}
E(x_i^t \mid u^{t-1}) &= \int x_i^t\, H_i^t(x_i^t, x^{t-1}, \varepsilon_i^t)\, P^t(x^{t-1} \mid u^{t-1})\, P(\varepsilon_i^t)\, \mathrm{d}\varepsilon_i^t\, \mathrm{d}x_i^t\, \mathrm{d}x^{t-1} \\
&= \int P^t(x^{t-1} \mid u^{t-1}) \int \underbrace{\left(\int x_i^t\, H_i^t(x_i^t, x^{t-1}, \varepsilon_i^t)\, \mathrm{d}x_i^t\right)}_{g_i^t(x^{t-1}, \varepsilon_i^t)} P(\varepsilon_i^t)\, \mathrm{d}\varepsilon_i^t\, \mathrm{d}x^{t-1} \\
&= \int \underbrace{E_{P(\varepsilon_i^t)}\big(g_i^t(x^{t-1}, \varepsilon_i^t)\big)}_{h_i^t(x^{t-1}, P(\varepsilon_i^t))}\, P^t(x^{t-1} \mid u^{t-1})\, \mathrm{d}x^{t-1} \\
&= E\big(h_i^t(x^{t-1}, P(\varepsilon_i^t)) \mid u^{t-1}\big)
\end{aligned}
$$

Marginalising over $u^{t-1}$ yields the result that

$$E(x_i^t) = E\big(h_i^t(x^{t-1}, P(\varepsilon_i^t))\big)$$

Next, we prove that $h_i^t$ does not depend on time, by showing that the solution to the integral equation in equation (A3) for time point $t$ is also a solution for any other time point $t'$. Remember that causal mechanisms are invariant and so do not depend on $t$. Consider the left-hand side of equation (A3):

$$P^t(x_i^t \mid u^{t-1}, \varepsilon_i^t) = \int H_i^t(x_i^t, x^{t-1}, \varepsilon_i^t)\, P^t(x^{t-1} \mid u^{t-1})\, \mathrm{d}x^{t-1}.$$

By equation (A2), $P^t(x_i^t \mid u^{t-1}, \varepsilon_i^t)$ is a causal mechanism and so does not depend on $t$. Now, consider the $P^t(x^{t-1} \mid u^{t-1})$ term under the integral on the right-hand side. This distribution Markov factorizes according to the structure of the DAG in Figure 1a, resulting in products of causal mechanisms. Hence, a solution to the integral equation for one time point $t$ is a solution for any other time point.

Finally, we prove uniqueness of $h_i^t$ for a given time point, as this implies that there exists a unique function $h_i$ for all time points. Suppose that $h_i^t$ and $\tilde{h}_i^t$ are both solutions at time $t$, such that

$$E\big(h_i^t(x^{t-1}, P(\varepsilon_i^t)) \mid u^{t-1}\big) = E\big(\tilde{h}_i^t(x^{t-1}, P(\varepsilon_i^t)) \mid u^{t-1}\big).$$

Therefore, $E\big(h_i^t(x^{t-1}, P(\varepsilon_i^t)) - \tilde{h}_i^t(x^{t-1}, P(\varepsilon_i^t)) \mid u^{t-1}\big) = 0$, and the proxy variable completeness condition (Definition 3.2) implies that $h_i^t(x^{t-1}, P(\varepsilon_i^t)) = \tilde{h}_i^t(x^{t-1}, P(\varepsilon_i^t))$, completing the proof.□

The result can be generalized to the case where the evolution depends on $T$ time points by modifying the SCSCM such that

  • $u_j^t = m_j^t(u_j^{t-1}, \ldots, u_j^{t-1-T}, \varepsilon_{u_j}^t)$

  • $x_i^t = f_i^t(u_1^t, \ldots, u_M^t, \ldots, u_1^{t-T}, \ldots, u_M^{t-T}, \varepsilon_{x_i}^t)$

  • $y^t = g^t(u_1^t, \ldots, u_M^t, \ldots, u_1^{t-T}, \ldots, u_M^{t-T}, I^t, \varepsilon_y^t)$

and updating the proxy variable completeness condition (Definition 3.2) accordingly.

B Simulated data

We construct the simulated datasets in Section 4.1 according to the following data generating process (similar to the local linear trend model with long-term slope from [14]).

$$
\begin{aligned}
u_j^{t+1} &\sim \mathcal{N}(u_j^t + \delta_j^t, \sigma_u) \\
\delta_j^{t+1} &\sim \mathcal{N}\big(S_j + \rho_j(\delta_j^t - S_j), \sigma_\delta\big) \\
y^t &\sim \mathcal{N}\Big(\sum_j \alpha_j u_j^t + \tau I^t, \sigma_y\Big) \\
x_i^t &\sim \mathcal{N}\Big(\sum_j \beta_{ij} u_j^t + \tau_{x_i} I^t, \sigma_x\Big)
\end{aligned}
$$

We generate 2,000 datasets for each panel in Figure 2. A dataset consists of one target $y$, 10 latents $u_j$, and 1,000 potential donors $x_i$ (out of which we select ten donors for constructing SCs). The time-series data have 100 time points pre-intervention and 30 time points post-intervention. The causal effect of the intervention on the target is $\tau = 2$, and 80% of the potential donors are invalid, with spillover effects $\tau_{x_i} = 2$ (the remaining 20% are valid, with $\tau_{x_i} = 0$). In the above, $I^t$ is an indicator variable for the intervention that is 0 pre-intervention and 1 post-intervention. The long-term slope is sampled as $S_j \sim \mathcal{N}(0.1, 0.1)$, with $\rho_j \sim \mathcal{U}(0,1)$ interpolating between this slope and a random walk. We set $\sigma_u = 1$, $\sigma_\delta = \sigma_y = 0.1$, and $\sigma_x \in \{0.1, 0.5, 1.0\}$ for the low, medium, and high donor noise levels (the high noise level is comparable in magnitude to $\tau$). For simplicity, we set the coefficients as $\alpha_j = 1$ and $\beta_{ij} = 1$ (except for the datasets demonstrating the latent shift in Figure 5, where we set $\beta_{i1} = 0$ for a random subset of valid donors). Figure A1 provides an illustrated example of a target variable generated by the process described above.
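
For concreteness, a minimal NumPy sketch of this data generating process (with the parameter values above; the function name, initial conditions, and overall structure are our own assumptions) could look as follows.

```python
import numpy as np

def simulate_dataset(T_pre=100, T_post=30, M=10, N=1000, frac_invalid=0.8,
                     tau=2.0, sigma_u=1.0, sigma_delta=0.1, sigma_y=0.1,
                     sigma_x=0.1, seed=0):
    """Simulate one dataset from the local-linear-trend process described above."""
    rng = np.random.default_rng(seed)
    T = T_pre + T_post
    I = np.concatenate([np.zeros(T_pre), np.ones(T_post)])   # intervention indicator

    # Latent trends with long-term slopes S_j and mean-reversion rates rho_j
    S = rng.normal(0.1, 0.1, size=M)
    rho = rng.uniform(0.0, 1.0, size=M)
    u = np.zeros((T, M))        # latents start at zero (an illustrative choice)
    delta = np.zeros((T, M))
    for t in range(T - 1):
        u[t + 1] = rng.normal(u[t] + delta[t], sigma_u)
        delta[t + 1] = rng.normal(S + rho * (delta[t] - S), sigma_delta)

    # A random 80% of the donors are invalid, with spillover effect tau_x = tau
    tau_x = np.zeros(N)
    tau_x[rng.permutation(N)[:int(frac_invalid * N)]] = tau

    alpha = np.ones(M)          # target loadings on the latents
    beta = np.ones((N, M))      # donor loadings on the latents
    y = rng.normal(u @ alpha + tau * I, sigma_y)                        # (T,)
    x = rng.normal(u @ beta.T + I[:, None] * tau_x[None, :], sigma_x)   # (T, N)
    return y, x, I, tau_x
```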

Figure A1

Example of a target variable $y$ from a simulated dataset. The intervention happens at time 0, increasing $y$ by $\tau = 2$.

C Sparse donor weights

As described in Section 4.1, for each of the 2,000 simulated datasets, we construct SCs by sampling ten donors from the set of potentially valid donors (PVDs). The set of PVDs is a subset of the pool of 1,000 potential donors, and depends on the specific selection procedure used to identify valid donors (i.e. All, Valid, S1, or S2). As an alternative to sampling ten donors, we can construct SCs using the full set of PVDs, and employ regularization to select approximately ten donors to have non-zero weights based on the pre-treatment fit [e.g., 14,37]. With this sparse regularization approach, the total number of parameters is equal to the number of PVDs identified by the selection procedure. For the simulations described in Appendix B, SC models estimated using the full set of PVDs have 1,000 parameters in the All case, 200 parameters in the Valid case, 10 parameters in the S1 case, and approximately 200 parameters in the S2 case (but the exact number varies across datasets).

To enforce sparsity in the donor weights, we set the prior to be a discrete mixture of normal distributions [43]

(A4) $\beta_i \sim \eta\, \mathcal{N}(0, \sigma_1) + (1 - \eta)\, \mathcal{N}(0, \sigma_2), \qquad 0 < \eta < 1, \quad \sigma_1 \ll \sigma_2$

The parameter $\eta$ controls how strongly the weights cluster close to zero via the narrow component, and can be interpreted as the expected fraction of donors with effectively zero weight. This is similar to the spike-and-slab prior of [44]. However, replacing the exact zero values with a narrow distribution of “irrelevant” values facilitates straightforward estimation with standard probabilistic programming languages.
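
As an illustration, the prior in equation (A4) can be encoded as a two-component normal mixture in a probabilistic programming language; the sketch below uses PyMC's NormalMixture, and the values of $\eta$, $\sigma_1$, and $\sigma_2$ are hypothetical choices rather than the ones used for Figure A2.

```python
import numpy as np
import pymc as pm

# Hypothetical hyperparameters: a narrow "irrelevant" component and a wide
# "relevant" component, with eta the expected fraction of ~zero weights.
eta, sigma_1, sigma_2 = 0.99, 0.01, 1.0

def sparse_sc_model(X_pre, y_pre):
    """SC regression with the mixture-of-normals prior (A4) on the donor weights."""
    n_donors = X_pre.shape[1]
    with pm.Model() as model:
        beta = pm.NormalMixture(
            "beta",
            w=np.array([eta, 1.0 - eta]),        # mixture weights
            mu=np.zeros(2),
            sigma=np.array([sigma_1, sigma_2]),  # sigma_1 << sigma_2
            shape=n_donors,
        )
        sigma_y = pm.HalfNormal("sigma_y", sigma=1.0)
        pm.Normal("y_obs", mu=pm.math.dot(X_pre, beta), sigma=sigma_y, observed=y_pre)
    return model
```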

In Figure A2, we compare this sparse regularization approach to the sampling approach from the main text. For this data generating process, the expected bias is identical. The variance in the All case is slightly lower because the number of donors with non-zero weight typically ends up being somewhat larger than ten, and so the fraction of invalid donors is more concentrated around 80%.

Figure A2

Low noise case of Figure 2. The filled circles show the results discussed in the main text, where each SC model is estimated using ten donors sampled from the set of PVDs. The open circles show the alternative sparse regularization approach, where each SC model is estimated using the full set of PVDs, and we employ sparsity-inducing priors (equation (A4)) to control the number of non-zero weights. The expected bias is identical for both approaches. The variance in the All case is slightly lower because the number of donors with non-zero weight typically ends up being somewhat larger than ten, and so the fraction of invalid donors is more concentrated around 80%.

References

[1] Lee CM, Spekkens RW. Causal inference via algebraic geometry: feasibility tests for functional causal structures with two binary observed variables. J Causal Inference. 2017;5(2):20160013. doi:10.1515/jci-2016-0013.

[2] Gilligan-Lee C. Causing trouble. New Sci. 2020;246(3279):32–5. doi:10.1016/S0262-4079(20)30817-4.

[3] Richens JG, Lee CM, Johri S. Improving the accuracy of medical diagnosis with causal machine learning. Nat Commun. 2020;11(1):1–9. doi:10.1038/s41467-020-17419-7.

[4] Dhir A, Lee CM. Integrating overlapping datasets using bivariate causal discovery. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 34; 2020. p. 3781–90. doi:10.1609/aaai.v34i04.5789.

[5] Perov Y, Graham L, Gourgoulias K, Richens J, Lee C, Baker A, et al. MultiVerse: causal reasoning using importance sampling in probabilistic programming. In: Symposium on Advances in Approximate Bayesian Inference. PMLR; 2020. p. 1–36.

[6] Vlontzos A, Kainz B, Gilligan-Lee CM. Estimating the probabilities of causation via deep monotonic twin networks. 2021. arXiv:2109.01904.

[7] Gilligan-Lee CM, Hart C, Richens J, Johri S. Leveraging directed causal discovery to detect latent common causes in cause-effect pairs. IEEE Trans Neural Netw Learn Syst. 2022.

[8] Jeunen O, Gilligan-Lee CM, Mehrotra R, Lalmas M. Disentangling causal effects from sets of interventions in the presence of unobserved confounders. 2022. arXiv:2210.05446.

[9] Reynaud H, Vlontzos A, Dombrowski M, Lee C, Beqiri A, Leeson P, et al. D'ARTAGNAN: counterfactual video generation. 2022. arXiv:2206.01651. doi:10.1007/978-3-031-16452-1_57.

[10] Van Goffrier G, Maystre L, Gilligan-Lee C. Estimating long-term causal effects from short-term experiments and long-term observational data with unobserved confounding. 2023. arXiv:2302.10625.

[11] Abadie A, Gardeazabal J. The economic costs of conflict: a case study of the Basque Country. Am Econ Rev. 2003;93(1):113–32. doi:10.1257/000282803321455188.

[12] Abadie A, Diamond A, Hainmueller J. Synthetic control methods for comparative case studies: estimating the effect of California's tobacco control program. J Am Stat Assoc. 2010;105(490):493–505. doi:10.1198/jasa.2009.ap08746.

[13] Abadie A, Diamond A, Hainmueller J. Comparative politics and the synthetic control method. Am J Polit Sci. 2015;59(2):495–510. doi:10.1111/ajps.12116.

[14] Brodersen KH, Gallusser F, Koehler J, Remy N, Scott SL. Inferring causal impact using Bayesian structural time-series models. Ann Appl Stat. 2015;9(1):247–74. doi:10.1214/14-AOAS788.

[15] Kreif N, Grieve R, Hangartner D, Turner AJ, Nikolova S, Sutton M. Examination of the synthetic control method for evaluating health policies with multiple treated units. Health Econ. 2016;25(12):1514–28. doi:10.1002/hec.3258.

[16] Athey S, Imbens GW. The state of applied econometrics: causality and policy evaluation. J Econ Perspect. 2017;31(2):3–32. doi:10.1257/jep.31.2.3.

[17] Lin S, Xu M, Zhang X, Chao SK, Huang YK, Shi X. Balancing approach for causal inference at scale. 2023. arXiv:2302.05549. doi:10.1145/3580305.3599778.

[18] Zeitler J, Vlontzos A, Gilligan-Lee CM. Non-parametric identifiability and sensitivity analysis of synthetic control models. In: 2nd Conference on Causal Learning and Reasoning; 2023.

[19] Miao W, Shi X, Tchetgen ET. A confounding bridge approach for double negative control inference on causal effects. 2020. arXiv:1808.04945.

[20] Shi X, Li K, Miao W, Hu M, Tchetgen ET. Theory for identification and inference with synthetic controls: a proximal causal inference framework. 2023. arXiv:2108.13935.

[21] Shi C, Sridhar D, Misra V, Blei D. On the assumptions of synthetic control methods. In: International Conference on Artificial Intelligence and Statistics. PMLR; 2022. p. 7163–75.

[22] Kuroki M, Pearl J. Measurement bias and effect restoration in causal inference. Biometrika. 2014;101(2):423–37. doi:10.1093/biomet/ast066.

[23] Miao W, Geng Z, Tchetgen Tchetgen EJ. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika. 2018;105(4):987–93. doi:10.1093/biomet/asy038.

[24] Tchetgen EJT, Ying A, Cui Y, Shi X, Miao W. An introduction to proximal causal learning. 2020. arXiv:2009.10982. doi:10.1101/2020.09.21.20198762.

[25] Imbens G, Kallus N, Mao X, Wang Y. Long-term causal inference under persistent confounding via data combination. 2022. arXiv:2202.07234.

[26] Shpitser I, Wood-Doughty Z, Tchetgen EJT. The proximal ID algorithm. 2021. arXiv:2108.06818.

[27] Liu J, Tchetgen EJT, Varjo C. Proximal causal inference for synthetic control with surrogates. 2023. arXiv:202309527.

[28] Rosenbaum PR, Rubin DB. Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J R Stat Soc Ser B (Methodological). 1983;45(2):212–8. doi:10.1111/j.2517-6161.1983.tb01242.x.

[29] Imbens GW. Sensitivity to exogeneity assumptions in program evaluation. Am Econ Rev. 2003;93(2):126–32. doi:10.1257/000282803321946921.

[30] Veitch V, Zaveri A. Sense and sensitivity analysis: simple post-hoc analysis of bias due to unobserved confounding. Adv Neural Inf Process Syst. 2020;33:10999–1009.

[31] Cinelli C, Hazlett C. Making sense of sensitivity: extending omitted variable bias. J R Stat Soc Ser B (Stat Methodol). 2020;82(1):39–67. doi:10.1111/rssb.12348.

[32] Cinelli C, Kumor D, Chen B, Pearl J, Bareinboim E. Sensitivity analysis of linear structural causal models. In: International Conference on Machine Learning. PMLR; 2019. p. 1252–61.

[33] Nazaret A, Shi C, Blei DM. On the misspecification of linear assumptions in synthetic control. 2023. arXiv:2302.12777.

[34] Shi X, Miao W, Tchetgen ET. A selective review of negative control methods in epidemiology. 2022. arXiv:2009.05641.

[35] Pearl J. Causality. Cambridge: Cambridge University Press; 2009. doi:10.1017/CBO9780511803161.

[36] Zhang J, Bareinboim E. Can humans be out of the loop? In: Conference on Causal Learning and Reasoning. PMLR; 2022. p. 1010–25.

[37] Ben-Michael E, Feller A, Rothstein J. The augmented synthetic control method. J Am Stat Assoc. 2021;116(536):1789–803. doi:10.1080/01621459.2021.1929245.

[38] Chernozhukov V, Wüthrich K, Zhu Y. An exact and robust conformal inference method for counterfactual and synthetic controls. J Am Stat Assoc. 2021;116(536):1849–64. doi:10.1080/01621459.2021.1920957.

[39] Gelman A, Hill J. Data analysis using regression and multilevel/hierarchical models. Analytical Methods for Social Research. Cambridge: Cambridge University Press; 2006. doi:10.1017/CBO9780511790942.

[40] Yekutieli D. Adjusted Bayesian inference for selected parameters. J R Stat Soc Ser B Stat Methodol. 2012;74(3):515–41. doi:10.1111/j.1467-9868.2011.01016.x.

[41] Rasines DG, Young GA. Bayesian selective inference: non-informative priors. 2020. arXiv:2008.04584.

[42] Kuchibhotla AK, Kolassa JE, Kuffner TA. Post-selection inference. Annu Rev Stat Appl. 2022;9:505–27. doi:10.1146/annurev-statistics-100421-044639.

[43] Betancourt M. Modelling sparsity; 2021. https://betanalpha.github.io/assets/case_studies/modeling_sparsity.html.

[44] Mitchell TJ, Beauchamp JJ. Bayesian variable selection in linear regression. J Am Stat Assoc. 1988;83(404):1023–32. doi:10.1080/01621459.1988.10478694.

Received: 2024-07-05
Revised: 2024-11-14
Accepted: 2025-06-15
Published Online: 2025-10-08

© 2025 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
