Abstract
Synthetic control methods are widely used to estimate the treatment effect on a single treated unit in time-series settings. A common approach to estimate synthetic control weights is to regress the treated unit’s pretreatment outcome and covariates’ time series measurements on those of untreated units via ordinary least squares. However, this approach can perform poorly if the pretreatment fit is not near perfect, whether or not the weights are normalized. In this article, we introduce a single proxy synthetic control approach, which views the outcomes of untreated units as proxies of the treatment-free potential outcome of the treated unit, a perspective we leverage to construct a valid synthetic control. Under this framework, we establish an alternative identification strategy and corresponding estimation methods for synthetic controls and the treatment effect on the treated unit. Notably, unlike existing proximal synthetic control methods, which require two types of proxies for identification, ours relies on a single type of proxy, thus enhancing its practical relevance. In addition, we adapt a conformal inference approach to perform inference about the treatment effect, obviating the need for a large number of posttreatment observations. Finally, our framework can accommodate time-varying covariates and nonlinear models. We demonstrate the proposed approach in a simulation study and a real-world application.
1 Introduction
Synthetic control methods have grown popular for estimating the treatment effect of an intervention in settings where a single unit is treated and pre- and posttreatment time series data are available on the treated unit and a heterogeneous pool of untreated control units [1,2]. In the absence of a natural control unit, the main idea of the approach hinges upon constructing a so-called synthetic control, corresponding to a certain weighted average of control units’ outcomes (and potentially covariates), obtained by matching the outcome time series of the treated unit to the weighted average in the preintervention period, to the extent empirically feasible. The resulting synthetic control is then used to forecast the treatment-free potential outcome of the treated unit in the posttreatment period, therefore delivering an estimate of the treatment effect by comparing the treated unit’s outcome to the synthetic control forecast.
There is a fast-growing literature concerned with developing and improving approaches to constructing synthetic control weights. Following Abadie et al. [2], a common approach is to use ordinary (or weighted) least squares by regressing the pretreatment outcome and available covariates of the treated unit on those of control units, typically restricting the weights to be nonnegative and sum to one; see Section 2.2 for a more detailed discussion. Despite its intuitive appeal and simplicity, the performance of the standard synthetic control approach may break down in settings where the pretreatment synthetic control match to the treated unit’s outcomes is short of perfect, an eventuality Abadie et al. [2] warn against. To improve the performance of the synthetic control approach in the event of an imperfect pretreatment match, recent papers have considered alternative formulations of the synthetic control framework. For example, Xu [3], Amjad et al. [4], Ben-Michael et al. [5], Ferman and Pinto [6], Ferman [7], and Shi et al. [8] rely on variants of the so-called interactive fixed effects model (IFEM; Bai [9]). In particular, the latter three articles specify a linear latent factor potential outcome model with an exogenous, common set of latent factors with corresponding unit-specific factor loadings. Under this linear factor model, a key identification condition is that the factor loading of the treated unit lies in the vector space spanned by factor loadings of donor units, and thus, there exists a linear combination of the latter that matches the former exactly. Using the corresponding matching weights, one can therefore construct an unbiased synthetic control of the treated unit’s potential outcome that, under certain conditions, can be used to mimic the treated unit’s outcome in the posttreatment period, had the intervention been withheld.
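The constrained least-squares construction described in this paragraph can be sketched in a few lines. The following is a minimal illustration rather than the estimator of any cited paper: it assumes outcomes only (no covariates), uses projected gradient descent as the solver, and runs on simulated data with a perfect pretreatment fit. All names (`project_to_simplex`, `simplex_ls_weights`, and the simulated arrays) are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]      # number of entries kept positive
    theta = (css[rho - 1] - 1.0) / rho        # common shift applied to v
    return np.maximum(v - theta, 0.0)

def simplex_ls_weights(y_pre, W_pre, n_iter=5000):
    """Least-squares weights restricted to be nonnegative and to sum to one."""
    step = 1.0 / np.linalg.norm(W_pre, 2) ** 2   # 1 / Lipschitz constant of the gradient
    w = np.full(W_pre.shape[1], 1.0 / W_pre.shape[1])
    for _ in range(n_iter):
        grad = W_pre.T @ (W_pre @ w - y_pre)
        w = project_to_simplex(w - step * grad)
    return w

# Simulated data (assumed): 5 donors, 60 pretreatment and 10 posttreatment periods.
rng = np.random.default_rng(0)
W = rng.normal(size=(70, 5))
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
effect = 2.0
y = W @ w_true
y[60:] += effect                              # the intervention shifts the treated unit

w_hat = simplex_ls_weights(y[:60], W[:60])
att_hat = np.mean(y[60:] - W[60:] @ w_hat)    # observed outcome minus synthetic control forecast
print(np.round(w_hat, 3), round(float(att_hat), 3))
```

Because the simulated pretreatment fit is exact, the weights and the effect are recovered; the concern raised in the text is precisely that this need not happen when the fit is imperfect.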
At their core, these methods substitute the requirement of a perfect pretreatment match of the outcome of the treated unit and the synthetic control (an empirically testable assumption) with finding a match for the treated unit’s factor loadings in the linear span of the donors’ factor loadings (an empirically untestable assumption). Despite the growing interest in synthetic control methods, limited research has gone beyond the IFEM or its nonparametric generalizations [8,10]; one notable exception is Shi et al. [11] where the units’ outcomes are viewed as averages of more granular study units, allowing for the construction of a synthetic control under specific restrictions on the model of granular study units’ outcomes.
In this work, we consider an alternative theoretical framework to formalize the synthetic control approach which obviates a specification of an IFEM. Specifically, we propose to view the synthetic control model from a measurement error perspective, whereby donor units’ outcomes stand as error-prone proxy measurements of the treated unit’s treatment-free potential outcome. In this framework, a synthetic control outcome can be obtained via a simple form of calibration, say a linear combination of donor units, so that on average, it matches the treated unit’s outcome in the pretreatment period. Although the standard IFEM views the treated and control units’ outcomes as proxies of latent factors, our approach views donor units’ outcomes as direct proxies of the treated unit’s treatment-free potential outcome. Thus, the proposed framework shares similarity with the recent proximal synthetic control framework of Shi et al. [8], which also formalizes donor outcomes as so-called outcome proxies. However, a major distinction is that the latter requires an additional group of proxies (so-called treatment proxies) to identify synthetic control weights; in contrast, our proposed approach relies on a single type of proxy, given by the donor units, and obviates the need to invoke the existence of latent factors.
Interestingly, similar to the connection between the proximal synthetic control approach of Shi et al. [8] and proximal causal inference for independent and identically distributed (i.i.d.) data [12,13], the proposed synthetic control framework is likewise inspired by the control outcome calibration approach [14] and its recent generalization to the so-called single proxy control framework [15], both of which were proposed for i.i.d. samples subject to an endogenous treatment assignment mechanism. Therefore, we aptly refer to our approach as a single proxy synthetic control (SPSC) approach. Despite this connection, the synthetic control generalization presents several new challenges related to (i) only observing a single treated unit, so that treatment assignment is implicitly conditioned on; (ii) having access to pre- and posttreatment time series data for a heterogeneous pool of untreated donor units, none of which can serve as a natural control; and (iii) serial correlation and heteroskedasticity due to the time series nature of the data. We tackle each of challenges (i)–(iii) in turn and develop a general framework for single proxy control in a synthetic control setting. The proposed method is implemented in an R package available at https://github.com/qkrcks0218/SPSC.
2 Setup and review of existing synthetic control frameworks
2.1 Setup
Let us consider a setting where
For illustrative purposes, we will consider the following two examples throughout:
Example 1
Abadie et al. [2] investigated the effects of Proposition 99, a tobacco control program implemented in California in 1988, on cigarette sales in the state. Their empirical analysis considered annual cigarette sales data from California and from
Example 2
In Section 5, we revisited the analysis by Fohlin and Lu [16] to study the effects of the Panic of 1907 [17] on the average log stock prices of two trust companies (Knickerbocker and Trust Company of America) that were hypothesized to have been impacted by the Panic. For comparison, a selection of
Throughout, let
2.2 Review of existing synthetic control framework
A common target estimand in the synthetic control setting is the average treatment effect on the treated unit (ATT) at time
Note that, by definition,
Assumption 2.1
(Consistency)
In addition, we assume no interference, i.e., the treatment has no causal effect on control units.
Assumption 2.2
(No interference on control units)
In the context of Example 1, Assumption 2.2 means that Proposition 99 does not have a causal effect on other states’ cigarette sales; a similar interpretation applies to Example 2.
Under Assumptions 2.1 and 2.2, we have the following result almost surely for
Therefore, for the posttreatment period,
In the classical synthetic control setting, a further assumption relates the observed outcomes of the untreated units with the treatment-free potential outcome of the treated unit. Specifically, following Abadie et al. [2] and Ferman and Pinto [6], suppose that units’ outcomes are generated from the following IFEM [9] for
Here,
Next, following Ferman and Pinto [6] and Shi et al. [8], suppose that a set of weights
Equations (1) and (2) imply that there exists a synthetic control
In the context of Example 1, equation (3) means that:
A similar interpretation holds for Example 2. Therefore,
Based on (3), one may consider estimating
where
However, as discussed by Ferman and Pinto [6] and Shi et al. [8], the OLS weights obtained from (5) are generally inconsistent under (2) as
In the context of Example 1, equation (6) means that:
Example 2 follows a similar interpretation. Note that (6) is distinct from condition (2) of Ferman and Pinto [6] and Shi et al. [8], as reflected in their interpretations (4) and (7). Moreover, as discussed in Ferman and Pinto [6], (6) can be expected to hold approximately under (2) when the variance of the error
Recently, Shi et al. [8] introduced a proximal causal inference framework for synthetic controls. Specifically, they assume that they have also observed proxy variables
A reasonable candidate for
3 SPSC approach
3.1 Assumptions
In this section, we provide a novel synthetic control approach which obviates the need for an IFEM, and, in fact, does not necessarily postulate the existence of a latent factor
Assumption 3.1
(Proxy) There exists a function
Assumption 3.1 encodes that a function of the untreated units’ outcomes
Assumption 3.2
(Existence of a synthetic control bridge function) For all
Assumption 3.2 is the key identification assumption of the SPSC framework. It posits the existence of a synthetic control
In particular, if
Regression model (10) essentially implies that
Assumption 3.2 plays a role analogous to that of condition (2) in Ferman and Pinto [6] and Shi et al. [8] and condition (6) in Abadie et al. [2] in that it establishes a relationship between
Unlike condition (2), Assumption 3.2 obviates the need for latent factors, their corresponding factor loadings, IFEM (1), or any related latent factor models. Instead, Assumption 3.2 simply states that it is possible to construct a function of the control units’ outcomes
Moreover, this perspective aligns with existing statistical literature. In particular, model (10) is reminiscent of a nonclassical measurement model [22,23]. From a regression model perspective, the donors’ outcomes
To summarize, the SPSC framework differs from existing synthetic control frameworks in its identifying assumptions and interpretation of the synthetic control. Specifically, in the SPSC framework, the synthetic control is viewed as an error-prone outcome measurement (see (10)), eliminating the need for a generative model for
3.2 A generative model
While, in principle, Assumptions 3.1 and 3.2 do not require a generative model, it is instructive to consider a model compatible with these assumptions. In this vein, suppose that
Here,

Graphical illustrations for (a) Assumption 3.1, (b) model (12) with correlated errors, and (c) model (12) with independent errors. The dashed bow arcs depict the association between two variables. For illustration, we consider
Under model (12),
Condition 3.1
For all
Condition 3.1 is a sufficient condition for Assumption 3.2 because, under Condition 3.1, we obtain
Under model (12) and Condition 3.1, consider the special case where
Here,
A sufficient condition for the existence of the weight
Since equation (14) is based on the IFEM, it has interesting connections with previous works that also rely on this model. In order to elucidate these connections, we consider the following alternative representation of equation (14):
where
While the IFEM with correlated errors in (13) is useful for motivating the SPSC framework, the standard IFEM typically assumes no correlation among errors, i.e.,
3.3 Identification of the synthetic control and the treatment effect
As a direct consequence of Assumptions 2.1, 2.2, 3.1, and 3.2, the synthetic control bridge function
Theorem 3.1
Under Assumptions 2.1, 2.2, 3.1, and 3.2, the synthetic control bridge function
Proofs of this theorem and of all other results are provided in Supplementary Material S3. Theorem 3.1 motivates our approach for estimating the synthetic control bridge function
Theorem 3.2
Under Assumptions 2.1, 2.2, 3.1, and 3.2, we have that
Theorem 3.2 provides a theoretical basis for the use of the synthetic control method to estimate the ATT. Specifically, following Abadie and Gardeazabal [1] and Shi et al. [8], we use
To facilitate the exposition, hereafter in the main text, we restrict attention to inference under a linear bridge function, i.e.,
3.4 Estimation and inference of the treatment effect under a linear bridge function
We first discuss estimation of the synthetic control weights
Here,
It is instructive to note that solving the estimating equation
One might attempt to interpret the model on the right-hand side as a standard regression model, treating
The choice of
A time-invariant specification of
Here,
The function
The time-varying estimating function
An estimator of
Here,
Equations (18) and (19) fortunately admit closed-form solutions. For instance, if
Here,
A special instance of
Once the synthetic control weights are estimated, one could in principle estimate the treatment-free potential outcome and the ATT as
We posit a parsimonious working model for the ATT as a function of time. Specifically, we assume that the ATT follows a model indexed by a
Example 3
(Constant effect)
Example 4
(Linear effect)
Example 5
(Nonlinear effect) This includes a quadratic model
For tractable inference, we assume that the error process is weakly dependent, which is formally stated as follows:
Assumption 3.3
(Weakly dependent error) Let
Assumption 3.3 applies to many standard time series models, including autoregressive models, moving-average models, and autoregressive moving-average models.
Along with these conditions, we will consider an asymptotic regime where
Then, GMM estimators of the synthetic control weights and treatment effect parameter are obtained as the solution to the following minimization problem:
where
Under our assumptions, the following result establishes that
Theorem 3.3
Suppose that Assumptions 2.1, 2.2, 3.1, 3.2, 3.3, and regularity conditions in Supplementary Material S3.2 hold. Then, as
where
Here,
Note that
For
Finally, while Theorem 3.3 specifies the required rate for the regularization parameter
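As a rough illustration of the estimation idea in this section, the sketch below solves a small set of pretreatment moment equations for linear synthetic control weights. It rests on assumptions that are not taken from the text: the moment restriction is taken to be mean_t[g(Y_t)(Y_t − W_t'γ)] = 0 with the hypothetical choice g(y) = (1, y − mean(y)), the weighting matrix is the identity, and the ridge parameter `lam` is set near zero so that the solution approximates the minimum-norm weights. The simulated data-generating process and all names (`spsc_weights_sketch`, `a`, `b_slope`) are likewise illustrative.

```python
import numpy as np

def spsc_weights_sketch(y_pre, W_pre, lam=1e-8):
    """Ridge-regularized solution of mean_t[g(Y_t) * (Y_t - W_t' gamma)] = 0,
    with g(y) = (1, y - mean(y)); approaches the minimum-norm solution as lam -> 0."""
    g = np.column_stack([np.ones_like(y_pre), y_pre - y_pre.mean()])  # functions of Y only
    G = g.T @ W_pre / len(y_pre)      # (2 x N) sample moments of g(Y_t) W_t'
    b = g.T @ y_pre / len(y_pre)      # (2,)    sample moments of g(Y_t) Y_t
    # Equivalent to (G'G + lam I)^{-1} G'b, but only a 2 x 2 system is solved.
    return G.T @ np.linalg.solve(G @ G.T + lam * np.eye(G.shape[0]), b)

# Assumed simulation: donors are noisy, shifted and rescaled proxies of the treated outcome.
rng = np.random.default_rng(1)
T0 = 5000
a = np.array([0.5, -0.5, 1.0, 0.0])         # donor-specific intercepts
b_slope = np.array([1.0, 0.8, 1.2, 0.5])    # donor-specific slopes
y0 = 1.0 + rng.normal(size=T0)
W0 = a + np.outer(y0, b_slope) + rng.normal(size=(T0, 4))

gamma_spsc = spsc_weights_sketch(y0, W0)
gamma_ols = np.linalg.lstsq(W0, y0, rcond=None)[0]   # OLS uses W_t itself in the moments

# Here E[W_t' gamma | Y_t] = gamma'a + (gamma'b) Y_t, so matching Y_t on average
# requires gamma'a = 0 and gamma'b = 1.
print("moment-based sketch:", gamma_spsc @ a, gamma_spsc @ b_slope)
print("OLS                :", gamma_ols @ a, gamma_ols @ b_slope)
```

In this simulated setting the moment-based weights are approximately calibrated, whereas the OLS weights yield an attenuated slope, which is one way to see why regressing the treated outcome on noisy donor outcomes can be misleading.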
3.5 Conformal inference of the treatment effect
Key limitations of the methodology proposed in the previous section include (i) a parsimonious model choice for
Consider an asymptotic regime whereby
At the minimum-norm synthetic control weights
where
In words, the p-value is the proportion of residuals of magnitudes no smaller than the posttreatment residual. Under
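The verbal description of the p-value can be made concrete with a short sketch. It is a simplified illustration under stated assumptions: the posttreatment residual is counted among the residuals being compared, and the prediction interval is obtained by inverting the test over a grid of candidate effects while holding the residuals fixed, whereas a full implementation would typically re-estimate the weights under each candidate value. The function names and numbers are hypothetical.

```python
import numpy as np

def conformal_p_value(pre_resid, post_resid):
    """Proportion of residuals (pretreatment plus the posttreatment one) whose
    magnitude is no smaller than that of the posttreatment residual."""
    all_abs = np.abs(np.append(pre_resid, post_resid))
    return float(np.mean(all_abs >= abs(post_resid)))

def prediction_interval(pre_resid, post_gap, grid, alpha=0.05):
    """Collect candidate effects that are not rejected at level alpha.
    post_gap is the observed outcome minus the synthetic control at one posttreatment time."""
    accepted = [float(beta) for beta in grid
                if conformal_p_value(pre_resid, post_gap - beta) > alpha]
    return (min(accepted), max(accepted)) if accepted else None

# Toy numbers (assumed) to show the mechanics.
pre_resid = np.array([0.1, -0.2, 0.3, -0.4])
p = conformal_p_value(pre_resid, 0.35)
interval = prediction_interval(pre_resid, post_gap=1.0,
                               grid=np.linspace(0.0, 2.0, 201), alpha=0.25)
print(p, interval)
```

With only four pretreatment residuals the attainable p-values are multiples of 1/5, which is why the toy example uses a level of 0.25; in practice the pretreatment period needs to be long enough for the desired level to be attainable.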
4 Simulation
We conducted a simulation study to evaluate the finite sample performance of the proposed estimator under a variety of conditions. On the basis of the IFEM in (1), we considered the following data generating mechanisms with pre- and posttreatment periods of length
First, for each
The latent factor loadings
The latent factor loading
Note that condition (2) is satisfied with
The errors
Under (Correlated errors) and (No Y error), equation (14) admits solutions, thus satisfying Condition 3.1 and Assumption 3.2. In contrast, under (Independent errors), equation (14) does not have a solution, violating Condition 3.1. Nonetheless, as discussed in Section 3.2, it is still possible to find a synthetic control bridge function satisfying Assumption 3.2; see Supplementary Material S1.5 for details.
With these generated variables,
By using the simulated data, we estimated the aggregated ATT over the posttreatment periods, i.e.,
To simplify the discussion, we present only the results under the (Non-simplex) case for

Empirical Distributions of the Estimators. The top and bottom plots show results when
Table 1 provides more detailed summary statistics when errors
Summary statistics of estimation results under independent errors

| | Statistics | OLS-NoReg (100) | OLS-NoReg (500) | OLS-Standard (100) | OLS-Standard (500) | ASC (100) | ASC (500) | SCPI (100) | SCPI (500) | SPSC-NoDT (100) | SPSC-NoDT (500) | SPSC-DT (100) | SPSC-DT (500) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No trend | Bias | 0.02 | | 0.04 | 0.00 | 0.03 | | 0.03 | | 0.03 | | 0.03 | |
| No trend | ASE | 0.89 | 0.37 | — | — | — | — | — | — | 0.84 | 0.37 | 0.84 | 0.37 |
| No trend | BSE | 0.97 | 0.38 | — | — | — | — | — | — | 0.84 | 0.37 | 0.84 | 0.37 |
| No trend | ESE | 0.92 | 0.37 | 1.25 | 0.51 | 0.90 | 0.39 | 0.92 | 0.39 | 0.89 | 0.37 | 0.89 | 0.37 |
| No trend | MSE | 0.85 | 0.14 | 1.56 | 0.26 | 0.82 | 0.15 | 0.84 | 0.15 | 0.79 | 0.14 | 0.79 | 0.14 |
| No trend | Coverage (ASE) | 0.95 | 0.96 | — | — | — | — | — | — | 0.93 | 0.96 | 0.93 | 0.96 |
| No trend | Coverage (BSE) | 0.96 | 0.96 | — | — | — | — | — | — | 0.93 | 0.96 | 0.93 | 0.96 |
| Linear trend | Bias | 1.30 | 1.37 | 10.70 | 10.51 | 8.26 | 5.70 | 11.93 | 11.85 | | | 0.07 | 0.08 |
| Linear trend | ASE | 1.76 | 0.79 | — | — | — | — | — | — | 1.85 | 0.84 | 1.79 | 0.81 |
| Linear trend | BSE | 2.02 | 0.82 | — | — | — | — | — | — | 2.24 | 0.99 | 1.94 | 0.86 |
| Linear trend | ESE | 1.82 | 0.77 | 1.16 | 0.41 | 1.70 | 0.95 | 0.96 | 0.43 | 2.08 | 0.97 | 1.94 | 0.85 |
| Linear trend | MSE | 5.01 | 2.45 | 115.72 | 110.54 | 71.16 | 33.37 | 143.18 | 140.58 | 6.36 | 3.54 | 3.75 | 0.72 |
| Linear trend | Coverage (ASE) | 0.87 | 0.58 | — | — | — | — | — | — | 0.84 | 0.49 | 0.93 | 0.94 |
| Linear trend | Coverage (BSE) | 0.93 | 0.61 | — | — | — | — | — | — | 0.91 | 0.60 | 0.94 | 0.96 |

Bias row gives the empirical bias of 500 estimates. ASE row gives the asymptotic standard error obtained from the sandwich estimator of the GMM. BSE row shows the bootstrap standard error obtained from the approach in Supplementary Material S1.4. ESE row gives the standard deviation of 500 estimates. MSE row gives the mean squared error of 500 estimates. Coverage (ASE) and Coverage (BSE) rows give the empirical coverage rate of 95% confidence intervals based on the ASE and BSE, respectively. Bias, standard errors, and mean squared error are scaled by factors of 10, 10, and 100, respectively.
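The row definitions above translate directly into code. The following sketch computes the same kinds of summaries from a set of replicate estimates; it assumes that the reported ASE is the average of the per-replicate standard errors, that the empirical standard error uses the sample standard deviation, and that the 95% intervals are Wald-type intervals (estimate ± 1.96 × standard error). The function name and the toy inputs are hypothetical.

```python
import numpy as np

def summarize(estimates, std_errors, truth, z=1.96):
    """Simulation summaries, scaled as in the table note (x10, x10, x100)."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    covered = np.abs(est - truth) <= z * se          # Wald-type interval covers the truth
    return {
        "bias_x10": 10 * float(np.mean(est - truth)),
        "ase_x10": 10 * float(np.mean(se)),          # average reported standard error (assumed)
        "ese_x10": 10 * float(np.std(est, ddof=1)),  # standard deviation of the estimates
        "mse_x100": 100 * float(np.mean((est - truth) ** 2)),
        "coverage": float(np.mean(covered)),
    }

# Toy inputs (assumed): four replicate estimates of a true effect of 1.0.
stats = summarize([1.1, 0.9, 1.0, 1.2], [0.1, 0.1, 0.1, 0.1], truth=1.0)
print(stats)
```

Comparing the ASE and ESE entries is a quick check on whether the reported standard errors track the actual sampling variability of the estimator.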
Next, we evaluated the finite sample performance of the conformal inference approach in Section 3.5. As competing methods, we considered the ASC, SCPI, and two SPSC estimators. For each simulated dataset, we obtained 95% pointwise prediction intervals for the treatment effect
Table 2 presents the empirical coverage rates for each simulated scenario. Surprisingly, the ASC and SCPI approaches fail to attain the nominal coverage rate; we believe this failure originates from the simulation setting where
Empirical coverage rates of 95% pointwise prediction intervals

| | | ASC (100) | ASC (500) | SCPI (100) | SCPI (500) | SPSC-NoDT (100) | SPSC-NoDT (500) | SPSC-DT (100) | SPSC-DT (500) |
|---|---|---|---|---|---|---|---|---|---|
| No trend | Independent errors | 0.933 | 0.925 | 0.935 | 0.912 | 0.959 | 0.948 | 0.961 | 0.948 |
| No trend | Correlated errors | 0.904 | 0.906 | 0.925 | 0.913 | 0.962 | 0.952 | 0.963 | 0.953 |
| No trend | No Y error | 0.922 | 0.910 | 0.938 | 0.925 | 0.964 | 0.951 | 0.964 | 0.953 |
| Linear trend | Independent errors | 0.795 | 0.846 | 0.938 | 0.889 | 0.959 | 0.944 | 0.962 | 0.949 |
| Linear trend | Correlated errors | 0.844 | 0.883 | 0.907 | 0.843 | 0.957 | 0.944 | 0.967 | 0.948 |
| Linear trend | No Y error | 0.728 | 0.808 | 0.920 | 0.852 | 0.957 | 0.953 | 0.957 | 0.946 |

The numbers in SPSC-NoDT and SPSC-DT columns give the empirical coverage rates of 95% pointwise prediction intervals obtained from the conformal inference approach in Section 3.5. The numbers in ASC and SCPI columns give the empirical coverage rates of 95% pointwise prediction intervals obtained from the approaches proposed by Ben-Michael et al. [5] and Cattaneo et al. [29], respectively.
In Supplementary Material S1.9, we assess the finite sample performance of the proposed conformal inference approach based on the simulation scenario given in the study by Cattaneo et al. [29], which may not be compatible with the key identifying condition, Assumption 3.2, of SPSC. As expected, the approach of Cattaneo et al. [29] performs well in this setting. Although our method without time trend adjustment (i.e., SPSC-NoDT) sometimes fails to achieve the nominal coverage rate, particularly when outcomes are nonstationary, our method with time trend adjustment (i.e., SPSC-DT) consistently attains the nominal coverage rate, provided that the basis functions for time periods are appropriately chosen. This highlights the robustness of the proposed SPSC approach and its broad applicability in synthetic control settings.
5 Application
We applied the proposed method to analyze a real-world application. In particular, we revisited the dataset analyzed in Fohlin and Lu [16], which consists of time series data of length 384 for 59 trust companies, recorded between January 5, 1906, and December 30, 1908, with a triweekly frequency. Notably, this time period includes the Panic of 1907 [17], a financial panic that lasted for three weeks in the United States starting in mid-October, 1907. As a result of the panic, there was a significant drop in the stock market during this period. From this context, we focused on the effect of the financial panic in October 1907 on the log stock price of trust companies using
The treated unit and donors were defined as follows. According to Fohlin and Lu [16], Knickerbocker, Trust Company of America, and Lincoln were the three trust companies that were most severely affected during the panic. However, Lincoln’s stock price showed a strong downward trend over the pretreatment period. Therefore, we defined the average of the log stock prices of the first two trust companies as
We first report the ATT estimates under a constant treatment effect model
Summary statistics of the estimation of the average treatment effect on the treated

Estimator | OLS-NoReg | OLS-Standard | ASC | SCPI | SPSC-NoDT | SPSC-DT |
---|---|---|---|---|---|---|
Estimate | | | | | | |
ASE | 0.139 | — | — | — | 0.084 | 0.066 |
95% CI | ( | — | — | — | ( | ( |

We also constructed the pointwise prediction intervals based on the SPSC approach with time-varying components using the conformal inference approach in Section 3.5. For comparison, we also implemented the ASC and SCPI approaches. Figure 3 presents the visual summary of the result. For the posttreatment period
Figure 3: Graphical summaries of the 95% prediction intervals over the posttreatment periods. These plots, from left to right, present the results using the approaches proposed by Ben-Michael et al. [5], Cattaneo et al. [29], and the conformal inference approach presented in Section 3.5 with the time-varying estimating function $\Psi_{\text{pre}}$, respectively. The numbers show the average length of the 95% prediction intervals over the posttreatment periods.
In addition, for the sake of credibility, we conducted the following additional analysis for the application; the details can be found in Supplementary Material S1.10. First, we studied the trend of the residuals, the difference between the observed outcome and synthetic control, over the pretreatment time periods. We observed that the OLS-NoReg, SCPI, and SPSC-DT estimators produced residuals without a deterministic trend over time, while the other three estimators (OLS-Standard, ASC, SPSC-NoDT) showed the opposite behavior. Notably, the SPSC-DT estimator appears to satisfy the zero mean condition of Assumption 3.2, whereas the SPSC-NoDT estimator seems to violate this condition due to a nonzero deterministic trend over time. This again highlights the importance of accommodating time-varying components in the SPSC estimation procedure.
Next, we performed the following falsification study. We restricted the entire analysis to the pretreatment period in which the causal effect is expected to be null. We artificially defined a financial panic time in late July 1907, which is roughly three months before the actual financial panic. This resulted in the lengths of the pre- and posttreatment periods equal to
6 Concluding remarks
In this article, we propose a novel SPSC approach in which the synthetic control is defined as a linear combination of donors’ outcomes whose conditional expectation matches the treatment-free potential outcome in both pre- and posttreatment periods. The model is analogous to, yet more general than, the measurement error models widely studied in the standard measurement error literature. Under the framework, we establish the identification of a synthetic control and provide an estimation strategy for the ATT. Furthermore, we introduce a method for inferring the treatment effect through pointwise prediction intervals, which remains valid even in the case of a short posttreatment period. We validate our methods through simulation studies and provide an application analyzing a real-world financial dataset related to the 1907 Panic.
We reiterate that the SPSC framework differs from existing synthetic control methods in its identifying assumptions and interpretation. It views the synthetic control as an error-prone outcome measurement, without the need for specifying a generative model for the outcome, whereas existing approaches treat it as the projection of the outcome onto the donor’s outcome space or the outcome itself. Despite these differences, both frameworks construct synthetic controls by optimally weighting donor units (according to their identifying assumptions), which are then used for treatment effect estimation. In addition, like other synthetic control methods, the SPSC framework allows for time-varying confounders, as demonstrated in the generative models in Section 3.2.
While, as mentioned in Section 3.1, the SPSC framework may be viewed as a nonstandard form of instrumental variable approach, it is important to highlight key distinctions between the proposed SPSC approach and well-known instrumental variable approaches in dynamic panel data, such as in Anderson and Hsiao [34] and Arellano and Bond [35]. In dynamic panel data models, endogeneity arises across different time periods, with the typical assumption that there is no within-time period endogeneity. As a result, these models use lagged variables as instruments to address cross-time endogeneity. In contrast, in the SPSC framework, endogeneity occurs within each time period, without specific assumptions about cross-time endogeneity. Consequently, the instrumental variable approach in this context operates within a single time period. We remark that, like the SPSC framework, many synthetic control models are agnostic about cross-time dependence structure; for example, the cross-time dependent structure of
As briefly mentioned in Section 1, the proposed SPSC framework has a connection to the single proxy control framework [14,15] developed for i.i.d. data. In particular, Park et al. [15] proposed an approach that relies on the so-called outcome bridge function, which is a (potentially nonlinear) function of outcome proxies. An important property of the outcome bridge function is that it is conditionally unbiased for the treatment-free potential outcome. Therefore, the proposed SPSC approach can be viewed as an adaptation of the outcome bridge function-based single proxy control approach to the synthetic control setting, where the outcome bridge function is known a priori to be a linear function of donors’ outcomes. In Supplementary Material S2, we present a general SPSC framework, which is designed to accommodate nonparametric and nonlinear synthetic controls. Therefore, this framework obviates the overreliance on a linear specification of synthetic controls in the literature and establishes a more direct connection with the outcome bridge function-based single proxy approach presented in Park et al. [15]. Notably, the general SPSC framework addresses underdeveloped areas of the synthetic control literature by allowing for various types of outcomes, including continuous, binary, count, or a combination of these.
In addition to the outcome bridge function-based approach, Park et al. [15] introduced two other single proxy control approaches for i.i.d. sampling. One approach relies on propensity score weighting, eliminating the need for specifying an outcome bridge function. The second approach uses both the propensity score and the outcome bridge function and, more importantly, exhibits a doubly-robust property in that the treatment effect in view is identified if either propensity score or outcome bridge function, but not necessarily both, is correctly specified. Consequently, a promising direction for future research would be to develop new SPSC approaches by extending these single proxy methods to the synthetic control setting. Such new SPSC approaches can be viewed as complementing the doubly-robust proximal synthetic control approach [10]. However, such extensions pose significant challenges due to (i) a single treated unit with nonrandom treatment assignment; (ii) multiple heterogeneous untreated donor units; and (iii) serial correlation and heteroskedasticity due to the time series nature of the data. In particular, nonrandom treatment assignment undermines the conventional notion of the propensity score, rendering it undefined. Approaches for addressing these challenges and developing corresponding statistical methods will be considered elsewhere.
Acknowledgment
The authors are grateful for the comments from the editorial board and three anonymous reviewers, which significantly improved the manuscript.
- Funding information: ETT was supported by NIH grant R01AG065276.
- Author contributions: All authors have accepted responsibility for the entire content of this manuscript and consented to its submission to the journal, reviewed all the results, and approved the final version of the manuscript.
- Conflict of interest: The authors state no conflict of interest.
- Data availability statement: The proposed method is implemented in an R package available at https://github.com/qkrcks0218/SPSC. The dataset analyzed in Section 5 is included in the replication package of Fohlin and Lu [16] accessible at https://www.aeaweb.org/articles?id=10.1257/pandp.20211097.
References
[1] Abadie A, Gardeazabal J. The economic costs of conflict: A case study of the Basque Country. Am Econ Rev. 2003;93(1):113–32. doi:10.1257/000282803321455188
[2] Abadie A, Diamond A, Hainmueller J. Synthetic control methods for comparative case studies: Estimating the effect of California’s Tobacco control program. J Am Stat Assoc. 2010;105(490):493–505. doi:10.1198/jasa.2009.ap08746
[3] Xu Y. Generalized synthetic control method: causal inference with interactive fixed effects models. Polit Anal. 2017;25(1):57–76. doi:10.1017/pan.2016.2
[4] Amjad M, Shah D, Shen D. Robust synthetic control. J Machine Learn Res. 2018;19(22):1–51.
[5] Ben-Michael E, Feller A, Rothstein J. The augmented synthetic control method. J Am Stat Assoc. 2021;116(536):1789–803. doi:10.1080/01621459.2021.1929245
[6] Ferman B, Pinto C. Synthetic controls with imperfect pretreatment fit. Quant Econ. 2021;12(4):1197–221. doi:10.3982/QE1596
[7] Ferman B. On the properties of the synthetic control estimator with many periods and many controls. J Am Stat Assoc. 2021;116(536):1764–72. doi:10.1080/01621459.2021.1965613
[8] Shi X, Li K, Miao W, Hu M, Tchetgen Tchetgen E. Theory for identification and inference with synthetic controls: a proximal causal inference framework. 2023. Preprint arXiv:2108.13935.
[9] Bai J. Panel data models with interactive fixed effects. Econometrica. 2009;77(4):1229–79. doi:10.3982/ECTA6135
[10] Qiu H, Shi X, Miao W, Dobriban E, Tchetgen Tchetgen E. Doubly robust proximal synthetic controls. Biometrics. 2024;80(2):ujae055. doi:10.1093/biomtc/ujae055
[11] Shi C, Sridhar D, Misra V, Blei D. On the assumptions of synthetic control methods. In: Camps-Valls G, Ruiz FJR, Valera I, editors. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics. vol. 151 of Proceedings of Machine Learning Research. PMLR; 2022. p. 7163–75.
[12] Miao W, Geng Z, Tchetgen Tchetgen EJ. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika. 2018;105(4):987–93. doi:10.1093/biomet/asy038
[13] Tchetgen Tchetgen EJ, Ying A, Cui Y, Shi X, Miao W. An introduction to proximal causal inference. Stat Sci. 2024;39(3):375–90. doi:10.1214/23-STS911
[14] Tchetgen Tchetgen E. The control outcome calibration approach for causal inference with unobserved confounding. Am J Epidemiol. 2013;179(5):633–40. doi:10.1093/aje/kwt303
[15] Park C, Richardson DB, Tchetgen Tchetgen EJ. Single proxy control. Biometrics. 2024;80(2):ujae027. doi:10.1093/biomtc/ujae027
[16] Fohlin C, Lu Z. How contagious was the panic of 1907? New evidence from trust company stocks. AEA Papers and Proceedings. 2021. p. 111. doi:10.1257/pandp.20211097
[17] Moen J, Tallman EW. The bank panic of 1907: The role of trust companies. J Econ History. 1992;52(3):611–30. doi:10.1017/S0022050700011414
[18] Doudchenko N, Imbens GW. Balancing, regression, difference-in-differences and synthetic control methods: A synthesis. Technical Report. 2016. National Bureau of Economic Research. doi:10.3386/w22791
[19] Robbins MW, Saunders J, Kilmer B. A framework for synthetic control methods with high-dimensional, micro-level data: evaluating a neighborhood-specific crime intervention. J Am Stat Assoc. 2017;112(517):109–26. doi:10.1080/01621459.2016.1213634
[20] Hansen LP. Large sample properties of generalized method of moments estimators. Econometrica. 1982;50(4):1029–54. doi:10.2307/1912775
[21] Cui Y, Pu H, Shi X, Miao W, Tchetgen Tchetgen E. Semiparametric proximal causal inference. J Am Stat Assoc. 2024;119(546):1348–59. doi:10.1080/01621459.2023.2191817
[22] Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement error in nonlinear models: a modern perspective. 2nd ed. Chapman and Hall/CRC; 2006. doi:10.1201/9781420010138
[23] Freedman LS, Midthune D, Carroll RJ, Kipnis V. A comparison of regression calibration, moment reconstruction and imputation for adjusting for covariate measurement error in regression. Stat Med. 2008;27(25):5195–216. doi:10.1002/sim.3361
[24] Pearl J. Causal diagrams for empirical research. Biometrika. 1995;82(4):669–88. doi:10.1093/biomet/82.4.669
[25] Donald SG, Imbens GW, Newey WK. Choosing instrumental variables in conditional moment restriction models. J Econ. 2009;152(1):28–36. doi:10.1016/j.jeconom.2008.10.013
[26] Chen X. Chapter 76: Large sample sieve estimation of semi-nonparametric models. vol. 6 of Handbook of Econometrics; 2007. p. 5549–632. doi:10.1016/S1573-4412(07)06076-X
[27] Newey WK, West KD. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica. 1987;55(3):703–8. doi:10.2307/1913610
[28] Andrews DWK. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica. 1991;59(3):817–58. doi:10.2307/2938229
[29] Cattaneo MD, Feng Y, Titiunik R. Prediction intervals for synthetic control methods. J Am Stat Assoc. 2021;116(536):1865–80. doi:10.1080/01621459.2021.1979561
[30] Chernozhukov V, Wüthrich K, Zhu Y. An exact and robust conformal inference method for counterfactual and synthetic controls. J Am Stat Assoc. 2021;116(536):1849–64. doi:10.1080/01621459.2021.1920957
[31] Abadie A, Diamond A, Hainmueller J. Synth: An R package for synthetic control methods in comparative case studies. J Stat Softw. 2011;42(13):1–17. https://www.jstatsoft.org/v42/i13/. doi:10.18637/jss.v042.i13
[32] Ben-Michael E. Augsynth: The Augmented Synthetic Control Method; 2023. R package version 0.2.0.
[33] Cattaneo M, Feng Y, Palomba F, Titiunik R. scpi: Prediction intervals for synthetic control methods with multiple treated units and staggered adoption; 2023. R package version 2.2.2. https://CRAN.R-project.org/package=scpi. doi:10.32614/CRAN.package.scpi
[34] Anderson TW, Hsiao C. Estimation of dynamic models with error components. J Am Stat Assoc. 1981;76(375):598–606. doi:10.1080/01621459.1981.10477691
[35] Arellano M, Bond S. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Rev Econ Stud. 1991;58(2):277–97. doi:10.2307/2297968
[36] Ferman B, Pinto C. Synthetic controls with imperfect pretreatment fit. 2019. arXiv:1911.08521.
© 2025 the author(s), published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.