
Single proxy synthetic control

  • Chan Park and Eric J. Tchetgen Tchetgen
Published/Copyright: July 5, 2025

Abstract

Synthetic control methods are widely used to estimate the treatment effect on a single treated unit in time-series settings. A common approach to estimating synthetic control weights is to regress the treated unit's pretreatment outcome and covariates' time series measurements on those of untreated units via ordinary least squares. However, this approach can perform poorly if the pretreatment fit is not near perfect, regardless of whether the weights are normalized. In this article, we introduce a single proxy synthetic control approach, which views the outcomes of untreated units as proxies of the treatment-free potential outcome of the treated unit, a perspective we leverage to construct a valid synthetic control. Under this framework, we establish an alternative identification strategy and corresponding estimation methods for synthetic controls and the treatment effect on the treated unit. Notably, unlike existing proximal synthetic control methods, which require two types of proxies for identification, ours relies on a single type of proxy, thus facilitating its practical relevance. In addition, we adapt a conformal inference approach to perform inference about the treatment effect, obviating the need for a large number of posttreatment observations. Finally, our framework can accommodate time-varying covariates and nonlinear models. We demonstrate the proposed approach in a simulation study and a real-world application.

MSC 2010: 62D20; 62M10; 62P20

1 Introduction

Synthetic control methods have grown popular for estimating the treatment effect of an intervention in settings where a single unit is treated and pre- and posttreatment time series data are available on the treated unit and a heterogeneous pool of untreated control units [1,2]. In the absence of a natural control unit, the main idea of the approach hinges upon constructing a so-called synthetic control, corresponding to a certain weighted average of control units’ outcomes (and potentially covariates), obtained by matching the outcome time series of the treated unit to the weighted average in the preintervention period, to the extent empirically feasible. The resulting synthetic control is then used to forecast the treatment-free potential outcome of the treated unit in the posttreatment period, therefore delivering an estimate of the treatment effect by comparing the treated unit’s outcome to the synthetic control forecast.

There is a fast-growing literature concerned with developing and improving approaches to constructing synthetic control weights. Following Abadie et al. [2], a common approach is to use ordinary (or weighted) least squares by regressing the pretreatment outcome and available covariates of the treated unit on those of control units, typically restricting the weights to be nonnegative and sum to one; see Section 2.2 for a more detailed discussion. Despite intuitive appeal and simplicity, the performance of the standard synthetic control approach may break down in settings where the pretreatment synthetic control match to the treated unit’s outcomes is short of perfect; an eventuality Abadie et al. [2] warns against. To improve the performance of the synthetic control approach in the event of an imperfect pretreatment match, recent papers have considered alternative formulations of the synthetic control framework. For example, Xu [3], Amjad et al. [4], Ben-Michael et al. [5], Ferman and Pinto [6], Ferman [7], and Shi et al. [8] rely on variants of the so-called interactive fixed effects model (IFEM; Bai [9]). In particular, the latter three articles specify a linear latent factor potential outcome model with an exogenous, common set of latent factors with corresponding unit-specific factor loadings. Under this linear factor model, a key identification condition is that the factor loading of the treated unit lies in the vector space spanned by factor loadings of donor units, and thus, there exists a linear combination of the latter that matches the former exactly. Using the corresponding matching weights, one can therefore construct an unbiased synthetic control of the treated unit’s potential outcome that, under certain conditions, can be used to mimic the treated unit’s outcome in the posttreatment period, had the intervention been withheld. At their core, these methods substitute the requirement of a perfect pretreatment match of the outcome of the treated unit and the synthetic control (an empirically testable assumption) with finding a match for the treated unit’s factor loadings in the linear span of the donors’ factor loadings (an empirically untestable assumption). Despite the growing interest in synthetic control methods, limited research has gone beyond the IFEM or its nonparametric generalizations [8,10]; one notable exception is Shi et al. [11] where the units’ outcomes are viewed as averages of more granular study units, allowing for the construction of a synthetic control under specific restrictions on the model of granular study units’ outcomes.

In this work, we consider an alternative theoretical framework to formalize the synthetic control approach which obviates a specification of an IFEM. Specifically, we propose to view the synthetic control model from a measurement error perspective, whereby donor units' outcomes stand as error-prone proxy measurements of the treated unit's treatment-free potential outcome. In this framework, a synthetic control outcome can be obtained via a simple form of calibration, say a linear combination of donor units, so that on average, it matches the treated unit's outcome in the pretreatment period. Although the standard IFEM views the treated and control units' outcomes as proxies of latent factors, our approach views donor units' outcomes as direct proxies of the treated unit's treatment-free potential outcome. Thus, the proposed framework shares similarity with the recent proximal synthetic control framework of Shi et al. [8], which also formalizes donor outcomes as so-called outcome proxies. However, a major distinction is that the latter requires an additional group of proxies (so-called treatment proxies) to identify synthetic control weights; in contrast, our proposed approach relies on a single type of proxy, given by the donor units, and obviates the need to invoke the existence of latent factors.

Interestingly, similar to the connection between the proximal synthetic control approach of Shi et al. [8] and proximal causal inference for independent and identically distributed (i.i.d.) data [12,13], the proposed synthetic control framework is likewise inspired by the control outcome calibration approach [14] and its recent generalization to the so-called single proxy control framework [15], both of which were proposed for i.i.d. samples subject to an endogenous treatment assignment mechanism. Therefore, we aptly refer to our approach as a single proxy synthetic control (SPSC) approach. Despite this connection, the synthetic control generalization presents several new challenges related to (i) only observing a single treated unit, so that treatment assignment is implicitly conditioned on; (ii) having access to pre- and posttreatment time series data for a heterogeneous pool of untreated donor units, none of which can serve as a natural control; and (iii) serial correlation and heteroskedasticity due to the time series nature of the data. We tackle each of challenges (i)–(iii) in turn and develop a general framework for single proxy control in a synthetic control setting. The proposed method is implemented in an R package available at https://github.com/qkrcks0218/SPSC.

2 Setup and review of existing synthetic control frameworks

2.1 Setup

Let us consider a setting where $N + 1$ units are observed over $T$ time periods. Units and time periods are indexed by $i \in \{0, 1, \ldots, N\}$ and $t \in \{1, \ldots, T\}$, respectively. Following the standard synthetic control setting, we suppose that only the first unit with index $i = 0$ is treated, whereas the latter $N$ units with index $i \in \{1, \ldots, N\}$ are untreated control units; these untreated control units are also referred to as donors. Consider a binary treatment indicator $A_t$ which encodes whether time $t$ is in the pretreatment period, in which case $A_t = 0$ for $t \in \{1, \ldots, T_0\}$, or the posttreatment period, in which case $A_t = 1$ for $t \in \{T_0 + 1, \ldots, T\}$, respectively. Thus, $T_0$ is the number of pretreatment periods and $T_1 = T - T_0$ is the number of posttreatment periods. Unless otherwise stated, we assume that $N$ is fixed and $T_0$ and $T_1$ are large and of similar order of magnitude. Let $Y_t$ and $W_{it}$ denote the observed outcomes of the treated unit and the $i$th control unit, respectively, for $i \in \{1, \ldots, N\}$. We define $W_t = (W_{1t}, \ldots, W_{Nt})^\top \in \mathbb{R}^N$ as the $N$-dimensional vector of the untreated units' outcomes at time $t$, and $O_t = (Y_t, W_t^\top, A_t)^\top$ as the observed data at time $t$. Let $Y_t(a)$ and $W_{it}(a)$ denote the potential outcomes of the treated and $i$th control units, respectively, which one would have observed had, possibly contrary to fact, the treatment been set to $A_t = a$ at time $t$.

For illustrative purposes, we will consider the following two examples throughout:

Example 1

Abadie et al. [2] investigated the effects of Proposition 99, a tobacco control program implemented in California in 1988, on cigarette sales in the state. Their empirical analysis considered annual cigarette sales data from California and from $N = 38$ other states, corresponding to $Y_t$ and $W_t$, respectively. The potential outcome $Y_t(0)$ represents California's cigarette sales had Proposition 99 not been implemented. The data covered the period from 1970 to 2000, resulting in $T_0 = 19$ pretreatment and $T_1 = 12$ posttreatment time periods.

Example 2

In Section 5, we revisit the analysis by Fohlin and Lu [16] to study the effects of the Panic of 1907 [17] on the average log stock prices of two trust companies (Knickerbocker and Trust Company of America) that were hypothesized to have been impacted by the Panic. For comparison, a selection of $N = 49$ trust companies conjectured to be immune to the Panic served as potential control units. The average log stock price of the two affected companies and the log stock prices of the $49$ control companies define $Y_t$ and $W_t$, respectively. The potential outcome $Y_t(0)$ represents the average log price of Knickerbocker and Trust Company of America had the Panic of 1907 not occurred. The tri-weekly panel data consist of $T_0 = 217$ pretreatment and $T_1 = 167$ posttreatment time periods, respectively.

Throughout, let $1(\mathcal{E})$ denote the indicator function of an event $\mathcal{E}$, i.e., $1(\mathcal{E}) = 1$ if $\mathcal{E}$ is satisfied and $1(\mathcal{E}) = 0$ otherwise. Let $\mathbb{R}$ be the set of real numbers. Let $V_1 \perp V_2 \mid V_3$ denote that $V_1$ and $V_2$ are conditionally independent given $V_3$; conversely, we use $V_1 \not\perp V_2 \mid V_3$ to denote that $V_1$ and $V_2$ are conditionally dependent given $V_3$. Let $0_{p \times d}$, $1_{p \times d}$, and $I_{p \times p}$ denote the $(p \times d)$-dimensional zero matrix, the $(p \times d)$-dimensional matrix of ones, and the $(p \times p)$-dimensional identity matrix, respectively.

2.2 Review of existing synthetic control framework

A common target estimand in the synthetic control setting is the average treatment effect on the treated unit (ATT) at time $t$ in the posttreatment period, i.e.,

$\tau_t^* = E\{Y_t(1) - Y_t(0)\}, \quad t \in \{T_0 + 1, \ldots, T\}.$

Note that, by definition, $Y_t(1) - Y_t(0) = \tau_t^* + \nu_t$ for $t \in \{T_0 + 1, \ldots, T\}$, where $\nu_t$ is a mean-zero idiosyncratic residual error; therefore, $\tau_t^*$ may be viewed as a deterministic function of time capturing the expected effect of the treatment experienced by the treated unit if one were to average over the residual $\nu_t$. In Section 3.5, we describe an approach for constructing prediction intervals for $Y_t(1) - Y_t(0)$ by appropriately accounting for the idiosyncratic error term $\nu_t$. To proceed, we make the consistency assumption:

Assumption 2.1

(Consistency) $Y_t = Y_t(A_t)$ almost surely and $W_{it} = W_{it}(A_t)$ almost surely for all $i \in \{1, \ldots, N\}$ and $t \in \{1, \ldots, T\}$.

In addition, we assume no interference, i.e., the treatment has no causal effect on control units.

Assumption 2.2

(No interference on control units) $W_{it}(0) = W_{it}(1)$ almost surely for all $i \in \{1, \ldots, N\}$ and $t \in \{1, \ldots, T\}$.

In the context of Example 1, Assumption 2.2 means that Proposition 99 does not have a causal effect on other states’ cigarette sales; a similar interpretation applies to Example 2.

Under Assumptions 2.1 and 2.2, we have the following result almost surely for $t \in \{1, \ldots, T\}$:

$Y_t = Y_t(0)(1 - A_t) + Y_t(1) A_t, \qquad W_{it} = W_{it}(0) = W_{it}(1), \quad i \in \{1, \ldots, N\}.$

Therefore, for the posttreatment period, $Y_t(1)$ matches the observed outcome $Y_t$ while $Y_t(0)$ is unobserved, implying that an additional assumption is required to establish identification of the ATT.

In the classical synthetic control setting, a further assumption relates the observed outcomes of the untreated units to the treatment-free potential outcome of the treated unit. Specifically, following Abadie et al. [2] and Ferman and Pinto [6], suppose that the units' outcomes are generated from the following IFEM [9] for $t \in \{1, \ldots, T\}$:

(1) $Y_t = \tau_t^* A_t + \mu_0^\top \lambda_t + e_{0t}$, $E(e_{0t} \mid \lambda_t) = 0$; $\quad W_{it} = \mu_i^\top \lambda_t + e_{it}$, $E(e_{it} \mid \lambda_t) = 0$, $i \in \{1, \ldots, N\}$.

Here, $\tau_t^*$ is the fixed, nonrandom treatment effect at time $t$; $\lambda_t \in \mathbb{R}^r$ is a random $r$-dimensional vector of latent factors that are known a priori to causally impact the treated and donor units, despite being unobserved, and can potentially be nonstationary over time; $\mu_i \in \mathbb{R}^r$ is a time-fixed $r$-dimensional vector of unit-specific factor loadings; and $e_{it}$ is a random error. For identification, it is typically assumed that the number of latent factors $r$ is no larger than the number of donor units $N$ and the pretreatment period length $T_0$. Combined with Assumptions 2.1 and 2.2, IFEM (1) implies $Y_t(0) = \mu_0^\top \lambda_t + e_{0t}$ and $Y_t(1) = \tau_t^* A_t + Y_t(0)$, where the ATT is represented as $\tau_t^* = Y_t(1) - Y_t(0)$ for $t \in \{T_0 + 1, \ldots, T\}$; note that $Y_t(1) - Y_t(0)$ is nonrandom under model (1). In addition, if there were a donor whose factor loading matched that of the treated unit, i.e., $\mu_i = \mu_0$ for some $i \in \{1, \ldots, N\}$, then $W_{it}$ would be unbiased for $Y_t(0)$ and, therefore, $Y_t(1) - W_{it}$ would be unbiased for the ATT. This suggests that confounding bias of the treatment effect on the treated unit's outcome reflects the extent to which donors' factor loadings differ from the treated unit's.

Next, following Ferman and Pinto [6] and Shi et al. [8], suppose that a set of weights $\gamma = (\gamma_1, \ldots, \gamma_N)^\top$ satisfies

(2) $\mu_0 = \sum_{i=1}^N \gamma_i \mu_i.$

Equations (1) and (2) imply that there exists a synthetic control $W_t^\top \gamma = \sum_{i=1}^N \gamma_i W_{it}$ satisfying

(3) $Y_t(0) = W_t^\top \gamma + e_{0t} - \sum_{i=1}^N \gamma_i e_{it}, \quad t \in \{1, \ldots, T\}.$

In the context of Example 1, equation (3) means that:

(4) The counterfactual measurement of cigarette sales for California had, contrary to fact, Proposition 99 not been implemented is an error-prone weighted average of cigarette sales in the other 38 states.

A similar interpretation holds for Example 2. Therefore, $\tau_t^* = E\{Y_t(1) - W_t^\top \gamma\}$ for $t \in \{T_0 + 1, \ldots, T\}$, i.e., $Y_t - W_t^\top \gamma$ is unbiased for the ATT. Unfortunately, it is impossible to obtain $\gamma$ from equation (2) because the factor loadings $\mu_i$ are unknown. Importantly, the synthetic control weights satisfying (2) naturally accommodate an imperfect pretreatment fit, as shown in (3): the synthetic control can deviate substantially from the treated unit's observed pretreatment outcomes, but the corresponding error is mean zero.

Based on (3), one may consider estimating $\gamma$ via penalized least squares minimization, say:

(5) $\hat{\gamma}_{\text{PLS}} = \arg\min_\gamma \frac{1}{T_0} \sum_{t=1}^{T_0} (Y_t - W_t^\top \gamma)^2 + \mathcal{R}(\gamma),$

where $\mathcal{R}(\gamma)$ is a penalty that constrains $\gamma$. For instance, Abadie et al. [2] restrict the weights to lie within a simplex, meaning that they are nonnegative and sum to one; Doudchenko and Imbens [18] use elastic-net penalization; and Robbins et al. [19] use entropy penalization. In words, $\hat{\gamma}_{\text{PLS}}$ is obtained by fitting a possibly constrained ordinary least squares (OLS) regression of $Y_t$ on $W_t$. Importantly, without penalization, the moment restriction solving (5) reduces to $E\{\Psi_{\text{OLS}}(O_t; \gamma)\} = 0$ for $t \in \{1, \ldots, T_0\}$, where $\Psi_{\text{OLS}}(O_t; \gamma) = W_t(Y_t - W_t^\top \gamma)$ are the standard least squares normal equations.
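For concreteness, the unpenalized version of (5) is simply an OLS fit of $Y_t$ on $W_t$ over the pretreatment periods. The following minimal R sketch (with our own variable names, not those of any package) solves the normal equations empirically:

```r
# Unpenalized OLS synthetic control weights solving (5) without R(gamma):
# Y is the length-T0 vector of the treated unit's pretreatment outcomes and
# W is the T0 x N matrix whose rows are the donors' outcomes W_t'.
ols_sc_weights <- function(Y, W) {
  solve(crossprod(W), crossprod(W, Y))  # (W'W)^{-1} W'Y, the normal equations
}

# Posttreatment forecast of Y_t(0): W_post %*% ols_sc_weights(Y_pre, W_pre)
```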

However, as discussed by Ferman and Pinto [6] and Shi et al. [8], the OLS weights obtained from (5) are generally inconsistent under (2) as $T_0$ tends to infinity, which can result in biased estimation of the treatment effect unless $e_{it}$ is exactly zero for all $i$ and $t$; see Supplementary Material S1.1 for details. We remark that this result does not conflict with Abadie et al. [2] because their synthetic control weights are assumed to satisfy a perfect pretreatment fit; specifically, there exist values $\gamma^\# = (\gamma_1^\#, \ldots, \gamma_N^\#)^\top$ satisfying

(6) $Y_t(0) = W_t^\top \gamma^\#, \quad t \in \{1, \ldots, T_0\}.$

In the context of Example 1, equation (6) means that:

(7) The counterfactual measurement of cigarette sales for California had, contrary to fact, Proposition 99 not been implemented is equal to weighted averages of cigarette sales in the other 38 states.

Example 2 follows a similar interpretation. Note that (6) is distinct from condition (2) of Ferman and Pinto [6] and Shi et al. [8], as reflected in their interpretations (4) and (7). Moreover, as discussed in Ferman and Pinto [6], (6) can be expected to hold approximately under (2) when the variance of the error $e_{it}$ in (1) becomes negligible as $T_0$ becomes large. Specifically, in a noiseless setting where $e_{it} = 0$ almost surely for all $i \in \{0, 1, \ldots, N\}$, (1) and (2) imply (6) because (3) becomes equivalent to (6); see Abadie et al. [2] for related results, and Sections 1 and 3.1 of Ferman and Pinto [6] and Section 2 of Shi et al. [8] for detailed discussions.

Recently, Shi et al. [8] introduced a proximal causal inference framework for synthetic controls. Specifically, they assume that one additionally observes proxy variables $Z_t = (Z_{1t}, \ldots, Z_{Mt})^\top$ a priori known to satisfy the following condition in the pretreatment period:

(8) $Z_t \perp (Y_t, W_t^\top)^\top \mid \lambda_t, \quad t \in \{1, \ldots, T_0\}.$

A reasonable candidate for $Z_t$ may be the outcomes of units excluded from the donor pool; see Shi et al. [8] for alternative choices of proxies. Then, under Assumptions 2.1 and 2.2, the IFEM (1), condition (2), and the existence of proxies satisfying (8), the synthetic control weights $\gamma$ in (2) satisfy $E(Y_t - W_t^\top \gamma \mid Z_t) = 0$ and $E\{\Psi_{\text{PSC}}(O_t; \gamma)\} = 0$ for $t \in \{1, \ldots, T_0\}$, where $\Psi_{\text{PSC}}(O_t; \gamma) = g(Z_t)(Y_t - W_t^\top \gamma)$; here, $g$ is a user-specified function of $Z_t$ with $\dim(g) \geq N$. Based on this second result, one can estimate the synthetic control weights via the generalized method of moments (GMM) [20], i.e., $\hat{\gamma}_{\text{PSC}} = (\hat{\gamma}_{\text{PSC},1}, \ldots, \hat{\gamma}_{\text{PSC},N})^\top$ is the minimizer of $\{T_0^{-1} \sum_{t=1}^{T_0} \Psi_{\text{PSC}}(O_t; \gamma)\}^\top \hat{\Omega} \{T_0^{-1} \sum_{t=1}^{T_0} \Psi_{\text{PSC}}(O_t; \gamma)\}$, where $\hat{\Omega}$ is a user-specified symmetric and positive-definite weight matrix. Importantly, in contrast to the OLS-based estimator $\hat{\gamma}_{\text{PLS}}$ in (5), the proximal estimator $\hat{\gamma}_{\text{PSC}}$ is consistent for $\gamma$. Under certain regularity conditions, Shi et al. [8] established that the resulting GMM estimator of the ATT is consistent and asymptotically normal. For instance, in the special case of a constant ATT, i.e., $\tau_t^* = \tau^*$ for all $t \in \{T_0 + 1, \ldots, T\}$, the estimator $T_1^{-1} \sum_{t=T_0+1}^{T} (Y_t - W_t^\top \hat{\gamma}_{\text{PSC}})$ is consistent for $\tau^*$; see Section 3.2 of Shi et al. [8] for details.
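To fix ideas, in the just-identified case where $\dim(g) = N$, the GMM minimizer solves the empirical moment equation exactly, whatever the choice of $\hat{\Omega}$; a minimal R sketch under our own naming conventions is as follows:

```r
# Just-identified proximal synthetic control weights: solves the sample
# analogue of E{g(Z_t)(Y_t - W_t' gamma)} = 0 exactly. Gz is the T0 x N
# matrix with rows g(Z_t)'; Y and W are as in the OLS sketch above.
psc_weights <- function(Y, W, Gz) {
  solve(crossprod(Gz, W), crossprod(Gz, Y))  # solves Gz'(Y - W gamma) = 0
}
```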

3 SPSC approach

3.1 Assumptions

In this section, we provide a novel synthetic control approach which obviates the need for an IFEM and, in fact, does not necessarily postulate the existence of a latent factor $\lambda_t$. At its core, the approach views the outcomes of the untreated units $W_{it}$ as proxies for the treatment-free potential outcome of the treated unit $Y_t(0)$, which is formally stated as follows:

Assumption 3.1

(Proxy) There exists a function $h^*: \mathbb{R}^N \to \mathbb{R}$ satisfying

$h^*(W_t) \not\perp Y_t(0), \quad t \in \{1, \ldots, T_0\}.$

Assumption 3.1 encodes that a function of the untreated units' outcomes $W_t$ is associated with, and therefore predictive of, $Y_t(0)$ at time $t \in \{1, \ldots, T_0\}$. In terms of Example 1, Assumption 3.1 means that there exists a function of the 38 states' cigarette sales that is associated with cigarette sales in a counterfactual California where Proposition 99 was not implemented; a similar interpretation also applies to Example 2. Note that Assumption 3.1 allows for the existence of irrelevant donors among the donor pool, i.e., some untreated units can be independent of $Y_t(0)$ as long as the remaining untreated units are associated with the latter. Additionally, we make the following assumption for $h^*$:

Assumption 3.2

(Existence of a synthetic control bridge function) For all $t \in \{1, \ldots, T\}$, there exists a synthetic control bridge function $h^*: \mathbb{R}^N \to \mathbb{R}$ satisfying

(9) $Y_t(0) = E\{h^*(W_t) \mid Y_t(0)\}$ almost surely.

Assumption 3.2 is the key identification assumption of the SPSC framework. It posits the existence of a synthetic control $h^*(W_t)$ that is conditionally unbiased for $Y_t(0)$. In words, there exists a function of the donors $h^*$, possibly nonlinear, whose conditional expectation given $Y_t(0)$ recovers $Y_t(0)$; the function $h^*$ is a kind of bridge function [12,13], and we aptly refer to $h^*$ as a synthetic control bridge function in this paper. The synthetic control bridge function $h^*$ is a solution to the Fredholm integral equation of the first kind (9), and sufficient conditions for the existence of a solution are well studied in previous related works developed under i.i.d. settings, such as Miao et al. [12] and Cui et al. [21]; see Supplementary Material S2.2 for details. Importantly, Assumption 3.2 may still hold in non-i.i.d. settings, such as when $(Y_t(0), W_t^\top)^\top$ is nonstationary; see Supplementary Material S1.5 for further details.

In particular, if $h^*$ has a linear form, say $h^*(W_t) = W_t^\top \gamma^*$ for some $\gamma^* \in \mathbb{R}^N$, the assumption implies the following linear model with an error $\bar{e}_t$:

(10) $W_t^\top \gamma^* = Y_t(0) + \bar{e}_t, \quad E\{\bar{e}_t \mid Y_t(0)\} = 0$ almost surely for all $t \in \{1, \ldots, T\}$.

Regression model (10) essentially implies that $Y_t(0)$ falls in the linear span of $E\{W_t \mid Y_t(0)\}$, up to a mean-zero residual. Thus, Assumption 3.2 may be interpreted as follows for Example 1:

(11) There exists a weighted average of cigarette sales for the 38 donor states which constitutes an error-prone counterfactual measurement of cigarette sales for California had, contrary to fact, Proposition 99 not been implemented.

Assumption 3.2 plays an analogous role to condition (2) in Ferman and Pinto [6] and Shi et al. [8] and condition (6) in Abadie et al. [2] in that it establishes a relationship between $Y_t(0)$ and $W_t$; however, Assumption 3.2 is fundamentally different from these assumptions. In particular, condition (2) implies that the counterfactual outcome $Y_t(0)$ is equal to the synthetic control $W_t^\top \gamma^*$ plus an error; in contrast, Assumption 3.2 with a linear $h^*$ implies that the synthetic control $W_t^\top \gamma^*$ is equal to the counterfactual outcome $Y_t(0)$ plus a residual error. This distinction highlights that Assumption 3.2 and condition (2) can be viewed as reversed assumptions: they differ in which variable is treated as an error-prone version of the other. Finally, condition (6) is a special case of the former two in which the residual error is assumed to be exactly zero, i.e., a noiseless setting. Consequently, in the pretreatment periods, Assumption 3.2 is strictly weaker than condition (6) because $\bar{e}_t$ is not necessarily zero.

Unlike condition (2), Assumption 3.2 obviates the need for latent factors, their corresponding factor loadings, the IFEM (1), or any related latent factor model. Instead, Assumption 3.2 simply states that it is possible to construct a function of the control units' outcomes $h^*(W_t)$ which is conditionally unbiased for the treatment-free potential outcome of the treated unit $Y_t(0)$, without requiring assumptions about how these outcomes are generated. From this viewpoint, $h^*$ in Assumption 3.2 serves as a bridge function relating $W_t$ and $Y_t(0)$, in that $h^*(W_t)$ is an error-prone version of $Y_t(0)$. This perspective can be illustrated in Example 1: cigarette sales in counterfactual California, had Proposition 99 not been implemented, are viewed as a variable a priori determined by an unknown mechanism, while cigarette sales in the other 38 states are seen as error-prone transformations of this counterfactual outcome. Assumption 3.2 then implies that cigarette sales in counterfactual California can be recovered by aggregating these latter variables up to a mean-zero error.

Moreover, this perspective aligns with the existing statistical literature. In particular, model (10) is reminiscent of a nonclassical measurement error model [22,23]. From a regression model perspective, the donors' outcomes $W_t$ and the treated unit's treatment-free potential outcome $Y_t(0)$ in model (10) can be viewed as the dependent and independent variables, respectively. This may appear somewhat unconventional at first glance, as some previous synthetic control methods treat $Y_t(0)$ and $W_t$ as the dependent and independent variables, respectively, in estimating the synthetic control weights; to be precise, they use equation (5) to estimate the weights by regressing $Y_t(0)$ on $W_t$ using standard ordinary (or weighted) least squares. However, as model (10) suggests, our framework differs from previous works in synthetic control and is better aligned with regression calibration techniques in the measurement error literature [22], in that we view the problem as the reverse regression of $W_t$ on $Y_t(0)$. From this perspective, the synthetic control weights $\gamma^*$ are sought to make the weighted response $W_t^\top \gamma^*$ as close as possible to the regressor $Y_t(0)$.

To summarize, the SPSC framework differs from existing synthetic control frameworks in its identifying assumptions and in its interpretation of the synthetic control. Specifically, in the SPSC framework, the synthetic control is viewed as an error-prone outcome measurement (see (10)), eliminating the need for a generative model for $Y_t(0)$. In contrast, existing approaches interpret the synthetic control as either the projection of the outcome onto the donors' outcome space (see (3) and (5)) or the outcome itself (see (6)). Despite these differences, the frameworks share key similarities. In both, synthetic controls are constructed by weighting donor units to optimally match the treated unit during the pretreatment period, though the matching criteria differ, as previously noted. Furthermore, synthetic controls in both approaches serve as unbiased forecasts of the mean treatment-free potential outcome $E\{Y_t(0)\}$, enabling treatment effect estimation by comparing observed outcomes $Y_t$ to the synthetic controls over the posttreatment period. In addition, like other synthetic control methods, the SPSC framework accommodates time-varying confounders, distinguishing it from difference-in-differences approaches. Most importantly, the SPSC framework is compatible with the IFEM, as shown in the next section. Thus, while the interpretation of the synthetic control differs, most features of existing synthetic control approaches carry over to the SPSC framework.

3.2 A generative model

While, in principle, Assumptions 3.1 and 3.2 do not require a generative model, it is instructive to consider a model compatible with these assumptions. In this vein, suppose that $Y_t(0)$ and $W_{it}$ are generated from the following nonparametric structural equation model [24] for $t \in \{1, \ldots, T\}$:

(12) $Y_t(0) = f_0(\lambda_t, e_{0t}), \quad W_{it} = f_i(\lambda_t, e_{it}), \quad i \in \{1, \ldots, N\}.$

Here, $f_0$ and $f_i$ are structural equations for $Y_t(0)$ and $W_{it}$, respectively; $\lambda_t = (\lambda_{1t}, \ldots, \lambda_{rt})^\top$ is an $r$-dimensional latent factor; and the errors satisfy $e_{it} \perp \lambda_t$ for $i \in \{0, 1, \ldots, N\}$ and $e_{0t} \not\perp e_{it}$, where the latter condition further strengthens Assumption 3.1 in the sense that $W_{it}$ is relevant for $Y_t(0)$ even beyond $\lambda_t$. Figure 1 provides graphical representations compatible with Assumption 3.1 and model (12).

Figure 1: Graphical illustrations for (a) Assumption 3.1, (b) model (12) with correlated errors, and (c) model (12) with independent errors. The dashed bow arcs depict the association between two variables. For illustration, we consider $N = 1$.

Under model (12), $Y_t(0)$ is determined by $\lambda_t$ and $e_{0t}$. Given this relationship, it is natural to consider a sufficient condition for Assumption 3.2 characterized in terms of $\lambda_t$ and $e_{0t}$:

Condition 3.1

For all $t \in \{1, \ldots, T\}$, there exists a function $h^*: \mathbb{R}^N \to \mathbb{R}$ that satisfies $Y_t(0) = f_0(\lambda_t, e_{0t}) = E\{h^*(W_t) \mid \lambda_t, e_{0t}\}$ almost surely.

Condition 3.1 is a sufficient condition for Assumption 3.2 because, under Condition 3.1, we obtain $E\{h^*(W_t) \mid Y_t(0)\} = E[E\{h^*(W_t) \mid \lambda_t, e_{0t}\} \mid Y_t(0)] = Y_t(0)$.

Under model (12) and Condition 3.1, consider the special case where $f_i$ follows the IFEM (1):

(13) $f_i(\lambda_t, e_{it}) = \mu_i^\top \lambda_t + e_{it} = \sum_{\ell=1}^r \mu_{\ell i} \lambda_{\ell t} + e_{it}, \quad E(e_{it}) = 0, \quad E(e_{it} \mid e_{0t}) = \omega_i e_{0t}, \quad i \in \{0, 1, \ldots, N\}.$

Here, $\omega_i$ is the regression coefficient obtained from regressing the $i$th donor's error $e_{it}$ on the treated unit's error $e_{0t}$. We remark that $\omega_0 = 1$ and $\omega_i \neq 0$ for some $i \in \{1, \ldots, N\}$, encoding $e_{0t} \not\perp e_{it}$. Under the IFEM, Condition 3.1 holds with $h^*(W_t) = W_t^\top \gamma^*$ if $\gamma^* = (\gamma_1^*, \ldots, \gamma_N^*)^\top$ solves the following linear system:

(14) $Y_t(0) = \sum_{i=1}^N \gamma_i^* E(W_{it} \mid \lambda_t, e_{0t}) \iff \begin{pmatrix} \mu_{10} \\ \vdots \\ \mu_{r0} \\ \omega_0 \end{pmatrix} = \underbrace{\begin{pmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1N} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{r1} & \mu_{r2} & \cdots & \mu_{rN} \\ \omega_1 & \omega_2 & \cdots & \omega_N \end{pmatrix}}_{= A} \begin{pmatrix} \gamma_1^* \\ \gamma_2^* \\ \vdots \\ \gamma_N^* \end{pmatrix}.$

A sufficient condition for the existence of the weight $\gamma^*$ is that the matrix $A$ is of full row rank, which is satisfied under the following sufficient (but not necessary) conditions: (i) $r < N$, i.e., the number of donors exceeds the number of latent factors, and (ii) the factor loadings $\mu_i$ are linearly independent. If the matrix $A$ is square and invertible, $\gamma^*$ is uniquely determined. This observation indicates that a linear synthetic control satisfying Condition 3.1, and thus Assumption 3.2, is likely to exist when the errors are correlated and there is a sufficient number of donors, regardless of the distribution of the latent factors and errors.

Since equation (14) is based on the IFEM, it has interesting connections with previous works that also rely on this model. In order to elucidate these connections, we consider the following alternative representation of equation (14):

(15) $\tilde{\mu}_0 = \sum_{i=1}^N \gamma_i^* \tilde{\mu}_i,$

where $\tilde{\mu}_i = (\mu_{1i}, \ldots, \mu_{ri}, \omega_i)^\top$ for $i \in \{0, 1, \ldots, N\}$. As the expression itself indicates, condition (15) is similar to condition (2), the condition used in Ferman and Pinto [6] and Shi et al. [8], but there is a notable difference between (2) and (15) in how they handle the errors $e_{it}$. Specifically, condition (15) addresses the residual errors $e_{it}$ by accommodating the regression coefficients $\omega_i$ as a component of the unit-specific factor loadings $\tilde{\mu}_i$; in contrast, condition (2) does not account for these errors. Consequently, (15) implies (2) because $\mu_i$ is a subvector of $\tilde{\mu}_i$, indicating that (15) is a stronger condition than (2). However, as stated in Theorem 3.1 in Section 3.3, it is crucial to note that this stronger condition is offset by not requiring an additional condition to establish identification of the synthetic control weights $\gamma^*$: condition (15) alone is sufficient for identification of $\gamma^*$. On the other hand, condition (2) fails to do so, necessitating additional assumptions for identification of $\gamma^*$, as exemplified by Ferman and Pinto [6] and Shi et al. [8]. Specifically, Ferman and Pinto [6] require either (i) $\mathrm{Var}(e_{it}) = 0$ for all $i \in \{0, 1, \ldots, N\}$, meaning a noiseless setting, or (ii) $\gamma^*$ is a minimizer of $V(\gamma) = E\{(e_{0t} - \sum_{i=1}^N \gamma_i e_{it})^2\}$, the variance of the linear combination of error terms appearing in (3); see Propositions 1 and 2 of Ferman and Pinto [6] for details. Interestingly, under (i), all $\omega_i$ can be taken as zero, and (15) becomes equivalent to (2), the assumption made by Ferman and Pinto [6] and Shi et al. [8]. Finally, in the degenerate case where $Y_t(0)$ and $W_t$ share the same error, i.e., $e_{0t} = e_{1t} = \cdots = e_{Nt}$ almost surely, condition (15) implies the perfect fit condition (6), in which case the unconstrained OLS weights (5) are consistent as $T_0$ tends to infinity.

While the IFEM with correlated errors in (13) is useful for motivating the SPSC framework, the standard IFEM typically assumes no correlation among the errors, i.e., $\omega_i = 0$ for all $i \in \{1, \ldots, N\}$ in (13). When the errors are uncorrelated, a solution to equation (14) may not exist, implying that no linear SPSC bridge function satisfies Condition 3.1. This may suggest that the SPSC framework is incompatible with a standard IFEM. However, a linear synthetic control bridge function satisfying Assumption 3.2 may still exist under the IFEM with uncorrelated errors even when Condition 3.1 is violated; this is because Condition 3.1 is not a necessary condition for Assumption 3.2. With additional assumptions regarding the latent factors and errors, it is possible to conceive of a reasonable scenario where the SPSC framework remains valid within the standard IFEM. For instance, if $\lambda_t$ and $e_t = (e_{0t}, e_{1t}, \ldots, e_{Nt})^\top$ follow multivariate normal distributions with homoskedastic variances, specifically $\lambda_t \sim N_r(\nu_t, \Sigma_\lambda)$ and $e_t \sim N_{N+1}(0_{(N+1)\times 1}, \Sigma_e)$, then a linear synthetic control satisfying Assumption 3.2 exists even when $\Sigma_e$ is a diagonal matrix; see Supplementary Material S1.5 for details. In essence, such circumstances may arise because, despite the uncorrelated errors, $W_t$ and $Y_t(0)$ remain associated through the latent factors $\lambda_t$, allowing for the possibility that a linear SPSC bridge function exists; see Figure 1(c) for a graphical illustration. In summary, while uncorrelated errors may undermine the plausibility of the SPSC framework, it can still be valid if certain conditions on $(\lambda_t, e_t)$ are met, such as the normality assumption above.

3.3 Identification of the synthetic control and the treatment effect

As a direct consequence of Assumptions 2.1, 2.2, 3.1, and 3.2, the synthetic control bridge function $h^*$ can be represented as a solution to the moment equation given in the following result:

Theorem 3.1

Under Assumptions 2.1, 2.2, 3.1, and 3.2, the synthetic control bridge function $h^*$ satisfies $E\{h^*(W_t) \mid Y_t\} = Y_t$ almost surely for $t \in \{1, \ldots, T_0\}$.

The proof of this theorem, as well as all other proofs, is provided in Supplementary Material S3. Theorem 3.1 motivates our approach for estimating the synthetic control bridge function $h^*$, as it only involves the observed data. Another consequence of Assumptions 2.1, 2.2, 3.1, and 3.2 is that, as formalized in Theorem 3.2 below, the synthetic control bridge function $h^*(W_t)$ can be used to identify $\tau_t^*$:

Theorem 3.2

Under Assumptions 2.1, 2.2, 3.1, and 3.2, we have that $E\{Y_t(0)\} = E\{h^*(W_t)\}$ for any $t \in \{1, \ldots, T\}$. In addition, the ATT is identified as $\tau_t^* = E\{Y_t - h^*(W_t)\}$ for $t \in \{T_0 + 1, \ldots, T\}$.

Theorem 3.2 provides a theoretical basis for the use of the synthetic control method to estimate the ATT. Specifically, following Abadie and Gardeazabal [1] and Shi et al. [8], we use $Y_t - h^*(W_t)$ in a standard time series regression where the ATT is identified as the deterministic component of the decomposition $Y_t - h^*(W_t) = \tau_t^* + \varepsilon_t$, with $\varepsilon_t$ representing a mean-zero error. The following sections elaborate on this approach, first describing how the identification result leads to an estimator of the synthetic control.

To facilitate the exposition, hereafter in the main text, we restrict attention to inference under a linear bridge function, i.e., $h^*(W_t) = W_t^\top \gamma^*$, while allowing for the possibility that $\gamma^*$ is not unique. In Supplementary Material S2, we present the more general case where $h^*$ is nonparametric.

3.4 Estimation and inference of the treatment effect under a linear bridge function

We first discuss estimation of the synthetic control weights $\gamma^*$. We consider the following time-invariant estimating function for the pretreatment periods:

(16) $\Phi_{\text{pre}}(O_t; \gamma) = \phi(Y_t)(Y_t - W_t^\top \gamma), \quad t \in \{1, \ldots, T_0\}.$

Here, $\phi: \mathbb{R} \to \mathbb{R}^p$ is a $p$-dimensional user-specified function of the treated unit's outcome. Theorem 3.1 implies that the estimating function $\Phi_{\text{pre}}$ satisfies $E\{\Phi_{\text{pre}}(O_t; \gamma^*)\} = 0$ for $t \in \{1, \ldots, T_0\}$, indicating that $\Phi_{\text{pre}}$ can be used to obtain an estimator of $\gamma^*$. An important remark on $\phi$ is that its dimension can be smaller than the number of donors, i.e., $p < N$; therefore, $\phi$ can be specified as a simple function, e.g., $\phi(y) = y$.

It is instructive to note that solving the estimating equation $E\{\Phi_{\text{pre}}(O_t; \gamma)\} = 0$ has a close connection to instrumental variable regression. To illustrate this, consider a simple setting where $\phi(y) = y$ and $N = 1$, along with an alternative form of model (10) for $t \in \{1, \ldots, T_0\}$:

(17) $W_t^\top \gamma^* = Y_t + \bar{e}_t \iff Y_t = W_t^\top \gamma^* - \bar{e}_t, \quad E(\bar{e}_t \mid Y_t) = 0$ almost surely.

One might attempt to interpret the model on the right-hand side as a standard regression model, treating $Y_t$ as the response variable and $W_t$ as the explanatory variable. However, such an interpretation would not be correct, as the error term $\bar{e}_t$ is orthogonal to the response variable $Y_t$. Instead, the right-hand side model exhibits the following properties: (i) the error term $\bar{e}_t$ is correlated with $W_t$ (as induced by the left-hand side model), making $W_t$ an endogenous explanatory variable; (ii) the error $\bar{e}_t$ is orthogonal to $Y_t$; and (iii) $Y_t$ is correlated with $W_t$ under Assumption 3.1. Thus, $Y_t$ can serve as an instrumental variable for $W_t$, allowing for an instrumental variable regression estimator in which $Y_t$ and $W_t$ are used as the instrument and the endogenous explanatory variable, respectively. This estimator is given by $\hat{\gamma}_{\text{IV}} = (T_0^{-1} \sum_{t=1}^{T_0} Y_t W_t)^{-1}(T_0^{-1} \sum_{t=1}^{T_0} Y_t^2)$. Notably, $\hat{\gamma}_{\text{IV}}$ is consistent under some conditions for $\gamma^* = \{T_0^{-1} \sum_{t=1}^{T_0} E(Y_t W_t)\}^{-1}\{T_0^{-1} \sum_{t=1}^{T_0} E(Y_t^2)\}$, which is the solution to the estimating equation $E\{\Phi_{\text{pre}}(O_t; \gamma)\} = 0$. The case of a general $\phi$ and multiple donors can be understood in a similar manner, with the main difference being the use of multiple instrumental variables $\phi(Y_t) \in \mathbb{R}^p$ and multiple explanatory variables $W_t \in \mathbb{R}^N$.
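The following R sketch illustrates the display above: the scalar case $N = 1$ with $\phi(y) = y$, and its just-identified multi-donor analogue with instrument vector $\phi(Y_t)$ of dimension $p = N$ (all names are ours):

```r
# IV estimator of the single weight when N = 1 and phi(y) = y:
# gamma_hat = {sum_t Y_t W_t}^{-1} {sum_t Y_t^2}.
gamma_iv_scalar <- function(Y, W) sum(Y^2) / sum(Y * W)

# Just-identified multi-donor analogue (requires dim(phi) = N):
# solves sum_t phi(Y_t) (Y_t - W_t' gamma) = 0.
gamma_iv <- function(Y, W, phi = function(y) cbind(y)) {
  P <- phi(Y)                              # T0 x p matrix of instruments
  solve(crossprod(P, W), crossprod(P, Y))
}
```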

The choice of $\phi$ affects the efficiency of the corresponding estimators of $\gamma^*$ and the treatment effect parameter $\beta^*$, which we define later in this section; see Section 2 of Donald et al. [25] for a similar discussion. Therefore, one could in theory select the optimal $\phi$ from a set of candidates so as to minimize the asymptotic variance of the estimators, thereby maximizing efficiency. For example, $\phi$ can be selected from basis functions such as polynomials up to the $p$th power, where $p$ is chosen to minimize the asymptotic variance of the estimators of $(\gamma^*, \beta^*)$; other examples of basis functions include truncated polynomial bases, Fourier bases, splines, and wavelets such as the Haar basis; see Chen [26] and the references therein for more details on how to choose the optimal $\phi$ over a basis function space. However, selecting the optimal $\phi$ can be computationally intensive and, despite this burden, may yield only marginal gains in efficiency. From a practical standpoint, we use a simple specification for $\phi$, namely the identity function $\phi(y) = y$, leading to $p = 1$. In the simulation studies and data analysis, this simple choice of $\phi$ performs well and produces reasonable results compared with competing methods in the settings we consider, although we cannot guarantee this to be the case in all settings one might face in practice.

A time-invariant specification of ϕ may sometimes lead to poorly behaved estimates of synthetic control weights, particularly in scenarios where the outcomes exhibit nonstationary behavior. To address this, the estimating function can be adapted to accommodate secular trends as follows:

(18) $\Psi_{\text{pre}}(O_t; \eta, \gamma) = \begin{pmatrix} D_t (Y_t - D_t^\top \eta) \\ g(t, Y_t; \eta)(Y_t - W_t^\top \gamma) \end{pmatrix}, \quad g(t, y; \eta) = \begin{pmatrix} D_t \\ \phi(y - D_t^\top \eta) \end{pmatrix}, \quad t \in \{1, \ldots, T_0\}.$

Here, $D_t \in \mathbb{R}^d$ is a $d$-dimensional vector of basis functions used to de-trend nonstationary behavior of the outcomes. We assume that $D_t$ is selected such that there exists a unique vector $\eta^*$ satisfying $E\{Y_t(0)\} = D_t^\top \eta^*$ for $t \in \{1, \ldots, T_0\}$, meaning that the time trend of $Y_t(0)$ over the pretreatment period is correctly specified by the regression model spanned by $D_t$. The selection of $D_t$ can be evaluated by examining the residuals from regressing $Y_t$ on $D_t$ over the pretreatment period. For instance, to account for a linear trend, one might select $D_t = (1, t/T_0)^\top$, where these terms account for an intercept and the drift of a nonstationary process. Alternatively, one could choose $D_t = d(t)$, a $d$-dimensional cubic B-spline function, to capture nonlinear trends. While $D_t$ could also be specified as a dummy vector, allowing each component of $\eta^*$ to represent a time fixed effect for each pretreatment period, this approach may result in an inconsistent ATT estimator. To ensure valid inference on the ATT while reducing the risk of misspecification, we recommend using a cubic B-spline basis of small to moderate dimension. In both our simulation studies and real-world analysis (Sections 4 and 5), we used a six-dimensional cubic B-spline basis, which demonstrated reasonable performance.
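For example, the two choices of $D_t$ mentioned above can be constructed in R as follows (a sketch; `bs()` is from the recommended splines package that ships with R):

```r
library(splines)

T0 <- 100                            # number of pretreatment periods
D_linear <- cbind(1, (1:T0) / T0)    # D_t = (1, t/T0)': intercept + drift
D_spline <- bs(1:T0, df = 6)         # six-dimensional cubic B-spline basis;
                                     # rows are D_t' (degree 3 by default)
```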

The function $g: [0, \infty) \times \mathbb{R} \to \mathbb{R}^{d+p}$ can be seen as a basis function of both the outcome and the time period. Note that the dimension of $g$ may be smaller than $N$, which may arise from simple specifications of $D_t$ and $\phi$. For instance, if we specify $\phi(y) = y$ and $D_t = (1, t/T_0)^\top$, the dimension of $g$ equals three, which may be substantially smaller than the number of untreated units $N$.

The time-varying estimating function $\Psi_{\text{pre}}$ satisfies $E\{\Psi_{\text{pre}}(O_t; \eta^*, \gamma^*)\} = 0$ for $t \in \{1, \ldots, T_0\}$ under Assumptions 2.1, 2.2, 3.1, and 3.2. This ensures that an estimator of $\gamma^*$ can be obtained using the time-varying estimating function $\Psi_{\text{pre}}$ rather than the time-invariant estimating function $\Phi_{\text{pre}}$. In fact, incorporating the time-varying term can potentially enhance the finite-sample performance of the proposed estimator in the presence of nonstationary behavior. For instance, under the IFEM (13), we show that incorporating time-varying components reduces the bias of the estimator of $\gamma^*$ when the latent factor $\lambda_t$ exhibits a secular trend; see Supplementary Material S1.6 for this result. Moreover, the simulation studies in Section 4 suggest that including time-varying components can help reduce bias in the presence of a time trend. Therefore, in the remainder of the paper, we use the time-varying estimating function $\Psi_{\text{pre}}$ unless stated otherwise.

An estimator of $\gamma^*$ can in principle be obtained from the empirical counterpart of this moment condition. Because the dimension of $\Psi_{\text{pre}}$ is allowed to be smaller than the dimension of $(\eta^{*\top}, \gamma^{*\top})^\top$, standard GMM theory [20] does not readily apply; typically, standard GMM requires the number of moment equations to be greater than or equal to the number of unknown parameters. To regularize the problem, we include a ridge penalty term for $\gamma^*$. Specifically, the regularized GMM estimator $\hat{\gamma}_\rho$ with regularization penalty $\rho \in (0, \infty)$ is defined as the solution to the following minimization problem:

(19) $(\hat{\eta}, \hat{\gamma}_\rho) = \arg\min_{(\eta, \gamma)} [\{\hat{\Psi}_{\text{pre}}(\eta, \gamma)\}^\top \hat{\Omega}_{\text{pre}} \{\hat{\Psi}_{\text{pre}}(\eta, \gamma)\} + \rho \|\gamma\|_2^2].$

Here, $\hat{\Psi}_{\text{pre}}(\eta, \gamma) = T_0^{-1} \sum_{t=1}^{T_0} \Psi_{\text{pre}}(O_t; \eta, \gamma)$ is the empirical mean of the estimating function over the pretreatment periods evaluated at $(\eta, \gamma)$. Also, $\hat{\Omega}_{\text{pre}} = \mathrm{diag}(I_{d \times d}, \hat{\Omega}_g) \in \mathbb{R}^{(2d+p) \times (2d+p)}$ is a user-specified symmetric positive definite block-diagonal matrix with $\hat{\Omega}_g \in \mathbb{R}^{(d+p) \times (d+p)}$, which can simply be set to the identity matrix. Since the first block of $\hat{\Omega}_{\text{pre}}$ is the identity matrix, $\hat{\eta}$ reduces to the OLS estimator, i.e., $\hat{\eta} = (\sum_{t=1}^{T_0} D_t D_t^\top)^{-1}(\sum_{t=1}^{T_0} D_t Y_t)$.

Equations (18) and (19) fortunately admit closed-form solutions. For instance, if $\hat{\Omega}_{\text{pre}}$ is the identity matrix, we have $\gamma^* = G_{YW}^{*+} G_{YY}^* + \zeta_{YW}$ and $\hat{\gamma}_\rho = (\hat{G}_{YW}^\top \hat{G}_{YW} + \rho I_{N \times N})^{-1} \hat{G}_{YW}^\top \hat{G}_{YY}$, where

(20) $G_{YW}^* = \frac{1}{T_0} \sum_{t=1}^{T_0} E\{g(t, Y_t; \eta^*) W_t^\top\} \in \mathbb{R}^{(d+p) \times N}, \quad G_{YY}^* = \frac{1}{T_0} \sum_{t=1}^{T_0} E\{g(t, Y_t; \eta^*) Y_t\} \in \mathbb{R}^{d+p},$
$\hat{G}_{YW} = \frac{1}{T_0} \sum_{t=1}^{T_0} g(t, Y_t; \hat{\eta}) W_t^\top \in \mathbb{R}^{(d+p) \times N}, \quad \hat{G}_{YY} = \frac{1}{T_0} \sum_{t=1}^{T_0} g(t, Y_t; \hat{\eta}) Y_t \in \mathbb{R}^{d+p}.$

Here, $M^+$ denotes the Moore-Penrose inverse of a matrix $M$, and $\zeta_{YW}$ is an arbitrary vector in the null space of $G_{YW}^*$, i.e., $G_{YW}^* \zeta_{YW} = 0$. In general, $\gamma^*$ may not be unique unless $G_{YW}^*$ is of full column rank, in which case $\gamma^*$ is uniquely determined by $\gamma^* = (G_{YW}^{*\top} G_{YW}^*)^{-1} G_{YW}^{*\top} G_{YY}^*$. However, when the number of untreated units $N$ is large, a common scenario in many synthetic control settings, the full column rank condition on $G_{YW}^*$ may not be met, making $\gamma^*$ nonunique.

A special instance of $\gamma^*$ is the minimum-norm solution, denoted by $\gamma_0^* = G_{YW}^{*+} G_{YY}^*$, which corresponds to $\zeta_{YW} = 0$. Even if $\gamma^*$ is not unique, $\gamma_0^*$ remains unique. Moreover, under certain conditions, $\hat{\gamma}_\rho$ is consistent for $\gamma_0^*$ as the number of pretreatment periods $T_0$ goes to infinity and the regularization parameter $\rho$ decreases at a sufficiently fast rate. In other words, $\hat{\gamma}_\rho$ converges uniquely to $\gamma_0^*$, allowing us to rely on standard GMM theory as if $\gamma_0^*$ were the unique solution to the estimating equation. Consequently, we can infer the treatment effect based on the synthetic control with estimated weights $\hat{\gamma}_\rho$.
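The closed form lends itself to a few lines of R. The sketch below takes $\hat{\Omega}_{\text{pre}}$ to be the identity and assumes the stacked form of $g$ in (18), i.e., $g(t, y; \eta) = (D_t^\top, \phi(y - D_t^\top \eta)^\top)^\top$; all variable names are ours:

```r
# Regularized GMM weights gamma_hat_rho via the closed form involving (20):
# Y: length-T0 outcomes; W: T0 x N donor matrix; D: T0 x d basis matrix
# with rows D_t'; rho: ridge penalty; phi: user-specified function.
spsc_weights <- function(Y, W, D, rho, phi = identity) {
  T0  <- length(Y)
  eta <- solve(crossprod(D), crossprod(D, Y))   # OLS de-trending estimator
  G   <- cbind(D, phi(Y - D %*% eta))           # rows are g(t, Y_t; eta_hat)'
  GYW <- crossprod(G, W) / T0                   # hat{G}_{YW}: (d+p) x N
  GYY <- crossprod(G, Y) / T0                   # hat{G}_{YY}: (d+p) x 1
  solve(crossprod(GYW) + rho * diag(ncol(W)), crossprod(GYW, GYY))
}
```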

Once the synthetic control weights are estimated, one could in principle estimate the treatment-free potential outcome and the ATT as $\hat{Y}_t(0) = W_t^\top \hat{\gamma}_\rho$ and $\hat{\tau}_t = Y_t - W_t^\top \hat{\gamma}_\rho$, respectively, for $t \in \{T_0 + 1, \ldots, T\}$. Unfortunately, without additional assumptions, it is impossible to perform inference on the ATT $\tau_t^*$ based on $\hat{\tau}_t$ because the latter will generally fail to be consistent given that we only have access to one observation at each $t$. An alternative is to infer the random treatment effects $\xi_t^* = Y_t(1) - Y_t(0)$ based on pointwise prediction intervals, obviating the need for consistency of $\hat{\tau}_t$; see Section 3.5 for details. However, for the remainder of this section, we maintain our focus on inference about the ATT.

We posit a parsimonious working model for the ATT as a function of time. Specifically, we assume that the ATT follows a model indexed by a $b$-dimensional parameter $\beta$ via a function $\tau(\cdot\,;\cdot): [0, \infty) \times \mathbb{R}^b \to \mathbb{R}$. Let $\beta^* \in \mathbb{R}^b$ be the true parameter satisfying $\tau_t^* = \tau(t; \beta^*)$ for $t \in \{1, \ldots, T\}$. This parametrization allows us to pool information over time in the posttreatment period to infer $\beta^*$ and the ATT. Possible forms of $\tau(t; \beta)$ are given as follows:

Example 3

(Constant effect) $\tau(t; \beta) = \beta$; this model is reasonable if the treatment yields an immediate, short-term effect which persists over a long period of time.

Example 4

(Linear effect) $\tau(t; \beta) = \beta_0 + \beta_1 (t - T_0)_+ / T_1$, where $(c)_+ = \max(c, 0)$ for a constant $c$; this model is appropriate if the treatment yields a gradual, increasing effect over time.

Example 5

(Nonlinear effect) This includes a quadratic model $\tau(t; \beta) = \beta_0 + \beta_1 (t - T_0)_+ / T_1 + \beta_2 (t - T_0)_+^2 / T_1$, an exponentially time-varying treatment model $\tau(t; \beta) = \exp\{\beta_0 + \beta_1 (t - T_0)_+ / T_1\}$, or a model spanned by nonlinear basis functions, e.g., $\tau(t; \beta) = b(t)^\top \beta$, where $b(t)$ is a $b$-dimensional cubic B-spline function; these models are appropriate if the treatment yields a nonlinear effect over time.
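As an illustration, Examples 3-5 translate directly into R functions of $t$ (a sketch with our own names, matching the parametrizations above):

```r
# Working ATT models tau(t; beta) from Examples 3-5.
tau_constant  <- function(t, beta, T0, T1) rep(beta[1], length(t))
tau_linear    <- function(t, beta, T0, T1) {
  beta[1] + beta[2] * pmax(t - T0, 0) / T1
}
tau_quadratic <- function(t, beta, T0, T1) {
  beta[1] + beta[2] * pmax(t - T0, 0) / T1 + beta[3] * pmax(t - T0, 0)^2 / T1
}
```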

For tractable inference, we assume that the error process is weakly dependent, which is formally stated as follows:

Assumption 3.3

(Weakly dependent error) Let $\varepsilon_t = Y_t - W_t^\top \gamma_0^* - \tau(t; \beta^*)$ for $t \in \{1, \ldots, T\}$. Then, the error process $\{\varepsilon_1, \ldots, \varepsilon_T\}$ is weakly dependent, i.e., $\mathrm{corr}(\varepsilon_t, \varepsilon_{t + t'})$ converges to 0 as $t' \to \pm\infty$.

Assumption 3.3 applies to many standard time series models, including autoregressive models, moving-average models, and autoregressive moving-average models.

Along with these conditions, we will consider an asymptotic regime where $T_0, T_1 \to \infty$ and $T_1 / T_0 \to r \in (0, \infty)$. Specifically, let $\Psi(O_t; \eta, \gamma, \beta)$ be the following $(2d + p + b)$-dimensional estimating function:

$\Psi(O_t; \eta, \gamma, \beta) = \begin{pmatrix} \Psi_{\text{pre}}(O_t; \eta, \gamma) \\ \Psi_{\text{post}}(O_t; \gamma, \beta) \end{pmatrix} = \begin{pmatrix} (1 - A_t) D_t (Y_t - D_t^\top \eta) \\ (1 - A_t) g(t, Y_t; \eta)(Y_t - W_t^\top \gamma) \\ A_t \dfrac{\partial \tau(t; \beta)}{\partial \beta} \{Y_t - W_t^\top \gamma - \tau(t; \beta)\} \end{pmatrix} \in \mathbb{R}^{2d + p + b}.$

Then, GMM estimators of the synthetic control weights and treatment effect parameter are obtained as the solution to the following minimization problem:

$(\hat{\eta}, \hat{\gamma}_\rho, \hat{\beta}) = \arg\min_{(\eta, \gamma, \beta)} [\{\hat{\Psi}(\eta, \gamma, \beta)\}^\top \hat{\Omega} \{\hat{\Psi}(\eta, \gamma, \beta)\} + \rho \|\gamma\|_2^2],$

where $\hat{\Psi}(\eta, \gamma, \beta) = T^{-1} \sum_{t=1}^T \Psi(O_t; \eta, \gamma, \beta)$ is the empirical mean of the estimating function and $\hat{\Omega} \in \mathbb{R}^{(2d+p+b) \times (2d+p+b)}$ is a user-specified symmetric positive definite block-diagonal matrix of the form $\hat{\Omega} = \mathrm{diag}(\hat{\Omega}_{\text{pre}}, \hat{\Omega}_{\text{post}})$; for simplicity, $\hat{\Omega}$ can be chosen as the identity matrix.
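To make the estimation step concrete, the following R sketch minimizes the GMM criterion with $\hat{\Omega}$ set to the identity and the constant-effect model of Example 3; the stacked form of $g$ from (18) and all variable names are our own assumptions:

```r
# Joint GMM fit of (eta, gamma, beta) with identity weight matrix and
# tau(t; beta) = beta. Y, W, D span all T periods; A is the 0/1 treatment
# indicator; rho is the ridge penalty on gamma.
gmm_spsc <- function(Y, W, D, A, rho, phi = identity) {
  d <- ncol(D); N <- ncol(W)
  crit <- function(par) {
    eta <- par[1:d]; gam <- par[(d + 1):(d + N)]; beta <- par[d + N + 1]
    u   <- Y - drop(D %*% eta)                    # de-trended outcome
    G   <- cbind(D, phi(u))                       # rows g(t, Y_t; eta)'
    Psi <- cbind((1 - A) * D * u,                 # pretreatment: trend block
                 (1 - A) * G * drop(Y - W %*% gam),
                 A * drop(Y - W %*% gam - beta))  # posttreatment block
    m <- colMeans(Psi)                            # hat{Psi}(eta, gamma, beta)
    sum(m^2) + rho * sum(gam^2)                   # identity-weighted GMM + ridge
  }
  optim(rep(0, d + N + 1), crit, method = "BFGS")$par
}
```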

Under our assumptions, the following result establishes that $(\hat{\eta}, \hat{\gamma}_\rho, \hat{\beta})$ is asymptotically normal as the number of time periods goes to infinity and the regularization parameter diminishes at an $o(T^{-1/2})$ rate:

Theorem 3.3

Suppose that Assumptions 2.1, 2.2, 3.1, 3.2, and 3.3 and the regularity conditions in Supplementary Material S3.2 hold. Then, as $T \to \infty$ with $\rho = o(T^{-1/2})$, we have

$\sqrt{T} \begin{pmatrix} \hat{\eta} - \eta^* \\ \hat{\gamma}_\rho - \gamma_0^* \\ \hat{\beta} - \beta^* \end{pmatrix} \to N(0, \Sigma^*)$ in distribution,

where $\Sigma^* = \Sigma_1^* \Sigma_2^* \Sigma_1^{*\top}$ is given by

$\Sigma_1^* = \left\{ \Omega^{*1/2} \lim_{T \to \infty} \frac{\partial E\{\hat{\Psi}(\eta, \gamma, \beta)\}}{\partial (\eta^\top, \gamma^\top, \beta^\top)} \Big|_{\eta = \eta^*, \gamma = \gamma_0^*, \beta = \beta^*} \right\}^+ \Omega^{*1/2}, \quad \Sigma_2^* = \lim_{T \to \infty} \mathrm{Var}\{\sqrt{T} \hat{\Psi}(\eta^*, \gamma_0^*, \beta^*)\}.$

Here, $\Omega^{*1/2}$ is a symmetric positive-definite matrix satisfying $(\Omega^{*1/2})^2 = \lim_{T \to \infty} \hat{\Omega}$.

Note that $\Sigma^*$ is rank-deficient if the dimension of $g$ is smaller than $N$. In this case, the asymptotic distribution is a degenerate normal distribution. However, this degeneracy only affects the synthetic control weight estimator $\hat{\gamma}_\rho$; the asymptotic variance of $\hat{\beta}$ remains full rank even in this case, ensuring that inference regarding $\beta^*$ remains valid. For inference about $\beta^*$, we propose to use the $(b \times b)$-dimensional bottom-right submatrix of $\hat{\Sigma} = \hat{\Sigma}_1 \hat{\Sigma}_2 \hat{\Sigma}_1^\top$, which is associated with $\hat{\beta}$. Here, $\hat{\Sigma}_1$ is defined by

$\hat{\Sigma}_1 = \{\hat{G}^\top \hat{\Omega} \hat{G} + \mathrm{diag}(0_{d \times d}, \rho I_{N \times N}, 0_{b \times b})\}^{-1} (\hat{G}^\top \hat{\Omega}), \quad \hat{G} = \frac{\partial \hat{\Psi}(\eta, \gamma, \beta)}{\partial (\eta^\top, \gamma^\top, \beta^\top)} \Big|_{\eta = \hat{\eta}, \gamma = \hat{\gamma}_\rho, \beta = \hat{\beta}}.$

For $\hat{\Sigma}_2$, we use a heteroskedasticity and autocorrelation consistent (HAC) estimator [27,28] given the time series nature of the observed sample; see Supplementary Material S1.3 for details. Alternatively, one could implement the block bootstrap; see Supplementary Material S1.4 for details.
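For instance, a Bartlett-kernel (Newey-West-type) HAC estimate of $\Sigma_2^*$ can be computed from the matrix of evaluated estimating functions; the following is a minimal sketch under our own naming, not necessarily the estimator used in the Supplementary Material:

```r
# HAC (Bartlett kernel) estimate of the long-run variance of Psi_t.
# Psi: T x (2d+p+b) matrix with rows Psi(O_t; eta_hat, gamma_hat_rho, beta_hat)';
# L: truncation lag.
hac_sigma2 <- function(Psi, L = floor(nrow(Psi)^(1/3))) {
  Tn <- nrow(Psi)
  Pc <- scale(Psi, center = TRUE, scale = FALSE)    # center each column
  S  <- crossprod(Pc) / Tn                          # lag-0 term
  for (l in seq_len(L)) {
    Gl <- crossprod(Pc[-seq_len(l), , drop = FALSE],
                    Pc[seq_len(Tn - l), , drop = FALSE]) / Tn
    S  <- S + (1 - l / (L + 1)) * (Gl + t(Gl))      # Bartlett-weighted lag-l terms
  }
  S
}
```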

Finally, while Theorem 3.3 specifies the required rate for the regularization parameter $\rho$ in relation to $T$, it is still necessary to select a specific value of $\rho$ for the data at hand. In practice, we select $\rho$ via cross-validation; for further details, see Supplementary Material S1.2. In addition, one may have access to exogenous covariates that can be leveraged to improve efficiency; in Supplementary Material S1.7, we provide details on how to incorporate measured covariates into the SPSC framework.

3.5 Conformal inference of the treatment effect

Key limitations of the methodology proposed in the previous section include (i) that a parsimonious model choice $\tau_t = \tau(t; \beta)$ may be misspecified, and (ii) that it potentially requires both $T_0$ and $T_1$ to be large in order to rely upon a law of large numbers and a central limit theorem for valid asymptotic inference, so that our large-sample analysis can reliably quantify the uncertainty associated with the estimated parameters. These limitations may be prohibitive in real-world applications with limited posttreatment follow-up data. To address this specific challenge, previous works such as Cattaneo et al. [29] and Chernozhukov et al. [30] developed prediction intervals to assess statistical uncertainty, obviating the need to specify a model for the treatment effect or a large $T_1$. We focus on the conformal inference approach proposed by Chernozhukov et al. [30] owing to its ready adaptation to the SPSC framework. The key idea of the approach is to construct pointwise prediction intervals for the random treatment effects $\xi_t^* = Y_t(1) - Y_t(0)$ for $t \in \{T_0 + 1, \ldots, T\}$ by inverting permutation tests of certain null hypotheses concerning $\xi_t^*$. One crucial requirement for the approach is the existence of an unbiased predictor of $Y_t(0)$ for $t \in \{1, \ldots, T\}$. In the context of SPSC, the synthetic control $W_t^\top \gamma_0^*$ is an unbiased predictor of $Y_t(0)$, as established in Theorem 3.2, and consequently, their approach readily applies. In what follows, we present the approach in detail.

Consider an asymptotic regime whereby $T_0$ goes to infinity while $T_1$ is fixed. Let $s \in \{T_0 + 1, \ldots, T\}$ be a posttreatment period for which one aims to construct a prediction interval for the treatment effect; without loss of generality, we take $s = T_0 + 1$. The null hypothesis of interest can be expressed as $H_{0,T_0+1}: \xi_{T_0+1}^* = \xi_{0,T_0+1}$, where $\xi_{0,T_0+1}$ represents a hypothesized treatment effect value. Under $H_{0,T_0+1}$, the treatment-free potential outcome at time $T_0 + 1$ can be identified as $Y_{T_0+1}(0) = Y_{T_0+1} - \xi_{0,T_0+1}$ and, consequently, the pretreatment outcomes $Y_1, \ldots, Y_{T_0}$ may in fact be supplemented with $Y_{T_0+1} - \xi_{0,T_0+1}$ to estimate the synthetic control weights. We may then redefine the pretreatment estimating function $\Psi_{\text{pre}}$ in equation (18) as follows:

$\Psi_{\text{pre}}(O_t; \eta, \gamma, \xi_{0,T_0+1}) = \begin{pmatrix} D_t (Y_t - A_t \xi_{0,T_0+1} - D_t^\top \eta) \\ g(t, Y_t - A_t \xi_{0,T_0+1}; \eta)(Y_t - A_t \xi_{0,T_0+1} - W_t^\top \gamma) \end{pmatrix}, \quad t \in \{1, \ldots, T_0 + 1\}.$

At the minimum-norm synthetic control weights $\gamma_0^*$, the redefined estimating function is mean zero for $t \in \{1, \ldots, T_0 + 1\}$ under $H_{0,T_0+1}$. Therefore, a GMM estimator $\hat{\gamma}_\rho(\xi_{0,T_0+1})$ can be obtained by solving the following minimization problem, which is similar to (19):

$(\hat{\eta}(\xi_{0,T_0+1}), \hat{\gamma}_\rho(\xi_{0,T_0+1})) = \arg\min_{(\eta, \gamma)} [\{\hat{\Psi}_{\text{pre}}(\eta, \gamma, \xi_{0,T_0+1})\}^\top \hat{\Omega}_{\text{pre}} \{\hat{\Psi}_{\text{pre}}(\eta, \gamma, \xi_{0,T_0+1})\} + \rho \|\gamma\|_2^2],$

where $\hat{\Psi}_{\text{pre}}(\eta, \gamma, \xi_{0,T_0+1}) = (T_0 + 1)^{-1} \sum_{t=1}^{T_0+1} \Psi_{\text{pre}}(O_t; \eta, \gamma, \xi_{0,T_0+1})$ and $\hat{\Omega}_{\text{pre}}$ is the weight matrix used in (19). We may then compute residuals $\hat{\nu}_t(\xi_{0,T_0+1}) = Y_t - A_t \xi_{0,T_0+1} - W_t^\top \hat{\gamma}_\rho(\xi_{0,T_0+1})$ and use these residuals to obtain a p-value for testing the null hypothesis as follows:

$p_{T_0+1}(\xi_{0,T_0+1}) = \frac{1}{T_0 + 1} \sum_{t=1}^{T_0+1} 1\{|\hat{\nu}_t(\xi_{0,T_0+1})| \geq |\hat{\nu}_{T_0+1}(\xi_{0,T_0+1})|\}.$

In words, the p-value is the proportion of residuals with magnitude no smaller than the posttreatment residual. Under $H_{0,T_0+1}$ and regularity conditions, including that the error $\nu_t = Y_t - \xi_t^* A_t - W_t^\top \gamma_0^*$ is stationary and weakly dependent, the p-value is approximately unbiased, i.e., $\Pr\{p_{T_0+1}(\xi_{0,T_0+1}) \leq \alpha\} = \alpha + o(1)$ as $T_0 \to \infty$ for a user-specified level $\alpha \in (0, 1)$; we refer the readers to Theorem 1 of Chernozhukov et al. [30] for technical details. Therefore, an approximate $100(1 - \alpha)\%$ prediction interval for $\xi_t^*$ can be constructed by inverting the hypothesis test based on $p_{T_0+1}(\xi_{0,T_0+1})$. This prediction interval is formally defined as $C_{T_0+1}(1 - \alpha) = \{\xi \mid p_{T_0+1}(\xi) > \alpha\}$ and can be found via a grid search.
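The test inversion is straightforward to code. The sketch below reuses the `spsc_weights()` sketch from Section 3.4 and searches a user-supplied grid of hypothesized effects; all names are our own:

```r
# Conformal prediction interval for xi*_{T0+1} by grid search.
# Y, W, D contain the T0 pretreatment periods plus period T0 + 1 (last row);
# grid: candidate values of xi; alpha: significance level.
conformal_interval <- function(Y, W, D, rho, grid, alpha = 0.05) {
  n <- length(Y)                                # n = T0 + 1
  pvals <- vapply(grid, function(xi) {
    Yadj    <- Y
    Yadj[n] <- Y[n] - xi                        # impose H0: xi*_{T0+1} = xi
    gam <- spsc_weights(Yadj, W, D, rho)        # re-estimate weights under H0
    res <- abs(Yadj - drop(W %*% gam))          # residual magnitudes
    mean(res >= res[n])                         # conformal p-value
  }, numeric(1))
  range(grid[pvals > alpha])                    # invert the test over the grid
}
```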

4 Simulation

We conducted a simulation study to evaluate the finite sample performance of the proposed estimator under a variety of conditions. On the basis of the IFEM in (1), we considered the following data generating mechanisms with pre- and posttreatment periods of length $T_0 = T_1 \in \{50, 100, 250, 500\}$ and a donor pool of size $N = 16$.

First, for each $t \in \{1, \ldots, T\}$, we generated four-dimensional latent factors $\lambda_t = (\lambda_{1t}, \ldots, \lambda_{4t})^\top$ from $N(\nu_t, 0.25 \, I_{4 \times 4})$, with $\lambda_t$ independent across time periods. For the mean vector $\nu_t = (\nu_{1t}, \ldots, \nu_{4t})^\top$, we considered the following two specifications for $j \in \{1, \ldots, 4\}$:

$$
(\text{No trend}): \ \nu_{jt} = 0; \qquad (\text{Linear trend}): \ \nu_{jt} = t / T_0.
$$

The latent factor loadings $\mu_i$ for $i \in \{1, \ldots, 16\}$, i.e., the latent factor loadings of the untreated units, were specified as follows:

$$
M = (\mu_1, \ldots, \mu_{16}) =
\begin{pmatrix}
2 & 1.75 & 1.5 & 1.25 & 1 & 0.75 & 0.5 & 0.25 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.8 & 0.8 & 0.6 & 0.6 & 0.4 & 0.4 & 0.2 & 0.2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5 & 0.5
\end{pmatrix}
\in \mathbb{R}^{4 \times 16}.
$$

The latent factor loading $\mu_0$ of the treated unit was specified as one of the following:

$$
(\text{Simplex}): \ \mu_0 = (1.125, 0.5, 0, 0)^\top = \frac{1}{8} \sum_{i=1}^{8} \mu_i; \qquad
(\text{Non-simplex}): \ \mu_0 = (2, 1.5, 0, 0)^\top.
$$

Note that condition (2) is satisfied with $\gamma = M^{+} \mu_0$, where $M^{+}$ denotes the Moore-Penrose pseudoinverse of $M$, although this vector is not the unique solution. Also, when $\mu_0$ is chosen as (Simplex), a solution $\gamma$ lying within the 16-dimensional simplex exists (namely, uniform weights over the first eight donors), thus satisfying the restriction of Abadie et al. [2]. In contrast, when $\mu_0$ is chosen as (Non-simplex), $\gamma$ does not belong to this simplex.
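As a quick numerical check (our own illustration; `ginv` from the MASS package computes the Moore-Penrose pseudoinverse $M^+$), one can verify in R that $\gamma = M^+ \mu_0$ solves $M \gamma = \mu_0$ and that the uniform weights over the first eight donors provide a simplex solution in the (Simplex) case:

```r
library(MASS)                                  # provides ginv(), the pseudoinverse
M <- rbind(c(seq(2, 0.25, by = -0.25), rep(0, 8)),
           c(c(0.8, 0.8, 0.6, 0.6, 0.4, 0.4, 0.2, 0.2), rep(0, 8)),
           c(rep(0, 8), rep(1, 8)),
           c(rep(0, 8), rep(0.5, 8)))
mu0 <- c(1.125, 0.5, 0, 0)                     # (Simplex) choice of mu_0
gamma_min <- ginv(M) %*% mu0                   # solution gamma = M^+ mu_0
all.equal(as.vector(M %*% gamma_min), mu0)     # TRUE: condition (2) holds
gamma_unif <- c(rep(1/8, 8), rep(0, 8))        # uniform weights over donors 1-8
all.equal(as.vector(M %*% gamma_unif), mu0)    # TRUE: a solution in the simplex
```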

The errors $e_t = (e_{0t}, e_{1t}, \ldots, e_{16t})^\top$ were generated independently across time periods from $e_t \sim N(0_{17 \times 1}, 0.25 \, \mathrm{diag}(\Sigma_e, I_{8 \times 8}))$, where $\Sigma_e \in \mathbb{R}^{9 \times 9}$ was chosen as one of the following three matrices, with the corresponding $\omega_i$ values in (14):

$$
\begin{aligned}
(\text{Independent errors}) &: \ \Sigma_e = I_{9 \times 9}; \quad \omega_0 = 1, \ \omega_1 = \cdots = \omega_{16} = 0; \\
(\text{Correlated errors}) &: \ \Sigma_e = 0.1 \, I_{9 \times 9} + 0.9 \, 1_{9 \times 9}; \quad \omega_0 = 1, \ \omega_1 = \cdots = \omega_8 = 0.9, \ \omega_9 = \cdots = \omega_{16} = 0; \\
(\text{No } Y \text{ error}) &: \ \Sigma_e = \mathrm{diag}(0, I_{8 \times 8}); \quad \omega_0 = \cdots = \omega_{16} = 0.
\end{aligned}
$$

Under (Correlated errors) and (No $Y$ error), equation (14) admits solutions, thus satisfying Condition 3.1 and Assumption 3.2. In contrast, under (Independent errors), equation (14) has no solution, violating Condition 3.1. Nonetheless, as discussed in Section 3.2, it is still possible to find a synthetic control bridge function satisfying Assumption 3.2; see Supplementary Material S1.5 for details.

With these generated variables, $Y_t(0)$ and $W_{it}$ at $t \in \{1, \ldots, T\}$ were generated as $Y_t(0) = \mu_0^\top \lambda_t + e_{0t}$ and $W_{it} = \mu_i^\top \lambda_t + e_{it}$ for $i \in \{1, \ldots, N\}$, respectively. Note that the last eight untreated units $(W_{9t}, \ldots, W_{16t})$ were independent of $Y_t(0)$, which resulted in multiple synthetic control bridge functions satisfying Assumption 3.2. The potential outcomes under treatment at $t \in \{1, \ldots, T\}$ were generated as $Y_t(1) = Y_t(0) + 3 A_t + \varepsilon_t$, where the $\varepsilon_t$ were generated independently across time periods from $N(0, 0.25)$. Note that the ATT was $\tau_t^* = 3$ for $t \in \{T_0+1, \ldots, T\}$.
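For concreteness, the following R sketch (our own illustration, not the authors' replication code) generates one dataset under the (Linear trend), (Correlated errors), and (Non-simplex) specifications, reusing the loading matrix `M` defined in the snippet above:

```r
set.seed(1)
T0 <- 100; T1 <- 100; Tt <- T0 + T1; N <- 16
mu0 <- c(2, 1.5, 0, 0)                         # (Non-simplex) treated-unit loading
# Latent factors lambda_t ~ N(nu_t, 0.25 I_4) with linear trend nu_jt = t / T0
lambda <- matrix(rep((1:Tt) / T0, each = 4), 4) + matrix(rnorm(4 * Tt, sd = 0.5), 4)
# Errors e_t ~ N(0, 0.25 diag(Sigma_e, I_8)) with Sigma_e = 0.1 I_9 + 0.9 1_{9x9}
Sigma <- 0.25 * rbind(cbind(0.1 * diag(9) + 0.9, matrix(0, 9, 8)),
                      cbind(matrix(0, 8, 9), diag(8)))
e  <- t(chol(Sigma)) %*% matrix(rnorm(17 * Tt), 17)
Y0 <- as.vector(mu0 %*% lambda) + e[1, ]       # Y_t(0) = mu_0' lambda_t + e_0t
W  <- t(t(M) %*% lambda + e[-1, ])             # Tt x 16 donor outcomes W_it
A  <- as.numeric(1:Tt > T0)                    # treatment indicator
Y  <- Y0 + A * (3 + rnorm(Tt, sd = 0.5))       # observed outcome; true ATT is 3
```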

By using the simulated data, we estimated the aggregated ATT over the posttreatment periods, i.e., $T_1^{-1} \sum_{t=T_0+1}^{T} \tau_t^*$, based on the following six estimators. First, we obtained two ATT estimators based on the proposed SPSC approach, using the time-invariant pretreatment estimating function $\Phi_{\text{pre}}$ in (16) and the time-varying pretreatment estimating function $\Psi_{\text{pre}}$ in (18), respectively, where the function $\phi$ was chosen as $\phi(y) = y$. We specified the vector $D_t$ in $\Psi_{\text{pre}}$ as $D_t = \mathcal{B}_6(t)$, the six-dimensional cubic B-spline basis, to adjust for a potential time trend. These estimators are referred to as SPSC-NoDT and SPSC-DT, respectively. For comparison, we also considered two OLS-based ATT estimators based on (5): in the first, we placed no regularization on the weights; in the second, following Abadie et al. [2], we restricted the weights to be nonnegative and to sum to one. These estimators are referred to as OLS-NoReg and OLS-Standard, respectively. Finally, we implemented two recently developed synthetic control methods by Ben-Michael et al. [5] and Cattaneo et al. [29], referred to as augmented synthetic control (ASC) and synthetic control prediction interval (SCPI), respectively. In our analysis, we implemented the OLS-Standard, ASC, and SCPI estimators using the synth [31], augsynth [32], and scpi [33] R packages, respectively. Unfortunately, these three packages do not appear to provide readily available standard errors, so the standard errors and empirical coverage rates of these methods are not reported. We repeated the simulation 500 times.
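A minimal sketch of the SPSC-DT estimation step on the simulated data above, again under our simplifying assumptions (instruments $g(t, y) = (D_t^\top, y)^\top$, identity GMM weight matrix, small ridge penalty toward the minimum-norm weights, mirroring the penalized criterion in (19)); the authors' SPSC package implements the full GMM procedure with standard errors:

```r
library(splines)
D <- bs(1:Tt, df = 6)                          # six-dimensional cubic B-spline basis
pre <- 1:T0; rho <- 1e-4
spsc_obj <- function(par) {                    # ridge-penalized GMM criterion
  eta <- par[1:6]; gamma <- par[-(1:6)]
  g <- cbind(D[pre, ], Y[pre])                 # instruments g(t, Y_t) with phi(y) = y
  m <- c(colMeans(D[pre, ] * as.vector(Y[pre] - D[pre, ] %*% eta)),
         colMeans(g * as.vector(Y[pre] - W[pre, ] %*% gamma)))
  sum(m^2) + rho * sum(gamma^2)
}
fit <- optim(rep(0, 6 + N), spsc_obj, method = "BFGS", control = list(maxit = 5000))
gamma_hat <- fit$par[-(1:6)]
# Aggregated ATT estimate: average posttreatment gap Y_t - W_t' gamma_hat
att_hat <- mean(Y[-pre] - W[-pre, ] %*% gamma_hat)
```

In this design, `att_hat` should land near the true ATT of 3, though the sketch omits the weight matrix and standard error machinery described in Section 3.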

To simplify the discussion, we present only the results under the (Non-simplex) case for $\mu_0$; the results for the (Simplex) case are included in Supplementary Material S1.8. Figure 2 summarizes the empirical distribution of the estimators graphically. First, when $\lambda_t$ has no trend, all estimators exhibit negligible bias for the ATT regardless of the error specification. Second, when $\lambda_t$ has a linear trend, the four estimators from the OLS, ASC, and SCPI approaches are biased for the ATT. Although the SPSC-NoDT estimator outperforms these four estimators, it still exhibits nonnegligible bias when the errors are independent. In contrast, the SPSC-DT estimator exhibits little bias under all error specifications. Note that the 95% Monte Carlo confidence intervals for the SPSC estimators shrink as the number of time periods increases, consistent with the results established in Section 3.

Figure 2

Empirical distributions of the estimators. The top and bottom plots show results when $\lambda_t$ has no trend and a linear trend, respectively. The left, middle, and right panels show results under (Independent errors), (Correlated errors), and (No $Y$ error) for $e_t$, respectively. The vertical segments represent the 95% Monte Carlo confidence interval for each estimator, and the dots represent the empirical mean of 500 estimates. The colors (light gray, gray, and black) and line types (solid and dashed) encode the corresponding estimator, and the shape of the dots encodes the length of the pretreatment period. The $y$-axis represents the magnitude of bias.

Table 1 provides more detailed summary statistics when the errors $e_t$ were generated from the (Independent errors) case, a common assumption in standard IFEMs. The results for the other two error specifications are reported in Supplementary Material S1.8. We remark that the OLS-Standard, ASC, and SCPI approaches do not provide a standard error or 95% confidence interval for the ATT. First, when $\lambda_t$ has no trend, all estimators exhibit negligible bias and, where confidence intervals are available, achieve the nominal coverage rate. Second, when $\lambda_t$ has a linear trend, the performance of the estimators differs in terms of both mean squared error and coverage rate. The SPSC-DT estimator attains the smallest mean squared error among all estimators, including the SPSC-NoDT estimator. Regarding the coverage rate, confidence intervals based on the OLS-NoReg and SPSC-NoDT estimators fail to attain the nominal coverage rate, especially when $T_0$ and $T_1$ are large, due to nondiminishing bias. In contrast, confidence intervals based on the SPSC-DT estimator attain the nominal coverage rate. This demonstrates that accounting for time-varying components in the pretreatment estimating function can significantly improve the performance of the SPSC estimators and is, in fact, necessary for valid inference.

Table 1

Summary statistics of estimation results under independent errors

λ_t            Statistic        OLS-NoReg      OLS-Standard    ASC            SCPI             SPSC-NoDT      SPSC-DT
               T_0:             100     500    100     500     100     500    100      500     100     500    100     500
No trend       Bias (×10)       0.02    0.01   0.04    0.00    0.03    0.01   0.03     0.01    0.03    0.01   0.03    0.01
               ASE (×10)        0.89    0.37   –       –       –       –      –        –       0.84    0.37   0.84    0.37
               BSE (×10)        0.97    0.38   –       –       –       –      –        –       0.84    0.37   0.84    0.37
               ESE (×10)        0.92    0.37   1.25    0.51    0.90    0.39   0.92     0.39    0.89    0.37   0.89    0.37
               MSE (×100)       0.85    0.14   1.56    0.26    0.82    0.15   0.84     0.15    0.79    0.14   0.79    0.14
               Coverage (ASE)   0.95    0.96   –       –       –       –      –        –       0.93    0.96   0.93    0.96
               Coverage (BSE)   0.96    0.96   –       –       –       –      –        –       0.93    0.96   0.93    0.96
Linear trend   Bias (×10)       1.30    1.37   10.70   10.51   8.26    5.70   11.93    11.85   1.42    1.61   0.07    0.08
               ASE (×10)        1.76    0.79   –       –       –       –      –        –       1.85    0.84   1.79    0.81
               BSE (×10)        2.02    0.82   –       –       –       –      –        –       2.24    0.99   1.94    0.86
               ESE (×10)        1.82    0.77   1.16    0.41    1.70    0.95   0.96     0.43    2.08    0.97   1.94    0.85
               MSE (×100)       5.01    2.45   115.72  110.54  71.16   33.37  143.18   140.58  6.36    3.54   3.75    0.72
               Coverage (ASE)   0.87    0.58   –       –       –       –      –        –       0.84    0.49   0.93    0.94
               Coverage (BSE)   0.93    0.61   –       –       –       –      –        –       0.91    0.60   0.94    0.96

Dashes indicate that the quantity is not available for the corresponding estimator.

Bias row gives the empirical bias of 500 estimates. ASE row gives the asymptotic standard error obtained from the sandwich estimator of the GMM. BSE row shows the bootstrap standard error obtained from the approach in Supplementary Material S1.4. ESE row gives the standard deviation of 500 estimates. MSE row gives the mean squared error of 500 estimates. Coverage (ASE) and Coverage (BSE) rows give the empirical coverage rate of 95% confidence intervals based on the ASE and BSE, respectively. Bias, standard errors, and mean squared error are scaled by factors of 10, 10, and 100, respectively.

Next, we evaluated the finite sample performance of the conformal inference approach in Section 3.5. As competing methods, we considered the ASC, SCPI, and two SPSC estimators. For each simulated dataset, we obtained 95% pointwise prediction intervals for the treatment effect $\xi_t^* = \tau_t^* + \varepsilon_t$ at ten posttreatment times $t \in \{T_0 + 0.1 T_1, T_0 + 0.2 T_1, \ldots, T_0 + T_1\}$, using the proposed conformal inference approach for the SPSC estimators along with the ASC and SCPI approaches. We then evaluated the empirical coverage rates of these pointwise prediction intervals over 500 simulation repetitions, i.e., the proportion of Monte Carlo samples in which $\xi_t^*$ is contained in the 95% pointwise prediction interval.

Table 2 presents the empirical coverage rates for each simulated scenario. Surprisingly, the ASC and SCPI approaches fail to attain the nominal coverage rate; we believe this failure originates from the simulation setting in which $\mu_0$ lies outside the simplex, i.e., (Non-simplex). These methods perform particularly poorly when $\lambda_t$ follows a linear trend and the number of time periods is large (i.e., $T_0 = 500$). In contrast, regardless of whether $\lambda_t$ has a trend, both SPSC estimators attain the desired nominal coverage rate, aligning closely with theoretical expectations.

Table 2

Empirical coverage rates of 95% pointwise prediction intervals

λ_t            e_t                  ASC            SCPI           SPSC-NoDT      SPSC-DT
               T_0:                 100    500     100    500     100    500     100    500
No trend       Independent errors   0.933  0.925   0.935  0.912   0.959  0.948   0.961  0.948
               Correlated errors    0.904  0.906   0.925  0.913   0.962  0.952   0.963  0.953
               No Y error           0.922  0.910   0.938  0.925   0.964  0.951   0.964  0.953
Linear trend   Independent errors   0.795  0.846   0.938  0.889   0.959  0.944   0.962  0.949
               Correlated errors    0.844  0.883   0.907  0.843   0.957  0.944   0.967  0.948
               No Y error           0.728  0.808   0.920  0.852   0.957  0.953   0.957  0.946

The numbers in SPSC-NoDT and SPSC-DT columns give the empirical coverage rates of 95% pointwise prediction intervals obtained from the conformal inference approach in Section 3.5. The numbers in ASC and SCPI columns give the empirical coverage rates of 95% pointwise prediction intervals obtained from the approaches proposed by Ben-Michael et al. [5] and Cattaneo et al. [29], respectively.

In Supplementary Material S1.9, we assess the finite sample performance of the proposed conformal inference approach based on the simulation scenario given in the study by Cattaneo et al. [29], which may not be compatible with the key identifying condition, Assumption 3.2, of SPSC. As expected, the approach of Cattaneo et al. [29] performs well in this setting. Although our method without time trend adjustment (i.e., SPSC-NoDT) sometimes fails to achieve the nominal coverage rate, particularly when outcomes are nonstationary, our method with time trend adjustment (i.e., SPSC-DT) consistently attains the nominal coverage rate, provided that the basis functions for time periods are appropriately chosen. This highlights the robustness of the proposed SPSC approach and its broad applicability in synthetic control settings.

5 Application

We applied the proposed method to a real-world application. In particular, we revisited the dataset analyzed in Fohlin and Lu [16], which consists of time series of length 384 for 59 trust companies, recorded at a triweekly frequency between January 5, 1906, and December 30, 1908. Notably, this period includes the Panic of 1907 [17], a financial panic in the United States that lasted for three weeks starting in mid-October 1907 and led to a significant drop in the stock market. In this context, we focused on the effect of the financial panic of October 1907 on the log stock prices of trust companies, using $T_0 = 217$ pretreatment and $T_1 = 167$ posttreatment time periods.

The treated unit and donors were defined as follows. According to Fohlin and Lu [16], Knickerbocker, Trust Company of America, and Lincoln were the three trust companies most severely affected by the panic. However, Lincoln's stock price showed a strong downward trend over the pretreatment period. Therefore, we defined the average of the log stock prices of the first two trust companies as $Y_t$, the outcome of the treated unit at time $t \in \{1, \ldots, 384\}$. As potential donors, Fohlin and Lu [16] identified $N = 49$ trust companies that had weak financial connections to the three severely affected trust companies; accordingly, the log stock prices of these 49 trust companies were defined as $W_t$, the donors' outcomes. Following the simulation study, we specified the time-invariant and time-varying pretreatment estimating functions, $\Phi_{\text{pre}}$ and $\Psi_{\text{pre}}$, with $\phi(y) = y$ and $D_t = \mathcal{B}_6(t)$, the six-dimensional cubic B-spline basis, to account for a potential time trend.

We first report the ATT estimates under a constant treatment effect model $\tau(t; \beta) = \beta$. As in Section 4, we compare the same six estimators: the unconstrained OLS synthetic control estimator (OLS-NoReg), the standard synthetic control approach of Abadie et al. [2] (OLS-Standard), the two recent approaches of Ben-Michael et al. [5] (ASC) and Cattaneo et al. [29] (SCPI), and the SPSC estimators without and with time-varying terms (SPSC-NoDT, SPSC-DT). The results are summarized in Table 3. Interestingly, all six estimators yield similar point estimates of the treatment effect, ranging from $-1.021$ to $-0.813$. The 95% confidence intervals of the three estimators for which they are available uniformly exclude zero, suggesting that the financial panic led to a significant decrease in the average log stock price of Knickerbocker and Trust Company of America. We remark again that the OLS-Standard, ASC, and SCPI approaches do not provide a standard error or 95% confidence interval for the ATT. In terms of confidence interval length, SPSC with time-varying components (i.e., SPSC-DT) yields the narrowest interval, followed by SPSC without time-varying components (i.e., SPSC-NoDT) and the OLS-based approach.

Table 3

Summary statistics of the estimation of the average treatment effect on the treated

Estimator    OLS-NoReg          OLS-Standard   ASC      SCPI     SPSC-NoDT          SPSC-DT
Estimate     -1.021             -0.873         -0.912   -0.876   -0.813             -0.816
ASE          0.139              –              –        –        0.084              0.066
95% CI       (-1.295, -0.748)   –              –        –        (-0.978, -0.648)   (-0.945, -0.688)

We also constructed pointwise prediction intervals based on the SPSC approach with time-varying components using the conformal inference approach in Section 3.5. For comparison, we also implemented the ASC and SCPI approaches. Figure 3 presents a visual summary of the results. For the posttreatment period $t \in \{218, \ldots, 384\}$, we find that $\hat{Y}_t(0)$, the predicted treatment-free potential outcome, has a similar shape across all methods. However, the 95% pointwise prediction intervals behave differently. Specifically, we focus on the average width of the prediction intervals over the posttreatment periods. The prediction intervals from the ASC and SCPI approaches have average widths of 0.091 and 0.114, respectively; in contrast, our method with time-varying components yields prediction intervals with an average width of 0.068, over 25% narrower than those of the competing methods; see Supplementary Material S1.10 for the distribution of prediction interval widths across time. The comparison reveals that our method appears to produce tighter predictions of treatment effect trends. Combining the results of the simulation study and the data application, we conclude that our approach appears to perform quite competitively compared to some leading alternative methods in the literature.

Figure 3

Graphical Summaries of the 95% prediction intervals over the posttreatment periods. These plots, from left to right, present the results using the approaches proposed by Ben-Michael et al. [5], Cattaneo et al. [29], and the conformal inference approach presented in Section 3.5 with the time-varying estimating function Ψ pre , respectively. The numbers show the average length of the 95% prediction intervals over the posttreatment periods.

In addition, to assess credibility, we conducted the following additional analyses for the application; details can be found in Supplementary Material S1.10. First, we studied the trend of the residuals, i.e., the difference between the observed outcome and the synthetic control, over the pretreatment time periods. We observed that the OLS-NoReg, SCPI, and SPSC-DT estimators produced residuals without a deterministic time trend, while the other three estimators (OLS-Standard, ASC, SPSC-NoDT) showed the opposite behavior. Notably, the SPSC-DT estimator appears to satisfy the zero-mean condition of Assumption 3.2, whereas the SPSC-NoDT estimator seems to violate this condition due to a nonzero deterministic trend over time. This again highlights the importance of accommodating time-varying components in the SPSC estimation procedure.

Next, we performed the following falsification study. We restricted the entire analysis to the pretreatment period, in which the causal effect is expected to be null, and artificially defined a financial panic time in late July 1907, roughly three months before the actual financial panic. This resulted in pre- and posttreatment periods of lengths $T_0 = 181$ and $T_1 = 36$, respectively. The proposed SPSC-NoDT and SPSC-DT estimators yielded placebo ATT estimates of $-0.005$ and $0.005$ with 95% confidence intervals of $(-0.025, 0.016)$ and $(-0.004, 0.013)$, respectively. The placebo ATT estimate obtained from the unconstrained OLS estimator was $-0.031$ with a 95% confidence interval of $(-0.062, 0.001)$. All 95% confidence intervals include zero, consistent with the expectation of no treatment effect in the placebo period. Finally, the constrained OLS estimator (i.e., OLS-Standard), the ASC estimator, and the SCPI estimator produced placebo ATT estimates of 0.012, 0.032, and 0.015, respectively, which are also close to zero; however, corresponding statistical inference was not available for these estimators. Therefore, these results provide no evidence against the validity of the estimators. In Supplementary Material S1.10, we provide the trajectory of the synthetic controls along with 95% prediction intervals under the placebo treatment. Our findings indicate that the 95% prediction intervals from the SPSC-DT estimator support the null causal effect, whereas the ASC and SCPI estimators occasionally fail to do so during certain time periods. Therefore, we conclude that the SPSC-DT estimator provides a more reliable framework for analyzing the impact of the financial panic on the stock prices of the two trust companies.

6 Concluding remarks

In this article, we propose a novel SPSC approach in which the synthetic control is defined as a linear combination of donors' outcomes whose conditional expectation matches the treatment-free potential outcome in both pre- and posttreatment periods. The model is analogous to, yet more general than, models widely studied in the measurement error literature. Under this framework, we establish the identification of a synthetic control and provide an estimation strategy for the ATT. Furthermore, we introduce a method for inference on the treatment effect through pointwise prediction intervals, which remains valid even with a short posttreatment period. We validate our methods through simulation studies and provide an application analyzing a real-world financial dataset related to the Panic of 1907.

We reiterate that the SPSC framework differs from existing synthetic control methods in its identifying assumptions and interpretation. It views the synthetic control as an error-prone measurement of the outcome, without requiring a generative model for the outcome, whereas existing approaches treat it as the projection of the outcome onto the donors' outcome space or as the outcome itself. Despite these differences, both frameworks construct synthetic controls by optimally weighting donor units (according to their identifying assumptions), which are then used for treatment effect estimation. In addition, like other synthetic control methods, the SPSC framework allows for time-varying confounders, as demonstrated in the generative models in Section 3.2.

While, as mentioned in Section 3.1, the SPSC framework may be viewed as a nonstandard form of instrumental variable approach, it is important to highlight key distinctions between the proposed SPSC approach and well-known instrumental variable approaches in dynamic panel data, such as those of Anderson and Hsiao [34] and Arellano and Bond [35]. In dynamic panel data models, endogeneity arises across time periods, with the typical assumption that there is no within-period endogeneity; as a result, these models use lagged variables as instruments to address cross-time endogeneity. In contrast, in the SPSC framework, endogeneity occurs within each time period, without specific assumptions about cross-time endogeneity. Consequently, the instrumental variable approach in this context operates within a single time period. We remark that, like the SPSC framework, many synthetic control models are agnostic about the cross-time dependence structure; for example, the IFEM (1) places no restriction on the cross-time dependence of $\lambda_t$. A notable exception to this agnostic perspective is the "instrumental variable-like SC estimator" proposed by Ferman and Pinto [36], an earlier version of Ferman and Pinto [6], which was developed in the presence of serial correlation in $\lambda_t$; similar to estimators used in dynamic panel data models, it employs lagged variables as instruments.

As briefly mentioned in Section 1, the proposed SPSC framework has a connection to the single proxy control framework [14,15] developed for i.i.d. data. In particular, Park et al. [15] proposed an approach that relies on the so-called outcome bridge function, which is a (potentially nonlinear) function of outcome proxies. An important property of the outcome bridge function is that it is conditionally unbiased for the treatment-free potential outcome. Therefore, the proposed SPSC approach can be viewed as an adaptation of the outcome bridge function-based single proxy control approach to the synthetic control setting, where the outcome bridge function is known a priori to be a linear function of donors’ outcomes. In Supplementary Material S2, we present a general SPSC framework, which is designed to accommodate nonparametric and nonlinear synthetic controls. Therefore, this framework obviates the overreliance on a linear specification of synthetic controls in the literature and establishes a more direct connection with the outcome bridge function-based single proxy approach presented in Park et al. [15]. Notably, the general SPSC framework addresses underdeveloped areas of the synthetic control literature by allowing for various types of outcomes, including continuous, binary, count, or a combination of these.

In addition to the outcome bridge function-based approach, Park et al. [15] introduced two other single proxy control approaches for i.i.d. sampling. One relies on propensity score weighting, eliminating the need to specify an outcome bridge function. The other uses both the propensity score and the outcome bridge function and, more importantly, exhibits a doubly robust property: the treatment effect in view is identified if either the propensity score or the outcome bridge function, but not necessarily both, is correctly specified. Consequently, a promising direction for future research is to develop new SPSC approaches by extending these single proxy methods to the synthetic control setting. Such approaches can be viewed as complementing the doubly robust proximal synthetic control approach [10]. However, such extensions pose significant challenges due to (i) a single treated unit with nonrandom treatment assignment; (ii) multiple heterogeneous untreated donor units; and (iii) serial correlation and heteroskedasticity arising from the time series nature of the data. In particular, nonrandom treatment assignment undermines the conventional notion of the propensity score, rendering it undefined. Approaches for addressing these challenges and developing corresponding statistical methods will be considered elsewhere.

Acknowledgment

The authors are grateful for the comments from the editorial board and three anonymous reviewers, which significantly improved the manuscript.

  1. Funding information: ETT was supported by NIH grant R01AG065276.

  2. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and consented to its submission to the journal, reviewed all the results, and approved the final version of the manuscript.

  3. Conflict of interest: The authors state no conflict of interest.

  4. Data availability statement: The proposed method is implemented in an R package available at https://github.com/qkrcks0218/SPSC. The dataset analyzed in Section 5 is included in the replication package of Fohlin and Lu [16] accessible at https://www.aeaweb.org/articles?id=10.1257/pandp.20211097.

References

[1] Abadie A, Gardeazabal J. The economic costs of conflict: A case study of the Basque Country. Am Econ Rev. 2003;93(1):113–32. doi:10.1257/000282803321455188.

[2] Abadie A, Diamond A, Hainmueller J. Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. J Am Stat Assoc. 2010;105(490):493–505. doi:10.1198/jasa.2009.ap08746.

[3] Xu Y. Generalized synthetic control method: Causal inference with interactive fixed effects models. Polit Anal. 2017;25(1):57–76. doi:10.1017/pan.2016.2.

[4] Amjad M, Shah D, Shen D. Robust synthetic control. J Mach Learn Res. 2018;19(22):1–51.

[5] Ben-Michael E, Feller A, Rothstein J. The augmented synthetic control method. J Am Stat Assoc. 2021;116(536):1789–803. doi:10.1080/01621459.2021.1929245.

[6] Ferman B, Pinto C. Synthetic controls with imperfect pretreatment fit. Quant Econ. 2021;12(4):1197–221. doi:10.3982/QE1596.

[7] Ferman B. On the properties of the synthetic control estimator with many periods and many controls. J Am Stat Assoc. 2021;116(536):1764–72. doi:10.1080/01621459.2021.1965613.

[8] Shi X, Li K, Miao W, Hu M, Tchetgen Tchetgen E. Theory for identification and inference with synthetic controls: A proximal causal inference framework. 2023. Preprint. arXiv:2108.13935.

[9] Bai J. Panel data models with interactive fixed effects. Econometrica. 2009;77(4):1229–79. doi:10.3982/ECTA6135.

[10] Qiu H, Shi X, Miao W, Dobriban E, Tchetgen Tchetgen E. Doubly robust proximal synthetic controls. Biometrics. 2024;80(2):ujae055. doi:10.1093/biomtc/ujae055.

[11] Shi C, Sridhar D, Misra V, Blei D. On the assumptions of synthetic control methods. In: Camps-Valls G, Ruiz FJR, Valera I, editors. Proceedings of the 25th International Conference on Artificial Intelligence and Statistics. Vol. 151 of Proceedings of Machine Learning Research. PMLR; 2022. p. 7163–75.

[12] Miao W, Geng Z, Tchetgen Tchetgen EJ. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika. 2018;105(4):987–93. doi:10.1093/biomet/asy038.

[13] Tchetgen Tchetgen EJ, Ying A, Cui Y, Shi X, Miao W. An introduction to proximal causal inference. Stat Sci. 2024;39(3):375–90. doi:10.1214/23-STS911.

[14] Tchetgen Tchetgen E. The control outcome calibration approach for causal inference with unobserved confounding. Am J Epidemiol. 2013;179(5):633–40. doi:10.1093/aje/kwt303.

[15] Park C, Richardson DB, Tchetgen Tchetgen EJ. Single proxy control. Biometrics. 2024;80(2):ujae027. doi:10.1093/biomtc/ujae027.

[16] Fohlin C, Lu Z. How contagious was the panic of 1907? New evidence from trust company stocks. AEA Papers and Proceedings. 2021;111. doi:10.1257/pandp.20211097.

[17] Moen J, Tallman EW. The bank panic of 1907: The role of trust companies. J Econ Hist. 1992;52(3):611–30. doi:10.1017/S0022050700011414.

[18] Doudchenko N, Imbens GW. Balancing, regression, difference-in-differences and synthetic control methods: A synthesis. Technical Report. National Bureau of Economic Research; 2016. doi:10.3386/w22791.

[19] Robbins MW, Saunders J, Kilmer B. A framework for synthetic control methods with high-dimensional, micro-level data: Evaluating a neighborhood-specific crime intervention. J Am Stat Assoc. 2017;112(517):109–26. doi:10.1080/01621459.2016.1213634.

[20] Hansen LP. Large sample properties of generalized method of moments estimators. Econometrica. 1982;50(4):1029–54. doi:10.2307/1912775.

[21] Cui Y, Pu H, Shi X, Miao W, Tchetgen Tchetgen E. Semiparametric proximal causal inference. J Am Stat Assoc. 2024;119(546):1348–59. doi:10.1080/01621459.2023.2191817.

[22] Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement error in nonlinear models: A modern perspective. 2nd ed. Chapman and Hall/CRC; 2006. doi:10.1201/9781420010138.

[23] Freedman LS, Midthune D, Carroll RJ, Kipnis V. A comparison of regression calibration, moment reconstruction and imputation for adjusting for covariate measurement error in regression. Stat Med. 2008;27(25):5195–216. doi:10.1002/sim.3361.

[24] Pearl J. Causal diagrams for empirical research. Biometrika. 1995;82(4):669–88. doi:10.1093/biomet/82.4.669.

[25] Donald SG, Imbens GW, Newey WK. Choosing instrumental variables in conditional moment restriction models. J Econom. 2009;152(1):28–36. doi:10.1016/j.jeconom.2008.10.013.

[26] Chen X. Large sample sieve estimation of semi-nonparametric models. In: Handbook of Econometrics. Vol. 6; 2007. p. 5549–632. doi:10.1016/S1573-4412(07)06076-X.

[27] Newey WK, West KD. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica. 1987;55(3):703–8. doi:10.2307/1913610.

[28] Andrews DWK. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica. 1991;59(3):817–58. doi:10.2307/2938229.

[29] Cattaneo MD, Feng Y, Titiunik R. Prediction intervals for synthetic control methods. J Am Stat Assoc. 2021;116(536):1865–80. doi:10.1080/01621459.2021.1979561.

[30] Chernozhukov V, Wüthrich K, Zhu Y. An exact and robust conformal inference method for counterfactual and synthetic controls. J Am Stat Assoc. 2021;116(536):1849–64. doi:10.1080/01621459.2021.1920957.

[31] Abadie A, Diamond A, Hainmueller J. Synth: An R package for synthetic control methods in comparative case studies. J Stat Softw. 2011;42(13):1–17. doi:10.18637/jss.v042.i13.

[32] Ben-Michael E. augsynth: The augmented synthetic control method. R package version 0.2.0; 2023.

[33] Cattaneo M, Feng Y, Palomba F, Titiunik R. scpi: Prediction intervals for synthetic control methods with multiple treated units and staggered adoption. R package version 2.2.2; 2023. https://CRAN.R-project.org/package=scpi. doi:10.32614/CRAN.package.scpi.

[34] Anderson TW, Hsiao C. Estimation of dynamic models with error components. J Am Stat Assoc. 1981;76(375):598–606. doi:10.1080/01621459.1981.10477691.

[35] Arellano M, Bond S. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Rev Econ Stud. 1991;58(2):277–97. doi:10.2307/2297968.

[36] Ferman B, Pinto C. Synthetic controls with imperfect pretreatment fit. 2019. Preprint. arXiv:1911.08521.

Received: 2023-12-02
Revised: 2025-03-05
Accepted: 2025-04-04
Published Online: 2025-07-05

© 2025 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
