Article Open Access

Moment-Based Estimation of Linear Panel Data Models with Factor-Augmented Errors

  • Nicholas L. Brown
Published/Copyright: October 16, 2024

Abstract

I compare two popular methods of estimation for linear panel data models with unobserved factors: the first eliminates the factors with a parameterized quasi-long-differencing (QLD) transformation. The other, referred to as common correlated effects (CCE), uses cross-sectional averages of the data to proxy for the factor space. I show that the CCE assumptions imply unused moment conditions that can be exploited by the QLD transformation. I also derive new linear estimators that weaken identifying assumptions and have desirable theoretical properties. Unlike CCE, these estimators do not require the number of covariates to be less than the number of time periods. I provide the first proof of a fixed-T consistent mean group estimator for heterogeneous linear models with interactive fixed effects. I investigate the effects of per-student expenditure on standardized test performance using data from the state of Michigan.

JEL Classification: C23; C33

1 Introduction

The prevalence of panel data in modern economics has led theorists and practitioners to pay more attention to unobserved heterogeneity. A popular representation of unobserved effects is the linear factor structure ∑_{j=1}^{p} f_{tj} γ_{ji}, where f_{tj} is a time-varying macro effect or “common factor” and γ_{ji} is an individually heterogeneous response or “factor loading”. Except under highly specific circumstances, the usual within transformation is insufficient to control for these unobserved effects. Many factor estimators require the number of time periods T to grow large with the number of cross-sectional units N. As the vast majority of microeconometric data sets have only a few time periods, the recent literature assumes T is fixed while N goes to infinity.

One of the most popular approaches is the common correlated effects (CCE) estimator of Pesaran (2006). He assumes an additional reduced form model in which the covariates exhibit a pure factor structure. The pooled CCE estimator comes from the OLS regression that estimates unit-specific slopes on the cross-sectional averages of the dependent and independent variables. CCE is similar to a fixed effects treatment that seeks to eliminate the factors and remove a source of both endogeneity and cross-sectional dependence. Consistency and asymptotic normality were originally proved for sequences of N and T going to infinity, but recent work extends the CCE framework to a fixed-T setting. De Vos and Everaert (2021) derive a fixed-T consistency correction for the dynamic CCE estimator but require T → ∞ for asymptotic normality. Westerlund, Petrova, and Norkute (2019) provide the first asymptotic normality derivation of pooled CCE when T is fixed and N → ∞.

Despite its theoretical rigor and practicality, the CCE estimator does not use all fundamental moment conditions of the model. This fact implies that the pure factor structure in the covariates cannot improve efficiency for the CCE estimator because it is irrelevant from a population information perspective, despite being necessary for consistency. As other fixed-T, √N-consistent estimators exist that do not require this assumption, it is worth investigating how else the CCE assumptions can be used in estimation. I use the quasi-long-differencing (QLD) transformation of Ahn, Lee, and Schmidt (2013) to explore the implications of this model and show that the additional, ignored CCE pure factor moments are relevant for the estimation of the parameters of interest in the main equation.[1]

Ahn, Lee, and Schmidt (2013) choose a particular normalization of the unobserved factors that induces a smaller set of estimable parameters. They include these parameters in their QLD transformation, which can then asymptotically eliminate the space spanned by the factors. While they did not originally assume a pure factor structure in the covariates, I use their transformation to study the CCE model. I show that the reduced form pure factor model provides information for estimating the parameters of interest, which is ignored by the pooled CCE estimator but can be integrated into more efficient joint GMM estimation using the Ahn, Lee, and Schmidt (2013) estimator. Further, the CCE estimator generally uses more factor proxies than necessary, which can lead to inefficiency. I also demonstrate how the literature’s current understanding of the factor loadings causes problems for inference under model misspecification, which I correct for. Finally, the CCE estimator cannot accommodate more covariates than time periods, a highly restrictive condition in microeconometric settings. For example, in an intervention analysis with only pre-treatment, treatment, and post-treatment observations, classical CCE would require the treatment indicator to be the only regressor. My estimators will not require this restriction.

Another potential source of heterogeneity in linear models comes from the slope coefficients on the observed variables of interest. Pesaran (2006) proves fixed-T consistency of the mean group CCE estimator under random slopes but assumes they are independent of everything else in the model. Asymptotic normality requires T → ∞ and pooled CCE is studied under constant slopes. Recent work by Westerlund and Kaddoura (2022) and Brown, Schmidt, and Wooldridge (2023) extend this analysis to a fixed-T setting, but require strong assumptions on the individual slopes for consistency. I prove fixed-T consistency and asymptotic normality of my new pooled and mean group QLD estimators. I show that the first-stage estimation of the QLD parameters does not affect consistency, which mirrors the pooled OLS result of Wooldridge (2005), who assumes known factors. To the best of my knowledge, this paper is the first to consider arbitrary random slopes in the context of fixed-T panels with factor-driven endogeneity.

The rest of the paper is structured as follows: Section 2 discusses the model of interest. Section 3 provides the estimators under consideration and derives their theoretical properties. Section 4 provides simulation evidence for the finite-sample properties of the expanded GMM estimators, along with the new QLD estimators. Section 5 compares the various estimators by looking at the effect of education expenditure on standardized test performance using a school district-level data set from the state of Michigan. Section 6 contains concluding remarks.

2 Model

This section lays out the models considered in Westerlund, Petrova, and Norkute (2019) and Ahn, Lee, and Schmidt (2013), the fixed-T CCE and QLD approaches respectively. Throughout the paper, the equation of interest is

(1) y_i = X_i β_i + F_0 γ_i + u_i

(2) β_i ∼ (β_0, Σ_b)

where y i is a T × 1 vector of outcomes, X i is T × K matrix of covariates, β i is a K × 1 vector of random slope coefficients, F 0 is a T × p 0 matrix of factors common to all units in the population, γ i is a p 0 × 1 vector of factor loadings, and u i is a T × 1 vector of idiosyncratic shocks.[2] A ‘0’ subscript denotes the true or realized value of an unobserved parameter. We are interested in inference on the K × 1 vector β 0 = E( β i ). The factor structure F 0 γ i is treated as a collection of nuisance parameters. p 0 is then unobserved because F 0 and γ i are unobserved. However, we can consistently test for p 0, so it will be treated as known. Simulation evidence from this paper and others also suggests that overestimating p 0 does not cause inconsistency in QLD estimation. p denotes the number of factors specified by the econometrician.

The random slope condition as stated in equation (2) is fairly general. It allows the slopes to be written as β_i = β_0 + b_i where E(b_i) = 0. Endogeneity occurs when b_i is correlated with other aspects of the model. Most fixed-T treatments of random slope models either exclude or simplify the factor structure, as in a fixed effects analysis. Examples of fixed effects treatments include Juhl and Lugovskyy (2014), Campello, Galvao, and Juhl (2019), and Breitung and Salish (2021). Though Chudik and Pesaran (2015); Neal (2015); Norkutė et al. (2021); Pesaran (2006) allow for random slopes and arbitrary factors, they require T to grow to infinity and impose strong exogeneity conditions that I avoid.[3] Wooldridge (2005) considers this model but with known factors. Fixed-T studies with unknown factors include Westerlund and Kaddoura (2022) and Brown, Schmidt, and Wooldridge (2023). However, both assume exogeneity of b_i with respect to aspects of the covariate process. I introduce estimators that allow for varying degrees of correlation, including one that allows for arbitrary correlation between X_i and b_i. Finally, all results in this paper hold in the case of homogeneous slopes where β_i = β_0 for all i.

I define p 0 as the number of factors whose loadings correlate with X i . This interpretation is similar to Ahn, Lee, and Schmidt (2013) and implicit to the CCE model as discussed in the following section. One justification of this interpretation is to write the full error as D 0 ρ i + ϵ i where D 0 is a possibly infinite dimensional matrix of common factors and ϵ i is a vector of idiosyncratic errors. Then F 0 γ i is the set of variables from D 0 ρ i that are correlated with X i and the rest are absorbed into the error. However, it is entirely likely that γ i is correlated with the other loadings that are uncorrelated with X i . For this reason, I allow the loadings and errors to correlate. The factors are treated as constant. I could alternatively treat them as random and independent of the cross-sectional data like in Westerlund, Petrova, and Norkute (2019).

2.1 Quasi-Long-Differencing

Ahn, Lee, and Schmidt (2013) start with equation (1). Neither F_0 nor γ_i can be separately identified because F_0 γ_i = (F_0 A)A^{−1} γ_i for any nonsingular p_0 × p_0 rotation matrix A. Ahn, Lee, and Schmidt (2013) choose a particular rotation that applies Gaussian elimination to the bottom p_0 × p_0 submatrix of F_0:

(3) F_0 = (Θ_0′, I_{p_0})′

where Θ_0 is a (T − p_0) × p_0 matrix of unrestricted parameters. The given normalization is irrelevant because I am not interested in estimating F_0. In this case, I only assume that the factors are full rank; the normalization chosen merely reflects this fact. The parameters Θ_0 are not interesting by themselves, but allow us to eliminate the factors via a convenient reparameterization. Ahn, Lee, and Schmidt (2013) define the quasi-long-differencing (QLD) matrix[4]

(4) H(θ_0) = (I_{T−p_0}, −Θ_0)′

where θ 0 = vec(Θ 0). Then H ( θ 0)' F 0 = 0 . Ahn, Lee, and Schmidt (2013) use this fact to generate moment conditions:

(5) E(w_i ⊗ H(θ_0)′(y_i − X_i β)) = 0

where w i is a vector of all exogenous instruments. Their moments were derived assuming constant slopes, but still hold in the heterogeneous slopes model if we add that w i is uncorrelated with β i .
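As a quick numerical check of this algebra, the sketch below (my own illustration, using the normalized forms F_0 = (Θ_0′, I_{p_0})′ and H(θ_0) = (I_{T−p_0}, −Θ_0)′) verifies that the QLD matrix annihilates any factor matrix in normalized form, i.e. H(θ_0)′F_0 = 0:

```python
import numpy as np

rng = np.random.default_rng(0)
T, p = 6, 2
Theta = rng.normal(size=(T - p, p))        # unrestricted QLD parameters Theta_0
F = np.vstack([Theta, np.eye(p)])          # normalized factors: identity in the last p rows
H = np.vstack([np.eye(T - p), -Theta.T])   # QLD matrix H(theta_0), a T x (T - p) matrix

# H(theta_0)' F_0 = Theta_0 - Theta_0 = 0: the transformation removes the factor space
print(np.allclose(H.T @ F, np.zeros((T - p, p))))  # prints True
```

Any T × 1 error vector premultiplied by H(θ_0)′ therefore has its factor component removed exactly, which is what makes the moment conditions above feasible without estimating F_0 itself.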

2.2 Common Correlated Effects

The CCE model in Pesaran (2006) adds an additional reduced form equation that represents the relationship between the covariates and the factor structure:

(6) X i = F 0 Γ i + V i

where Γ_i is a p_0 × K matrix of factor loadings and V_i is a T × K matrix of idiosyncratic errors. Assuming that the idiosyncratic errors have mean zero, CCE proxies for the factors with the matrix F̂ = (ȳ, X̄), where (ȳ, X̄) = N^{−1} ∑_{i=1}^{N} (y_i, X_i) are the cross-sectional averages of y_i and X_i. A number of other papers have studied a similar version of this estimation scheme, including Westerlund, Petrova, and Norkute (2019), Juodis and Sarafidis (2020), Norkutė et al. (2021), and Brown, Schmidt, and Wooldridge (2023).

The pooled common correlated effects (CCEP) estimator of β 0 treats the cross-sectional averages as having unit-specific slopes and can be represented as

(7) β̂_CCEP = (∑_{i=1}^{N} X_i′ M_F̂ X_i)^{−1} ∑_{i=1}^{N} X_i′ M_F̂ y_i

where M_F̂ = I_T − F̂(F̂′F̂)^+ F̂′ and ′+′ denotes the Moore–Penrose inverse. Pesaran (2006) derives the CCEP estimator under the following intuition: first, write Z_i = (y_i, X_i). Then

(8) E ( Z i ) = F 0 E ( C i )

where C i = ( γ i + Γ i β i , Γ i ) and the slopes are assumed uncorrelated with V i . M F ̂ then asymptotically eliminates the space spanned by F 0, including F 0 γ i . All moment conditions are written in terms of the general index i because I assume the data is randomly sampled.
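The CCEP formula in equation (7) is straightforward to compute. The following sketch simulates a small panel satisfying the CCE assumptions (all data-generating values here are illustrative choices of mine, not taken from the paper) and forms the estimator with a Moore–Penrose inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K, p0 = 500, 5, 2, 1
F0 = rng.normal(size=(T, p0))                 # common factors
beta0 = np.array([1.0, -0.5])                 # homogeneous slopes for simplicity
Y = np.empty((N, T))
X = np.empty((N, T, K))
for i in range(N):
    Gam = rng.normal(size=(p0, K)) + 1.0      # loadings with nonzero means (rank condition)
    gam = rng.normal(size=p0) + 1.0
    X[i] = F0 @ Gam + rng.normal(size=(T, K))            # reduced form, equation (6)
    Y[i] = X[i] @ beta0 + F0 @ gam + rng.normal(size=T)  # main equation, equation (1)

Fhat = np.column_stack([Y.mean(axis=0), X.mean(axis=0)])  # (ybar, Xbar), a T x (K + 1) proxy
M = np.eye(T) - Fhat @ np.linalg.pinv(Fhat)               # M_Fhat via the Moore-Penrose inverse
den = sum(X[i].T @ M @ X[i] for i in range(N))
num = sum(X[i].T @ M @ Y[i] for i in range(N))
beta_ccep = np.linalg.solve(den, num)
```

The identity Fhat @ pinv(Fhat) = F̂(F̂′F̂)^+F̂′ is what lets `np.linalg.pinv` stand in for the projection in the definition of M_F̂.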

Westerlund, Petrova, and Norkute (2019) show that when T is fixed, M_F̂ generally converges to the space orthogonal to both F_0 and a random term that is a function of the model’s idiosyncratic errors. For the sake of simplicity, suppose that M_F̂ →_p M_{F_0}, as is the case when p_0 = K + 1. Then the CCEP estimator is based on the moment conditions

E(X_i′ M_{F_0}(y_i − X_i β_0)) = 0

Assuming E(V_i) = 0 as in Pesaran (2006) and Westerlund, Petrova, and Norkute (2019), the reduced form portion of the CCE model also implies E(M_{F_0} X_i) = 0. Since the CCE approach estimates no parameters in this additional set of moments, they are uninformative for estimating β_0. In practice, we could think of the CCE moments as

(9) E(X_i′ M_{μ_Z}(y_i − X_i β_0)) = 0

(10) E(Z_i − μ_Z) = 0

as there are no overidentifying conditions in the model that lead to the CCEP estimator. Here, the T × (K + 1) matrix μ Z is a placeholder to define the mean of Z i .

Pesaran (2006) assumes the idiosyncratic errors in both equations are mutually independent and independent over time. He also assumes random sampling of the factor loadings as well as independence between the loadings and the idiosyncratic errors. Westerlund, Petrova, and Norkute (2019) assume the errors are still mutually independent, but allow arbitrary unconditional serial correlation in both u_i and V_i. However, their main departure comes in the factor loadings. They assume the loadings form a constant sequence with no restriction other than a full rank requirement on their sums. This assumption allows more general sampling schemes and an arbitrary relationship between {γ_i}_{i=1}^∞ and {Γ_i}_{i=1}^∞, but also implies the loadings are independent of any function of the errors, ruling out potential misspecification.

A particularly harsh restriction of the CCEP estimator is its rank condition. M F ̂ is a residual-maker matrix and so it has rank T − (K + 1). The restriction T > K + 1 is practically binding regardless of the asymptotic analysis. Even if T is large in a given sample, it must still bound the number of covariates, which is often large in microeconometric applications. Also, when K + 1 > p 0, the CCEP estimator unnecessarily removes variation from the data which could improve precision of the estimator.

2.3 QLD and the Pure Factor Structure

The pure factor structure in equation (8) can be used for estimating the parameters in equation (3):

(11) E(H(θ_0)′ Z_i) = 0

where θ 0 = vec(Θ 0). I also define H 0 = H ( θ 0) for notational convenience. I show explicitly in the following section how and when these additional moments are useful for the purpose of identification and efficiency, which demonstrates the usefulness of the QLD transformation in studying the CCE model.

The QLD transformation can also exploit moment conditions implied by assumptions on the loadings. This paper takes a “fixed effects” approach in allowing γ i and Γ i to be arbitrarily correlated with each other and the idiosyncratic errors. If one wishes to maintain that the factor loadings are constant or independent as in Westerlund, Petrova, and Norkute (2019) and Pesaran (2006), and include the assumption that the random slope deviations are independent of X i as in Westerlund and Kaddoura (2022), we have additional moment conditions:

(12) E(vec(H_0′ V_i) ⊗ H_0′(y_i − V_i β_0)) = 0

(13) E(vec(H_0′ V_i) ⊗ (y_i − X_i β_0)) = 0

(14) E(vec(X_i) ⊗ H_0′(y_i − V_i β_0)) = 0

(15) E(H_0′(y_i − V_i β_0)) = 0

(16) E(H_0′ V_i) = 0

Equations (12)–(16) list (T − p_0)((T − p_0)K + 2TK + K + 1) moment conditions, which displays the strength of the CCE assumptions made in current applications. Again, CCEP only uses the moments E(X_i′ M_{F_0} u_i) = 0. Recent papers like Westerlund, Petrova, and Norkute (2019), Westerlund and Kaddoura (2022), and Brown, Schmidt, and Wooldridge (2023) have tried to show robustness of CCE in a fixed-T setting without considering how these additional moments may improve efficiency. The current paper can then be seen as looking at the fixed-T information implications of the CCE model.

It is instructive to understand where these moment conditions come from, and whether the assumptions that give us these moment conditions are necessary for consistency. Equation (12) exploits independence between the defactored covariates and the defactored errors in the main equation, a condition similar to the strict exogeneity assumption used in fixed effects regressions. Equation (13) comes from independence between the defactored covariates and the full error F_0 γ_i + u_i. I show in the next section that this condition is unnecessary for consistency, but its failure leads to incorrect inference when using the usual clustered standard errors for CCEP. Equation (14) comes from independence between Γ_i and the idiosyncratic errors u_i. Again, these moment conditions come from assuming the loadings are either fixed, or random and independent of the errors. The last two equations are a result of the model’s specification: after removing the parameters β_0, eliminating the factors leaves mean zero errors. Equation (16) specifically will be used to identify H_0 in the next section.

3 Estimation

I now state this paper’s primary assumptions. The first assumption defines the model of interest. The second set specifies the pure factor structure in X i similar to Westerlund, Petrova, and Norkute (2019).

Assumption 1.

(Linear population model): (i) y i = X i ( β 0 + b i ) + F 0 γ i + u i where (ii) b i ∼ ( 0 , Σ b ).

Assumption 2.

(CCE reduced form equations): (i) X_i = F_0 Γ_i + V_i; (ii) (b_i, γ_i, Γ_i, V_i, u_i) are iid across i with finite fourth moments; (iii) E(V_i) = 0 and E(u_i | V_i) = 0; (iv) Rk(F_0) = p_0 and Rk(E(γ_i + Γ_i β_i, Γ_i)) = p_0 ≤ K + 1.

Assumption 1 defines the relevant population model. I do not explicitly state the exogeneity conditions on the random slopes because different estimators will be shown to be consistent under different assumptions, as will be made clear in the relevant theorems. Assumption 2 specifies the pure factor assumption similar to Pesaran (2006) and Westerlund, Petrova, and Norkute (2019). Unlike these CCE analyses, I do not require independence between the errors in the main or reduced form equations. In fact, I only restrict E( u i | V i ) = 0 but place no assumptions on the conditional distribution D( V i | u i ). This assumption allows for heteroskedasticity conditional on both observables and unobservables in both sets of errors which, while common in the fixed-T GMM literature, is ruled out in the CCE approaches of Pesaran (2006) and Westerlund, Petrova, and Norkute (2019).

I assume the factor loadings are random and iid in the cross section. I could relax this assumption at the cost of notational complexity. Westerlund, Petrova, and Norkute (2019) show that more general sampling schemes are allowed in the asymptotic analysis. For example, I could replace Assumption 2(iv) with Assumption C of Westerlund, Petrova, and Norkute (2019). However, I do not assume (γ_i, Γ_i) is orthogonal to (u_i, V_i). While this assumption is reasonable when the factor structure captures all correlation between X_i and the full error F_0 γ_i + u_i, it can fail if the model is misspecified. I show in Section 3.2 that √N-consistent estimation is possible even if N^{−1} ∑_{i=1}^{N} V_i γ_i does not converge to zero due to model misspecification. Further, because I assume random sampling, I define population moments in terms of a generic ‘i’ unit. For example, Assumption 2(iv) makes conditions on E(γ_i), which is the population mean for every unit’s factor loadings.

As discussed earlier, I do not require T > K + 1, unlike the CCEP estimator. I directly use the moments E(H_0′ Z_i) = 0 to remove the factors and only require K + 1 ≥ p_0, a restriction also made by Pesaran (2006) and Westerlund, Petrova, and Norkute (2019). As long as there are enough time periods to cover all the unobserved effects, my procedure can allow for an arbitrary number of covariates. I also discuss in Section 3.2 how to include known factors, like a heterogeneous intercept, which decreases the number of relevant factors and makes the assumption even less restrictive.

3.1 CCE Moment Conditions for GMM

I now look at the moment conditions implied by Assumption 2. All of the following results pertain to a GMM estimator that estimates all parameters jointly using all available moments. Equation (11) of Section 2, E(H_0′ Z_i) = 0 where Z_i = (y_i, X_i), implies that Assumption 2 provides information on θ_0 that leads to more efficient estimation of β_0 and provides a first-stage estimator, which negates the need for the full joint estimator of Ahn, Lee, and Schmidt (2013). I first consider identification of θ_0 from the pure factor structure alone to show that it in fact yields valid moments. As in Ahn, Lee, and Schmidt (2013), p is the number of factors specified by the econometrician.

Lemma 3.1.

Under Assumption 2, and given E( V i β i ) = 0 , θ 0 is identified by E( H ( θ )′ Z i ) = 0 if and only if p = p 0.

All proofs are contained in the Appendix.

The condition that β i is uncorrelated with the errors in X i can be relaxed if we only use E( H 0 X i ) = 0 for identification, but this comes at the cost of an additional possible factor. We can use Lemma 3.1 to provide an estimator of θ 0 based on the covariates alone. Let

(17) H ̂ = H ( θ ̂ )

Then define A_θ = E(vec(H_0′ Z_i) vec(H_0′ Z_i)′) and D_θ = E(∇_θ vec(H_0′ Z_i)), where ∇_θ is the gradient with respect to θ.

Theorem 3.1.

Suppose Assumption 2 holds, and let θ̂ be the GMM estimator based on E(vec(H_0′ Z_i)) = 0 using a consistent estimator of the optimal weight matrix. Then (i) √N(θ̂ − θ_0) →_d N(0, (D_θ′ A_θ^{−1} D_θ)^{−1}). Now suppose that Â_θ →_p A_θ using a consistent first-step estimator of θ_0.

(ii) If p_0 = p, then N^{−1} (∑_{i=1}^{N} vec(Ĥ′ Z_i))′ Â_θ^{−1} (∑_{i=1}^{N} vec(Ĥ′ Z_i)) →_d χ²((T − p_0)(K + 1 − p_0)).

(iii) If p_0 > p, then N^{−1} (∑_{i=1}^{N} vec(Ĥ′ Z_i))′ Â_θ^{−1} (∑_{i=1}^{N} vec(Ĥ′ Z_i)) →_p ∞.

The proof comes from standard theory; see Hansen (1982). The estimator of the optimal weight matrix is Â_θ = N^{−1} ∑_{i=1}^{N} vec(H(θ̃)′ Z_i) vec(H(θ̃)′ Z_i)′, where θ̃ is a consistent first-stage estimator of θ_0.

It is entirely possible there are variables in the data set that are linear in the factors but not relevant for estimation. In this case, one can simply use them to estimate θ 0 but drop them from the estimating equation. Further, if relevant variables are not linear in F 0, they should be dropped from the estimation in Theorem 3.1. This can occur if there are polynomial or interactive functions of the covariates in the estimating equation.
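Theorem 3.1 can be illustrated numerically. Because H(θ) is affine in θ, the averaged moment vector is linear in θ, so the two-step GMM estimator reduces to weighted least squares. The sketch below is my own construction with illustrative values (p_0 = 1); it estimates θ_0 and computes the overidentification statistic from part (ii):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K, p0 = 2000, 5, 2, 1
theta0 = rng.normal(size=T - p0)                       # true QLD parameters (p0 = 1)
F0 = np.vstack([theta0[:, None], np.eye(p0)])          # factors in normalized form
beta0 = np.array([1.0, -0.5])
Z = np.empty((N, T, K + 1))                            # Z_i = (y_i, X_i)
for i in range(N):
    Gam = rng.normal(size=(p0, K)) + 1.0
    gam = rng.normal(size=p0) + 1.0
    Xi = F0 @ Gam + rng.normal(size=(T, K))
    yi = Xi @ beta0 + F0 @ gam + rng.normal(size=T)
    Z[i] = np.column_stack([yi, Xi])

def vec_moments(theta):
    """Per-unit moments vec(H(theta)' Z_i), stacked as an N x (T - p0)(K + 1) array."""
    H = np.vstack([np.eye(T - p0), -theta[None, :]])
    return np.einsum('tu,ntk->nuk', H, Z).reshape(N, -1)

# the mean moment is m(theta) = a - B theta, so weighted GMM has a closed form
a = Z[:, :T - p0, :].mean(axis=0).ravel()              # top block of the averaged Z
zbar = Z[:, T - p0:, :].mean(axis=0).ravel()           # bottom (p0 = 1) row of the averaged Z
B = np.kron(np.eye(T - p0), zbar[:, None])

theta1 = np.linalg.solve(B.T @ B, B.T @ a)             # step 1: identity weights
g = vec_moments(theta1)
W = np.linalg.inv(g.T @ g / N)                         # step 2: optimal weight estimate
theta_hat = np.linalg.solve(B.T @ W @ B, B.T @ W @ a)
m = a - B @ theta_hat
J = N * m @ W @ m   # compare with chi2((T - p0)(K + 1 - p0)) critical values
```

With a correctly specified number of factors, J behaves like a χ² draw with (T − p_0)(K + 1 − p_0) degrees of freedom, matching part (ii) of the theorem; under p_0 > p it diverges as in part (iii).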

I now demonstrate that the additional reduced form moments generally improve efficiency of estimating β_0 by providing non-redundant moment conditions. The following theorem completely characterizes when the moments E(H_0′ X_i) = E(H_0′ V_i) = 0 are partially redundant for estimating β_0; that is, it describes when adding the moment conditions does not decrease the asymptotic variance of √N(β̂ − β_0). I do not include E(H_0′ y_i) = 0 in the reduced form because the efficiency result would require additional assumptions on Var(u_i). Let g_i1(β, θ) = vec(X_i) ⊗ H(θ)′(y_i − X_i β) and g_i2(θ) = H(θ)′ X_i. I also need the following condition:

Assumption 3.

(QLD identification): E(γ_i γ_i′) and (E(H_0′ X_i ⊗ vec(X_i)), I_{T−p_0} ⊗ E(vec(X_i) γ_i′)) have full rank.

Assumption 3 identifies the parameters β 0 and θ 0 using the Ahn, Lee, and Schmidt (2013) moments. It requires non-degenerate factor loadings that are correlated with the instruments; if the factor loadings are not correlated with observables, one can simply estimate β 0 via a two-way fixed effects regression (Westerlund 2019).

Theorem 3.2.

Given Assumptions 1–3, suppose E( u i | X i ) = 0 , E( b i | X i ) = 0 . Then the moment conditions E( g i2( θ 0)) = 0 are partially redundant for estimating β 0 if and only if

(18) E(vec(H_0′ V_i)(H_0′ u_i ⊗ vec(X_i))′) [E((H_0′ u_i ⊗ vec(X_i))(H_0′ u_i ⊗ vec(X_i))′)]^{−1} E(I_{T−p_0} ⊗ (vec(X_i) γ_i′ F_0′)) S = 0

where S = K_{T(T−p_0)} (0_{(T−p_0)² × (T−p_0)p_0}′, I_{(T−p_0)p_0})′ and K_{T(T−p_0)} is the T(T − p_0) × T(T − p_0) commutation matrix.

I assume E( u i | X i ) = 0 whereas Assumption 2 implies the weaker E( u i | V i ) = 0 . I make the stronger exogeneity assumption for simplicity, though the moment conditions in g i1 could be reformulated to include the instruments H 0 V i instead of X i . I could similarly relax the random slope exogeneity condition.

Theorem 3.2 implies that whenever the parameters are identified using the moment conditions from Ahn, Lee, and Schmidt (2013), and the covariates are not eliminated by the defactoring, exploiting the CCE model improves efficiency for estimating β_0. The first expectation is generally nonzero because it contains covariances between the matrix V_i and itself. The other two expectations are always nonzero because they have full column rank; they come from the moment conditions that identify β_0 and θ_0. Simple examples of linear redundancy where equation (18) holds include θ_0 being known to the researcher (known factors) and p_0 = 0. OLS is preferred to QLD and CCE in both cases.

I now turn to incorporating additional covariates into a QLD analysis. As discussed in the prior section, a benefit of CCE regressions is that we do not require knowledge of p 0. However, including irrelevant proxies will generally decrease efficiency. Recent papers by Margaritella and Westerlund (2023) and Brown and Westerlund (2023) study different criteria for dropping unnecessary cross-sectional averages from CCE regression. I can now show that additional covariates satisfying the CCE condition generally imply information for the QLD GMM estimator.

Theorem 3.3.

Let X_i1 and X_i2 respectively be T × K_1 and T × K_2 matrices of covariates that jointly satisfy Assumption 2, and suppose that K_1 ≥ p_0. Then E(H_0′ X_i2) = 0 is redundant given E(H_0′ X_i1) = 0 if and only if

(19) [(F_0* E(Γ_i21))′ ⊗ I_{T−p_0}; ⋮ ; (F_0* E(Γ_i2K_2))′ ⊗ I_{T−p_0}] = Ω_21 Ω_11^{−1} [(F_0* E(Γ_i11))′ ⊗ I_{T−p_0}; ⋮ ; (F_0* E(Γ_i1K_1))′ ⊗ I_{T−p_0}]

where Ω_21 = E(vec(H(θ_0)′ V_i2) vec(H(θ_0)′ V_i1)′), Ω_11 = E(vec(H(θ_0)′ V_i1) vec(H(θ_0)′ V_i1)′), Γ_ijk is the k’th column of Γ_ij for j = 1, 2, and F_0* is the last (T − p_0) rows of F_0.

Because K_1 ≥ p_0, X_i1 identifies θ_0, and redundancy occurs when the CCE rank condition fails. One case where redundancy cannot occur is when the covariates have uncorrelated errors. Redundancy would then hold if and only if E(Γ_i2) = 0. But this result would violate the CCE rank condition, and so Assumption 2 would fail. Combining Theorems 3.2 and 3.3 tells us that incorporating additional covariates that satisfy CCE into the QLD estimation will improve estimation of β_0, even if θ_0 is already identified.

3.2 Linear QLD Estimators

The QLD GMM approach of Ahn, Lee, and Schmidt (2013) can select appropriate instruments for a given time period. However, an abundance of moment conditions leads to additional problems. For one, it can present problems with finite-sample bias and computational complexity. It may also cause the global identifying assumptions to fail and induce local stationary points in the objective function (Hayakawa 2016). This section introduces linear pooled and mean group estimators based on the QLD transformation. They allow for a variety of rank and exogeneity conditions that are especially useful when the researcher includes heterogeneous slopes in the model, as in Section 4. One can first defactor the data using the estimated Ĥ and then run the relevant regression:

(20) β̂_QLDP = (∑_{i=1}^{N} X_i′ Ĥ Ĥ′ X_i)^{−1} ∑_{i=1}^{N} X_i′ Ĥ Ĥ′ y_i

The pooled quasi-long-differencing (QLDP) estimator defined by equation (20) is the pooled OLS estimator from regressing Ĥ′ y_i on Ĥ′ X_i. A similar estimator was mentioned in Breitung and Hansen (2021) but not formally studied. The mean group quasi-long-differencing (QLDMG) estimator can be obtained by running the (T − p)-observation time series regression of Ĥ′ y_i on Ĥ′ X_i for each i, and then averaging the N estimates:

(21) β̂_QLDMG = N^{−1} ∑_{i=1}^{N} (X_i′ Ĥ Ĥ′ X_i)^{−1} X_i′ Ĥ Ĥ′ y_i

Intuitively, the mean group estimator should allow for arbitrary correlation between the random slopes and covariates at the cost of rank assumptions and precision. To see how, note that

(22) β̂_QLDMG = N^{−1} ∑_{i=1}^{N} β_i + N^{−1} ∑_{i=1}^{N} (X_i′ Ĥ Ĥ′ X_i)^{−1} X_i′ Ĥ Ĥ′ (F_0 γ_i + u_i)

Then, given that an appropriate uniform law of large numbers applies to Ĥ, the mean group QLD estimator is consistent for E(β_i) regardless of the correlation between X_i and β_i.

If both are consistent, one should generally choose the pooled estimator over the mean group one for reasons of efficiency. The pooled QLD allows us to relax the rank conditions used in Ahn, Lee, and Schmidt (2013) and Westerlund, Petrova, and Norkute (2019). Instead of E(vec(X_i) ⊗ H_0′(y_i − X_i β_0)) = 0, we can use the moments E(X_i′ H_0 H_0′(y_i − X_i β_0)) = 0. The latter is a just-identified system of moments, requires no outside instruments, and allows E(γ_i γ_i′) and E(γ_i) to be completely arbitrary. As also discussed earlier, the QLD transformation does not remove more variation from the data than necessary. The CCE transformation, M_F̂, is the same even if the econometrician knows p_0. The QLD transformation efficiently uses information on the number of factors. Simulations in Section 4 demonstrate that the QLDP estimator is often more efficient than the CCEP estimator.
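To make equations (20) and (21) concrete, the sketch below computes both estimators on simulated data. For pure illustration it uses the true H(θ_0) in place of a first-stage estimate Ĥ, and all data-generating values are my own:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, K, p = 1000, 8, 2, 1
Theta = rng.normal(size=(T - p, p))
F0 = np.vstack([Theta, np.eye(p)])
Hhat = np.vstack([np.eye(T - p), -Theta.T])    # stand-in for the estimated H(theta_hat)
beta0 = np.array([1.0, -0.5])
X = np.empty((N, T, K))
Y = np.empty((N, T))
for i in range(N):
    Gam = rng.normal(size=(p, K)) + 1.0
    gam = rng.normal(size=p) + 1.0
    b_i = 0.3 * rng.normal(size=K)              # random slope deviations with E(b_i) = 0
    X[i] = F0 @ Gam + rng.normal(size=(T, K))
    Y[i] = X[i] @ (beta0 + b_i) + F0 @ gam + rng.normal(size=T)

P = Hhat @ Hhat.T                               # T x T defactoring weight H H'
den = sum(X[i].T @ P @ X[i] for i in range(N))
num = sum(X[i].T @ P @ Y[i] for i in range(N))
beta_qldp = np.linalg.solve(den, num)           # pooled QLD, equation (20)

beta_qldmg = np.mean(
    [np.linalg.solve(X[i].T @ P @ X[i], X[i].T @ P @ Y[i]) for i in range(N)],
    axis=0,
)                                               # mean group QLD, equation (21)
```

The mean group version solves a small K × K system per unit before averaging, which is where its stronger per-unit rank requirement (and its larger variance) comes from.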

The following table displays the three estimators under consideration:

Estimator | Moment conditions | Arbitrary slopes?
GMM (ALS with CCE moments) | E(vec(X_i) ⊗ H_0′(y_i − X_i β_0)) = 0; E(H_0′ Z_i) = 0 | No
QLDP | E(X_i′ H_0 H_0′(y_i − X_i β_0)) = 0; E(H_0′ Z_i) = 0 | No
QLDMG | E((X_i′ H_0 H_0′ X_i)^{−1} X_i′ H_0 H_0′ y_i − β_0) = 0; E(H_0′ Z_i) = 0 | Yes

We can see from the number of moment conditions that the GMM estimator has many more overidentifying restrictions than the linear estimators. We can also see that while the mean group estimator allows slope heterogeneity, it requires stronger identifying assumptions. For instance, it has the only moment conditions where an inverse is taken within an expectation.

Before proving asymptotic normality, I point out that the case of p = K + 1 implies a powerful algebraic fact about the pooled QLD estimator: it is the same whether or not the researcher includes common variables in the regression. That is, all variables that do not vary over i are irrelevant to the estimation of β_0, which includes time dummies. Further, the pooled QLD residuals are the same with or without the inclusion of common variables. Note that I say p = K + 1 instead of p_0 = K + 1, as the following theorem is purely algebraic and independent of model specification or statistical properties. Let W be a T × q matrix of common variables, and let (α̃, β̃) be the estimates from the pooled regression of Ĥ′ y_i on Ĥ′[W, X_i]. Finally, let ϵ̂_i = (y_i − X_i β̂_QLDP) and ϵ̃_i = (y_i − X_i β̃ − W α̃) be the associated residuals.

Theorem 3.4.

Suppose p = K + 1. If Rk(Ĥ′ W) = q, then (i) β̂_QLDP = β̃; (ii) α̃ = 0; (iii) ϵ̂_i = ϵ̃_i.

The above result says that when p = K + 1, where p is the number of factors specified by the econometrician, the QLD matrix suffices to remove all unobserved time effects in the population, even those which do not interact with the heterogeneity. The matrix W refers to any common factors that have an equal impact on each cross-sectional unit; these variables are generally captured by time fixed effects in linear regression. Theorem 3.4 says that we do not need to worry about time effects in these special cases. Including X_i is necessary for part (i) of the theorem, while including y_i grants parts (ii) and (iii).

It may appear that Theorem 3.4 only applies in very special scenarios; however, simulation evidence in the Appendix suggests that overestimating p 0 does not cause inconsistency. These results bolster the simulation evidence from Ahn, Lee, and Schmidt (2013) that suggests the same thing when using their GMM estimator. Breitung and Hansen (2021) also demonstrate that the Ahn, Lee, and Schmidt (2013) estimator performs well under the BIC method of estimating p 0, which has a tendency to overestimate the number of factors. Overestimating p 0 includes the case of incorrectly estimating factors when p 0 = 0. Under strict exogeneity, CCE and QLD procedures will be consistent because their factor proxies are just functions of the exogenous variables. Reporting the QLDP that takes p = K + 1 could then serve as a robustness check if the estimated p 0 is less than K + 1.

3.3 Asymptotic Results

I now show asymptotic normality for the QLDP and QLDMG estimators. I demonstrate how first-stage estimation of θ_0 can affect the asymptotic distribution and show why ignoring this problem leads to incorrect standard errors even when the estimators are asymptotically normal.[5] I only need exogeneity of V_i with respect to u_i for asymptotic normality; the other assumptions only simplify the asymptotic variance.

Theorem 3.5.

Given Assumptions 1 and 2, suppose that (i) A_P = E(V_i′H_0H_0′V_i) has full rank and (ii) E(V_i′H_0H_0′(V_ib_i + u_i)) = 0. Then β̂_QLDP →_p β_0 and √N(β̂_QLDP − β_0) →_d N(0, A_P^{−1}B_P A_P^{−1}), where B_P = E((V_i′H_0H_0′u_i + G_P r_i(θ_0))(V_i′H_0H_0′u_i + G_P r_i(θ_0))′), r_i(θ_0) is derived from Theorem 3.1, and G_P = E(∇_θ X_i′H(θ)H(θ)′(X_ib_i + F_0γ_i + u_i)) evaluated at θ = θ_0. If E(u_iΓ_i) = 0, E(V_i′b_i) = 0, and E(V_i′γ_i) = 0, then G_P = 0.

Remark (Known factors): Eliminating known factors like random intercepts or polynomial time trends can make the QLD estimators more precise. Regress [ y i , X i ] unit-by-unit onto the known factors, then estimate θ 0 as in Theorem 3.1 using the residuals. Further, removing known factors can make the QLDP estimator more robust. According to Theorem 3.4, removing a random intercept and setting p = K + 1 explicitly nests the popular two-way error structure. ■

Remark (Bootstrap): While I provide analytic inference below, the standard errors can be quite complicated in general. Because √N(β̂_QLDP − β_0) is asymptotically normal, one can instead do inference via the nonparametric bootstrap: resample over (y_i, X_i), with Ĥ estimated for each new sample so that the first-stage estimation is reflected in the final standard errors. This procedure contrasts with Section 2 of the Supplement to Westerlund, Petrova, and Norkute (2019), which does not re-estimate F̂ with each new sample, a procedure that is inconsistent if γ_i is correlated with V_i. I do not provide a proof of consistency because the problem is standard; Westerlund, Petrova, and Norkute (2019) needed a proof because the CCE projection matrix has a reduced-rank limit. ■
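The resampling scheme in this remark can be sketched as follows. Here `estimate_theta`, `qld_pooled_fn`, and `qld_matrix_fn` are hypothetical helpers standing in for the first-stage and pooled-estimation steps; the essential point is that θ̂ (and hence Ĥ) is recomputed inside every bootstrap iteration.

```python
import numpy as np

def bootstrap_se(y, X, estimate_theta, qld_pooled_fn, qld_matrix_fn,
                 p, B=500, seed=0):
    # Nonparametric bootstrap over cross-sectional units (y_i, X_i).
    # theta (and hence H-hat) is re-estimated for EACH bootstrap sample,
    # so first-stage estimation error enters the standard errors.
    rng = np.random.default_rng(seed)
    N, T = y.shape
    draws = []
    for _ in range(B):
        idx = rng.integers(0, N, size=N)   # resample units with replacement
        yb, Xb = y[idx], X[idx]
        Hb = qld_matrix_fn(estimate_theta(yb, Xb, p), T, p)
        draws.append(qld_pooled_fn(yb, Xb, Hb))
    return np.std(np.array(draws), axis=0, ddof=1)
```

The returned vector of bootstrap standard deviations can be used directly for Wald-type inference on each coefficient.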

The asymptotic variance can be estimated by replacing population parameters with their estimates, such as H(θ̂)′(y_i − X_iβ̂_QLDP) for H_0′u_i, and estimating expectations with sample averages. The functional forms of G_P and r_i(θ̂) are derived in the Appendix. When G_P = 0, the standard errors take the usual cluster-robust form, similar to the standard errors derived in Westerlund, Petrova, and Norkute (2019) with ĤĤ′ in place of M_F̂. However, whenever E(Γ_i′u_i) ≠ 0 or E(V_i′γ_i) ≠ 0 due to model misspecification that does not cause inconsistency, this additional term remains in the asymptotic variance.[6]

Even if we assume G_P = 0 along with conditional homoskedasticity and zero serial correlation in u_i, the asymptotic variance still takes the sandwich form, suggesting that QLDP is less efficient than CCE.[7] The CCE asymptotic variance also takes the sandwich form when K + 1 > p_0 by the work in Westerlund, Petrova, and Norkute (2019). A direct comparison of asymptotic variances would require stronger restrictions on the idiosyncratic errors, similar to comparing first-differencing with the within estimator. The simulations in the following section demonstrate that QLDP tends to have a smaller variance in finite samples. Such efficiency gains likely come from not over-estimating the number of factors as CCEP does.

I now demonstrate asymptotic normality of the mean group estimator. As described earlier, the QLDMG allows for arbitrary correlation between the slope deviations and the rest of the model's stochastic components but requires stronger identifying assumptions. Define 𝒯 as the parameter space of θ_0 and let a_i(θ) = Σ_{k=1}^K σ_k((X_i′H(θ)H(θ)′X_i)^{−1}), where {σ_k(D)}_{k=1}^K are the singular values of a K × K matrix D.

Theorem 3.6.

Given Assumptions 1 and 2, suppose that (i) the eigenvalues of X_i′H(θ)H(θ)′X_i are almost surely positive uniformly over 𝒯; (ii) max{E[a_i(θ)‖X_i′u_i‖], E[a_i(θ)^2‖X_i‖^3‖u_i‖]} < ∞ uniformly over 𝒯; (iii) 𝒯 is a compact subset of R^{(T−p_0)p_0}. Then β̂_QLDMG →_p β_0 and √N(β̂_QLDMG − β_0) →_d N(0, B_MG), where B_MG = E(((V_i′H_0H_0′V_i)^{−1}V_i′H_0H_0′u_i + G_MG r_{x,i}(θ_0))((V_i′H_0H_0′V_i)^{−1}V_i′H_0H_0′u_i + G_MG r_{x,i}(θ_0))′). If E(b_i | V_i) = 0 and E(V_i′γ_i) = 0, then G_MG = 0.

Standard errors are derived similarly to the pooled QLD estimator: population parameters are replaced by estimates and expectations by sample averages. Theorem 3.6 is the first fixed-T proof of asymptotic normality for a mean group estimator that allows for arbitrary random factors. As with the pooled estimator, the √N-asymptotic normality in Theorem 3.6 implies that inference can be done via the usual nonparametric bootstrap, estimating θ̂ for each new bootstrap sample.
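A minimal sketch of the mean group computation: unit-by-unit QLD regressions averaged across i. The function name is mine, and it presumes the QLD matrix Ĥ has already been obtained from a first stage.

```python
import numpy as np

def qld_mean_group(y, X, H):
    # y: N x T, X: N x T x K, H: T x (T-p) QLD matrix.
    # Runs the QLD-transformed regression unit by unit, then averages.
    # Requires X_i' H H' X_i invertible for each i, i.e. T - p >= K.
    P = H @ H.T
    betas = [np.linalg.solve(X[i].T @ P @ X[i], X[i].T @ P @ y[i])
             for i in range(len(y))]
    return np.mean(betas, axis=0)
```

With homogeneous slopes this returns the same target as the pooled estimator; with heterogeneous slopes it estimates the average effect E(β_i).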

Remark (Order conditions): Similar to the pooled estimator, one advantage of the QLD transformation is that it allows for more variables than CCE when p_0 is small. CCE uses (ȳ, X̄) to control for the factors, and the rank of M_F̂ is generally T − (K + 1) in finite samples, regardless of the number of factors. The rank of ĤĤ′ is T − p, which is assumed to be greater than T − (K + 1) in Westerlund, Petrova, and Norkute (2019). ■

4 Simulations

This section considers the finite-sample performance of the QLD estimators compared to the GMM and CCE estimators of Ahn, Lee, and Schmidt (2013) and Pesaran (2006) respectively. The main model is

y_i = X_i β_0 + F_0 γ_i + u_i,
X_i = F_0 Γ_i + V_i,

as in Assumptions 1 and 2. There are two variables with slopes β_0 = (1, 1)′. I do not include random slopes, as they would only increase the amount of noise in the model. I refer the reader to Campello, Galvao, and Juhl (2019) for simulation studies of the performance of pooled estimators when slopes are correlated with the variables of interest.

The two factors are generated as AR(1) random processes with initial value from a normal distribution with mean 1 and variance 1, having parameters 0.75 and −0.75 respectively. The factors are generated once then fixed over repeated replications. The loadings on X i are drawn as

Γ_i ∼ ( N(1, 1)   N(0, 1)
        N(0, 1)   N(1, 1) ),    γ_i ∼ ( N(Γ_{1,1}, 1)
                                        N(Γ_{2,2}, 1) )

where Γ1,1 and Γ2,2 are the upper-left and bottom-right diagonal values of Γ i . The errors u i and V ik (k = 1, 2) are independently drawn from a multivariate normal distribution with mean 0 T×1 and variance C where C is the correlation matrix from an AR(1) process with parameter 0.75. Each simulation study includes 1,000 replications.
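This design can be sketched in Python/NumPy as follows, with unstated details (innovation variances, scalings) filled in as assumptions:

```python
import numpy as np

def ar1_corr(T, rho):
    # Correlation matrix of an AR(1) process: C[s, t] = rho**|s - t|.
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate(N, T, rho=0.75, seed=0):
    # A sketch of the section's DGP; details not stated in the text
    # (e.g. unit innovation variances) are assumptions.
    rng = np.random.default_rng(seed)
    # Two AR(1) factors with parameters 0.75 and -0.75, initial value ~ N(1, 1).
    F = np.empty((T, 2))
    F[0] = rng.normal(1.0, 1.0, size=2)
    for t in range(1, T):
        F[t] = np.array([0.75, -0.75]) * F[t - 1] + rng.normal(size=2)
    L = np.linalg.cholesky(ar1_corr(T, rho))  # AR(1)-correlated errors
    beta = np.array([1.0, 1.0])
    y = np.empty((N, T))
    X = np.empty((N, T, 2))
    for i in range(N):
        Gam = np.array([[rng.normal(1, 1), rng.normal(0, 1)],
                        [rng.normal(0, 1), rng.normal(1, 1)]])
        gam = np.array([rng.normal(Gam[0, 0], 1), rng.normal(Gam[1, 1], 1)])
        V = L @ rng.normal(size=(T, 2))   # loadings correlated with gamma_i
        u = L @ rng.normal(size=T)
        X[i] = F @ Gam + V
        y[i] = X[i] @ beta + F @ gam + u
    return y, X, F
```

In the paper's experiments the factors are drawn once and held fixed across replications; the sketch regenerates them per call for self-containment.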

Table 1 compares the Ahn, Lee, and Schmidt (2013) estimator both with and without the additional moments E H 0 Z i = 0 . Both estimators are computed as two-step estimators where the optimal weight matrix is calculated with a consistent first-step estimator. The first-step estimator uses an identity weight matrix. I report the results for each (N, T) pair for both sets of coefficients.

Table 1:

GMM estimators.

Bias SD RMSE
GMM1 GMM2 GMM1 GMM2 GMM1 GMM2
N = 50 T = 3 0.0328 −0.0107 0.2326 0.1812 0.2349 0.1815
−0.0053 −0.0167 0.1719 0.1690 0.1720 0.1698
T = 4 0.0026 −0.0225 0.2997 0.1518 0.2997 0.1535
0.0781 −0.0196 0.3184 0.1424 0.3279 0.1438
T = 5 −0.0008 −0.0249 0.3702 0.1694 0.3702 0.1712
0.2631 −0.0055 0.4922 0.2057 0.5581 0.2058
N = 300 T = 3 0.0111 0.0015 0.1057 0.0594 0.1063 0.0594
0.0020 0.0015 0.0588 0.0597 0.0588 0.0597
T = 4 0.0033 −0.0020 0.1187 0.0427 0.1188 0.0428
0.0084 0.0001 0.0749 0.0414 0.0754 0.0414
T = 5 −0.0126 −0.0016 0.1633 0.0364 0.1638 0.0365
0.1903 −0.0029 0.4069 0.0367 0.4492 0.0368
  1. This table presents a set of simulations with 1,000 replications. The two rows for a given pair of N and T are the values associated with estimators of each of the two coefficients. “SD” and “RMSE” are respectively the standard deviation and root mean squared error of the estimators over all replications for a given experiment. ‘GMM1’ refers to the Ahn, Lee, and Schmidt (2013) estimator using vec(X_i) as instruments. ‘GMM2’ uses these moments as well as the reduced-form moments in equation (11). Both estimators are computed using an optimal weight matrix that is a function of an initial consistent first-stage estimator.

The GMM estimator based only on the Ahn, Lee, and Schmidt (2013) residual, E(vec(X_i)′H_0(y_i − X_iβ_0)) = 0, is GMM1, whereas the GMM estimator that adds the moments E(H_0′Z_i) = 0 is GMM2. GMM1 uses TK(T − 2) moments, while GMM2 uses an additional (T − 2)K. The estimator using both sets of moments generally outperforms the original Ahn, Lee, and Schmidt (2013) estimator in terms of root mean squared error, implying that the additional moments are practically relevant in finite samples.
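The two-step weighting used by both GMM estimators can be sketched for a generic linear moment condition E[Z_i′(y_i − X_iβ)] = 0; the paper's QLD moment functions are nonlinear in θ, so this illustrates only the weighting scheme, not the exact estimators.

```python
import numpy as np

def linear_gmm(y_list, X_list, Z_list):
    # Two-step GMM for moments E[Z_i'(y_i - X_i beta)] = 0: an identity
    # first-step weight matrix, then the optimal weight built from the
    # first-step residual moments.
    N = len(y_list)
    G = sum(Z.T @ X for Z, X in zip(Z_list, X_list)) / N
    m = sum(Z.T @ y for Z, y in zip(Z_list, y_list)) / N

    def solve(W):
        return np.linalg.solve(G.T @ W @ G, G.T @ W @ m)

    beta1 = solve(np.eye(G.shape[0]))          # step 1: identity weight
    g = np.array([Z.T @ (y - X @ beta1)
                  for Z, X, y in zip(Z_list, X_list, y_list)])
    W_opt = np.linalg.inv(g.T @ g / N)         # step 2: optimal weight
    return solve(W_opt)
```

When the system is overidentified, the second step reweights the moments by the inverse of their estimated variance, which is what drives the efficiency gains of GMM2 over an identity-weighted estimator.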

Before turning to a comparison of the pooled QLD and CCE estimators, I first investigate the performance of QLDP when p_0 is misspecified in the estimation of θ_0. The true number of factors is p_0 = 2, and I examine the performance of QLDP under p = 1, 2, 3.

Table 2 gives the results for the QLDP under the different specifications. My results track with previous simulation evidence provided by Ahn, Lee, and Schmidt (2013) and Breitung and Hansen (2021). Underestimating p_0 leads to substantial bias that does not decrease with N. However, overestimating p_0 leads to only slightly worse performance than correct specification. The bias is larger but decreases with N; in fact, even N = 50 gives reasonable bias for the p = 3 estimator. The p = 3 estimator also has a larger standard deviation than the correctly specified estimator, which is not surprising. Overall, I find evidence that overestimating p_0 does not lead to substantial bias in estimation, but underestimating p_0 can.

Table 2:

Misspecifying p 0.

Bias SD RMSE
p = 1 p = 2 p = 3 p = 1 p = 2 p = 3 p = 1 p = 2 p = 3
N = 50 T = 4 0.2700 0.0078 0.0118 0.1677 0.1097 0.1466 0.3178 0.1100 0.1471
0.4024 0.0029 0.0120 0.1814 0.1097 0.1561 0.4414 0.1098 0.1566
T = 5 0.4662 0.0095 0.0154 0.3511 0.1005 0.1282 0.5836 0.1009 0.1291
0.5372 0.0058 0.0119 0.4111 0.0950 0.1228 0.6764 0.0952 0.1234
T = 6 0.1697 0.0074 0.0126 0.1534 0.0956 0.1239 0.2287 0.0959 0.1246
0.5843 0.0132 0.0200 0.1516 0.1025 0.1222 0.6036 0.1034 0.1238
N = 300 T = 4 0.2748 −0.0003 0.0000 0.0657 0.0424 0.0559 0.2826 0.0424 0.0559
0.4087 0.0024 0.0030 0.0746 0.0411 0.0587 0.4154 0.0411 0.0588
T = 5 0.5267 0.0008 0.0032 0.2545 0.0382 0.0491 0.5849 0.0383 0.0492
0.5993 0.0007 0.0038 0.2953 0.0369 0.0474 0.6681 0.0369 0.0476
T = 6 0.1484 0.0015 0.0027 0.0646 0.0392 0.0470 0.1618 0.0392 0.0471
0.6191 0.0013 0.0020 0.0596 0.0406 0.0480 0.6220 0.0406 0.0480
  1. This table presents a set of simulations with 1,000 replications. The two rows for a given pair of N and T are the values associated with estimators of each of the two coefficients. The columns refer to the QLDP estimator that uses the given value of p in the estimation of the parameters θ 0. “SD” and “RMSE” are respectively the standard deviation and root mean squared error of the estimators over all replications for a given experiment.

I also consider hypothesis testing for the different specifications of p. Using the same model but setting β_0 = (0, 0)′, I construct the QLDP estimators under p = 1, p = 2, and p = 3 when the true value is p_0 = 2. Table 3 reports the average rejection rate for the usual Wald statistics of the individual hypothesis tests H_0: β_1 = 0 and H_0: β_2 = 0 against the relevant two-sided alternatives. I estimate the standard errors via the nonparametric bootstrap and carry out the tests at the 5 % level, so a test is considered a rejection if the standard normal cdf evaluated at the absolute value of the Wald statistic is greater than or equal to 0.975 (equivalently, if the two-sided p-value is at most 0.05). We can see that correct specification and overestimation of p_0 lead to reasonable rejection rates when the null hypothesis is true, especially as N increases for a given T. However, underestimating p_0 gives wildly unrealistic rejection rates due to the bias described in Table 2. These results further bolster the simulation evidence of Breitung and Hansen (2021), who study the classical Ahn, Lee, and Schmidt (2013) GMM estimator.
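The rejection rule can be made concrete with a few lines of code; this is a generic two-sided Wald test against a standard normal reference, not tied to any particular estimator.

```python
import numpy as np
from math import erf

def normal_cdf(x):
    # Standard normal cdf via the error function.
    return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def wald_reject(beta_hat, se, beta_null=0.0, level=0.05):
    # Two-sided Wald test of H0: beta = beta_null using a standard normal
    # reference; rejects when the p-value is at most `level`, equivalently
    # when the cdf at |z| is at least 1 - level/2 (0.975 at the 5 % level).
    z = (beta_hat - beta_null) / se
    pval = 2.0 * (1.0 - normal_cdf(abs(z)))
    return pval <= level
```

Averaging `wald_reject` over replications of a DGP with a true null yields the empirical size reported in Table 3.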

Table 3:

Inference with misspecified p 0.

Reject (×100)
p = 1 p = 2 p = 3
N = 50 T = 4 46.30 7.70 7.40
81.70 7.70 10.20
T = 5 79.70 7.60 9.60
78.00 6.40 6.40
T = 6 18.10 6.90 8.80
99.00 8.30 8.40
N = 300 T = 4 98.10 5.10 4.80
100.00 4.50 6.60
T = 5 99.50 4.70 6.60
99.90 4.90 4.00
T = 6 41.60 6.20 5.00
100.00 6.50 5.90
  1. This table presents a set of simulations with 1,000 replications. The DGP is identical to the DGP described at the beginning of the section but with β_0 = (0, 0)′. The columns correspond to the average rejection rates of the Wald statistics for the hypothesis tests H_0: β_1 = 0 and H_0: β_2 = 0 under the different specifications of p when p_0 = 2. The rows within the columns correspond to the rejection rates for the tests of the respective parameters associated with the two covariates, x_it1 and x_it2. I calculate the Wald statistic using the standard errors in Theorem 3.5 but with G_P = 0 due to the nature of the DGP. The test of a given parameter is considered a rejection if the standard normal cdf evaluated at the absolute value of the statistic is greater than or equal to 0.975. The final value is multiplied by 100.

Table 4 compares the QLDP estimator to the CCEP estimator, where the QLD transformation is estimated under p = p_0 = 2 with K = 2 (returning to the original DGP with β_0 = (1, 1)′). I omit the GMM estimators because they are outperformed by the just-identified QLDP in terms of root mean squared error: the QLDP bias is significantly lower than that of both GMM estimators, and its standard deviation is often significantly lower when N is small, though this trend starts to reverse relative to GMM2 in larger samples. Note that the CCEP is badly biased when T = 3, as K + 1 = 3, while the QLDP remains consistent there. Further, the QLD estimator takes p_0 as known while the CCE estimator “overestimates” p_0 with the cross-sectional averages, of which there are K + 1 = p_0 + 1. One might suspect this overestimation leads to inefficiency, which is borne out by the simulated standard deviations: the QLDP estimator consistently shows a 15 %–25 % reduction in standard deviation relative to the CCEP estimator. Further, the CCE identifying condition requires T > K + 1, which causes severe bias when violated. The QLDP estimator significantly outperforms the CCEP estimator in every setting provided.

Table 4:

Pooled estimators.

Bias SD RMSE
CCEP QLDP CCEP QLDP CCEP QLDP
N = 50 T = 3 −0.5525 0.0082 25.9618 0.1546 25.9676 0.1548
1.2734 0.0034 12.5824 0.1555 12.6467 0.1556
T = 4 0.0118 0.0078 0.1466 0.1097 0.1471 0.1100
0.0120 0.0029 0.1561 0.1097 0.1566 0.1098
T = 5 0.0197 0.0095 0.1220 0.1005 0.1236 0.1009
0.0089 0.0058 0.1152 0.0950 0.1155 0.0952
N = 300 T = 3 0.0272 0.0024 2.7295 0.0580 2.7296 0.0581
0.9400 0.0026 3.3976 0.0585 3.5253 0.0585
T = 4 0.0000 −0.0003 0.0559 0.0424 0.0559 0.0424
0.0030 0.0024 0.0587 0.0411 0.0588 0.0411
T = 5 0.0050 0.0008 0.0464 0.0382 0.0467 0.0383
0.0027 0.0007 0.0441 0.0369 0.0442 0.0369
  1. This table presents a set of simulations with 1,000 replications. The two rows for a given pair of N and T are the values associated with estimators of each of the two coefficients. “SD” and “RMSE” are respectively the standard deviation and root mean squared error of the estimators over all replications for a given experiment.

Comparing Tables 1 and 4, the QLDP performs much better than either of the GMM estimators when N is small, despite the fact that the GMM estimators use valid instruments. The QLDP's better finite-sample performance is most likely due to its just-identified system of moments. I also include simulations in Table 5 where T is larger. The number of parameters estimated by the QLDP grows linearly with T, so we would expect poor performance relative to CCEP as T grows. Below, we see that CCEP generally outperforms QLDP in terms of RMSE, and when T = 20 the QLDP performs exceptionally poorly. While we may gain efficiency using the QLD estimators when T is small, QLD should not be applied when T is relatively close to N.

Table 5:

Pooled estimators, larger T.

Bias SD RMSE
CCEP QLDP CCEP QLDP CCEP QLDP
N = 50 T = 10 0.0094 0.0053 0.0808 0.0823 0.0813 0.0825
0.0117 0.0079 0.0765 0.0769 0.0774 0.0773
T = 15 0.0145 0.0125 0.0643 0.0890 0.0659 0.0899
0.0131 0.0093 0.0624 0.0901 0.0638 0.0906
T = 20 0.0149 0.2060 0.0585 0.1719 0.0604 0.2683
0.0148 0.6018 0.0581 0.2000 0.0599 0.6342
N = 300 T = 10 0.0014 0.0019 0.0305 0.0311 0.0305 0.0311
0.0029 0.0019 0.0315 0.0318 0.0317 0.0318
T = 15 0.0017 −0.0001 0.0254 0.0354 0.0254 0.0354
0.0022 0.0024 0.0254 0.0355 0.0255 0.0355
T = 20 0.0027 0.2219 0.0223 0.1697 0.0225 0.2794
0.0029 0.5579 0.0225 0.2365 0.0226 0.6059
  1. This table presents a set of simulations with 1,000 replications. The two rows for a given pair of N and T are the values associated with estimators of each of the two coefficients. “SD” and “RMSE” are respectively the standard deviation and root mean squared error of the estimators over all replications for a given experiment.

Finally, I investigate the performance of the mean group quasi-long-differencing (QLDMG) and mean group common correlated effects (CCEMG) estimators. The QLDMG estimator is given by equation (21), and the CCEMG estimator is identical to the QLDMG estimator but with M_F̂ in place of ĤĤ′. Consistency is proved in Pesaran (2006) but, like the pooled estimator, the CCEMG will eventually require a modern treatment that controls for the asymptotic degeneracy in M_F̂, as in Karabiyik, Reese, and Westerlund (2017) and Westerlund, Petrova, and Norkute (2019). Table 6 contains the results for the mean group estimators, where the QLD transformation is estimated assuming p = p_0 = 2. I start at T = 5 so that T − p_0 > p_0 and the CCEMG estimator is well-defined.

Table 6:

Mean group estimators.

Bias SD RMSE
CCEMG QLDMG CCEMG QLDMG CCEMG QLDMG
N = 50 T = 5 −1.5703 −0.0055 34.8038 0.4837 34.8392 0.4837
−0.4832 0.0256 18.2402 0.6523 18.2466 0.6529
T = 6 0.0324 0.0056 0.4630 0.1737 0.4641 0.1738
0.0256 0.0044 0.3774 0.1820 0.3782 0.1820
T = 7 0.0187 0.0156 0.1670 0.1658 0.1681 0.1665
0.0113 0.0102 0.1628 0.1574 0.1632 0.1577
N = 300 T = 5 −1.2597 −0.0039 27.7644 0.1537 27.7929 0.1537
1.1968 −0.0030 34.6115 0.1420 34.6322 0.1420
T = 6 −0.0077 0.0039 0.2846 0.0767 0.2847 0.0768
0.0116 −0.0004 0.1768 0.0745 0.1772 0.0745
T = 7 0.0003 0.0000 0.0649 0.0641 0.0649 0.0641
0.0010 0.0009 0.0677 0.0595 0.0677 0.0595
  1. This table presents a set of simulations with 1,000 replications. The two rows for a given pair of N and T are the values associated with estimators of each of the two coefficients. “SD” and “RMSE” are respectively the standard deviation and root mean squared error of the estimators over all replications for a given experiment.

Despite T ≥ 2K + 1 in each setting, the CCEMG estimator exhibits substantial bias when T = 5, though the QLDMG estimator appears unbiased. The QLDMG outperforms the CCEMG in terms of RMSE for each N and T considered. We would expect the CCEMG to perform well relative to the QLDMG as T grows due to the incidental parameter problem in the first-stage QLD estimation. However, even for moderately low values of N and larger values of T, the QLDMG performs well.

I include additional simulations in the Appendix that consider weak factors and a failure of identification where the factor loadings all have mean zero. I compare the QLDP, CCEP, and GMM estimators together. All estimators perform reasonably well in the weak factor case and poorly when the loadings are mean zero. The GMM estimators outperform the linear estimators because they do not require the CCE moments for identifying β 0.

5 Application

I evaluate the effect of education expenditures on student performance, using standardized test pass rates as a proxy for performance. The policy variable of interest is average real expenditure per pupil; its coefficient measures the effect of additional expenditure on test scores. Starting in the 1994/1995 school year, the state of Michigan began awarding “foundation grants” based on the per-student spending of the school district in the previous year. The goal was to eventually bring schools up to a benchmark “basic foundation” amount that increased over time. The state started by awarding foundation grants to increase expenditure to a minimum of $4,200 per student or by an additional $250 per student, whichever was higher. By 2000, the minimum and benchmark amounts were equal at $5,700.

Michigan students undertake a battery of standardized tests in elementary, junior, and secondary school. Like Papke (2005) and Papke and Wooldridge (2008), I focus on the fourth-grade math test because it was consistently defined and measured over the sample. While the abrupt nature of the policy change may appear to present a natural experiment setting, it is difficult to compare Michigan’s standardized test pass rates to external states. Standardized tests are only standardized within a state: students from different states are tested on different curricula with different grading methodologies, meaning that passing grades cannot be compared between states. Because traditional causal inference methods like difference-in-differences or synthetic control are not applicable here (since they require commonly measured outcomes), I instead model the conditional mean of a district’s pass rate in terms of educational inputs. However, we may believe that unobserved variables will bias results. For example, school funding came primarily from local property taxes before the fulfillment of the new state policy. It is likely that local tax receipts are affected by macroeconomic shocks at differential rates, depending on district-level industrial makeup and exposure to foreign trade. Because these macroeconomic shocks are difficult to measure precisely, we instead model them using interactive fixed effects and remove them using the methodologies described in this paper.

I consider school district-level data from the state of Michigan: N = 501 school districts observed for T = 7 school years over 1995–2001.[8] I present summary statistics and descriptions for the variables of interest below.

Variable Mean Standard deviation Description
math4 0.6939 0.1515 Fraction of fourth graders who pass the MEAP math test.
avgrexp 6,385.51 1,034.94 Average real expenditure per pupil.
lunch 0.2886 0.1616 Fraction of students eligible for free and reduced lunch.
enroll 3,112.31 7,965.49 Total enrollment.

The outcome variable, math4, denotes the pass rate of fourth-grade students on a standardized math test and serves as a measure of student achievement. Expenditures per pupil were averaged over the current year and the previous three, so average real expenditure per pupil in 1995 is the average of expenditures in 1992, 1993, 1994, and 1995. The equation of interest is

(23) math4_it = c_i + log(avgrexp_it)β_1 + lunch_itβ_2 + log(enroll_it)β_3 + f_t′γ_i + e_it

which is similar to Papke (2005) but with the allowance for interactive fixed effects. I collect lunch_it, log(enroll_it), and log(avgrexp_it) and use the reduced-form CCE equation from Assumption 2 to implement the pooled QLD estimator.[9] This specification allows me to test for the number of factors. I also use the Ahn, Lee, and Schmidt (2013) GMM objective function to test for p_0, with and without the CCE equations.
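The construction of avgrexp described above, a trailing average over the current and previous three years, can be sketched as follows; the helper and its input layout are illustrative, not from the paper.

```python
import numpy as np

def trailing_average(exp_by_year):
    # Hypothetical illustration: average real expenditure per pupil in year t
    # as the mean of expenditures in years t-3 through t (so the 1995 value
    # averages 1992-1995), computed district by district. Years without a
    # full four-year history are left as NaN.
    exp_by_year = np.asarray(exp_by_year, dtype=float)
    out = np.full(exp_by_year.shape, np.nan)
    for t in range(3, len(exp_by_year)):
        out[t] = exp_by_year[t - 3:t + 1].mean()
    return out
```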

Table 7 provides the p-values for testing the hypothesis H 0: p 0 = p versus H 1: p 0 > p.

Table 7:

Testing for p 0.

p-Values
RF2 GMM1 GMM2
p 0 = 0 0.0000 0.0000 0.0000
p 0 = 1 0.0000 0.0000 0.0000
p 0 = 2 0.0000 0.4852 0.0000
p 0 = 3 0.0000 0.1157 0.0000
  1. This table presents the p-values from GMM overidentifying tests (as in Theorem 3.1) using different moment conditions. “RF2” uses only the CCE moment conditions. “GMM1” uses only the Ahn, Lee, and Schmidt (2013) moments, while “GMM2” combines both sets.

A rejection of the hypothesis suggests more factors than the tested value, while a failure to reject suggests the tested value is correct. The titles ‘GMM1’, ‘GMM2’, and ‘RF2’ (for reduced form) refer to the respective objective functions used to test the relevant hypothesis. I stress that testing for p_0 comes from a long-established literature, briefly described in Ahn, Lee, and Schmidt (2013). The only new element I introduce with respect to this specification test is using the reduced-form moments E(H_0′Z_i) = 0.

GMM1 is just the Ahn, Lee, and Schmidt (2013) objective function. GMM2 is the Ahn, Lee, and Schmidt (2013) objective function with the additional moments E(H_0′Z_i) = 0. Finally, RF2 uses just the reduced-form moments E(H_0′Z_i) = 0. GMM1 suggests that the correct number of factors is p_0 = 2. GMM2 and RF2 both reject p_0 = 2 at any reasonable confidence level, and GMM2 rejects p_0 = 3, though it uses a much larger set of moments than the other two, which may decrease power. It may also suffer from the global identification problems discussed in Hayakawa (2016), which suggests the GMM1 test will perform better in practice. I stop testing at p_0 = 3 because RF2 is just identified at p_0 = 4. Regardless of the tests, the moments E(H_0′Z_i) = 0 only allow me to estimate up to four factors. Even if p_0 > 4, the QLDP removes more unobserved heterogeneity than TWFE.

I estimate the model’s parameters via TWFE, CCEP, QLDP, and the two GMM estimators. As T = 7 and K = 3, the CCEP estimator can accommodate X ̄ , y ̄ , and a heterogeneous intercept in F ̂ . Further, the pooled QLD estimator is computed with p = K + 1 = 4 after eliminating a heterogeneous intercept from X i and y i , unit-by-unit. As such, QLDP is a natural comparison to TWFE. Theorem 3.4 tells us that β ̂ Q L D P is invariant to common variables when p = K + 1 and simulation evidence in the previous section suggests that overestimating p 0 is not particularly problematic from the perspective of bias or inference. Since it also eliminates a heterogeneous intercept, it should be consistent if TWFE is consistent.

I present results in Table 8 that show estimation after eliminating a heterogeneous intercept. For CCEP, this simply amounts to F ̂ = ( 1 , y ̄ , X ̄ ) . I manually remove a unit fixed effect for QLDP, GMM1, and GMM2. I also cross-sectionally demean the data for GMM1 and GMM2 to control for secular time effects. I set p = 2 for the GMM estimators because there is no algebraic benefit to overestimating p as with the QLDP estimator. Standard errors are in parentheses while p-values are in brackets. I provide the usual cluster-robust standard errors for TWFE. For the other estimators, the reported standard errors are generated via the nonparametric bootstrap, with 5,000 replications for CCE and QLDP. The GMM estimators are reported with 500 replications due to computational cost.

Table 8:

Results.

TWFE CCE QLDP GMM1 GMM2
lunch −0.0419 0.0398 −0.1576 0.1519 −0.0096
(0.0730) (0.1367) (0.1637) (0.0705) (0.1703)
[0.5658] [0.7709] [0.3381] [0.0241] [0.9550]
log(enroll) 0.0021 −0.0592 0.0268 0.0492 −0.1324
(0.0487) (0.1497) (0.2152) (0.0766) (0.2067)
[0.9663] [0.6924] [0.8838] [0.5206] [0.5219]
log(avgrexp) 0.3771 0.5409 0.8287 0.5661 0.5163
(0.0704) (0.2695) (0.3785) (0.1900) (0.2396)
[0.0000] [0.0446] [0.0303] [0.0029] [0.0312]
  1. This table presents results for the different estimators of the coefficients in equation (23). Standard errors for the respective estimates are in parentheses. The numbers in brackets are p-values for the test of significance of the respective coefficient estimates. The CCEP and QLDP estimators both explicitly control for a heterogeneous intercept for the sake of comparison with the TWFE estimator. The CCEP estimator also uses the cross-sectional averages of the outcome and regressors as factor proxies. The QLDP first stage is estimated using the within-transformed outcome and regressors with p = K + 1 = 4. GMM1 is just the estimator from Ahn, Lee, and Schmidt (2013). GMM2 uses these moments along with the additional moments in equation (11). The TWFE standard errors use the usual analytical cluster-robust formula. The CCE and QLDP standard errors come from the nonparametric bootstrap with 5,000 replications. The GMM standard errors also come from the nonparametric bootstrap, but with only 500 replications due to computational cost.

The QLDP estimator suggests substantial effects of per-student expenditures: a 10 % increase in average expenditure per student is associated with an 8.3 percentage point increase in the math test pass rate, significant at the 5 % level. This estimate is more than twice the TWFE estimate and more than one and a half times the CCEP estimate. These results suggest that TWFE does not adequately control for the heterogeneity present in the data. Both the CCEP and QLDP estimates are statistically significant at the 5 % level. The TWFE standard errors are generally smaller than those of CCE and QLD because TWFE removes less variation from the data. Both GMM estimators give point estimates similar to the CCE estimator, but with smaller standard errors; surprisingly, GMM2 has a larger standard error than GMM1.

I also considered estimation via the mean group QLD and CCE estimators. However, both the parameter estimates and the standard errors were unreasonable compared to the other estimators; the p-values were significantly larger than in any other reported case, suggesting a lack of precision. Recall that the mean group estimators require much stronger exogeneity and identifying conditions than the pooled estimators.

6 Conclusions

This paper considers fixed-T estimation of linear panel data models where the errors have a general, unknown factor structure. I use the quasi-long-differencing transformation studied by Ahn, Lee, and Schmidt (2013) to eliminate the factor structure and provide moment conditions for estimation, and I study the moments implied by assuming a common correlated effects model. Applying the QLD transformation to the independent variables exploits information that CCEP does not use and improves the efficiency of estimating the parameters of interest in the main equation. Current proofs of fixed-T asymptotic normality of the CCEP estimator assume loadings that are strictly exogenous with respect to the idiosyncratic errors in the independent variables; I show that this uncorrelated-loadings assumption implies an even larger set of moments, which CCE neglects. I also provide robust standard errors in a more general setting than the CCE models in Pesaran (2006) and Westerlund, Petrova, and Norkute (2019).

I apply the moment-based perspective to a heterogeneous slopes model similar to the original Pesaran (2006) setting. I prove consistency and asymptotic normality of pooled and mean group estimators based on the QLD transformation and place no restrictions on the relationship between T and K, in contrast to CCE. These estimators are shown to outperform CCE estimators in finite samples even when N is small. The pooled QLD estimator also has the desirable property of invariance to common variables, such as time trends and macroeconomic indicators, when the estimated number of factors equals the number of regressors. I reexamine the effect of educational expenditures on standardized test performance and find significantly larger effects of educational spending than fixed effects regression. These estimates are also obtained with reasonable precision, suggesting that applied researchers may not be adequately controlling for heterogeneity in their analyses.

One important direction for future work concerns the overestimation of p₀. It is known that CCE is robust to K + 1 > p₀. Moon and Weidner (2015) prove that principal components estimation is also robust to overestimating the number of factors, provided T is large. However, while there is ample simulation evidence suggesting the robustness of QLD to such a failure, a formal proof is lacking. It would also be useful to investigate the robustness of the QLDP estimators to failure of the reduced form equation in Assumption 2.


Corresponding author: Nicholas L. Brown, Department of Economics, Florida State University, 113 Collegiate Loop, Tallahassee, FL 32304, USA, E-mail:

Acknowledgments

I would like to thank Jeffrey Wooldridge and Peter Schmidt for their guidance and advice. I would also like to thank Ben Zou, Nicole Mason-Wardell, Joakim Westerlund, Seung Ahn, Vasilis Sarafidis, Hashem Pesaran, Zhongjun Qu for his handling of my manuscript, two anonymous referees, and all participants in the MSU Econometrics Seminar Series. All errors are my own.

References

Ahn, Seung C., Young H. Lee, and Peter Schmidt. 2013. “Panel Data Models with Multiple Time-Varying Individual Effects.” Journal of Econometrics 174: 1–14. https://doi.org/10.1016/j.jeconom.2012.12.002.

Breitung, Jörg, and Philipp Hansen. 2021. “Alternative Estimation Approaches for the Factor Augmented Panel Data Model with Small T.” Empirical Economics 60: 327–51. https://doi.org/10.1007/s00181-020-01948-7.

Breitung, Jörg, and Nazarii Salish. 2021. “Estimation of Heterogeneous Panels with Systematic Slope Variations.” Journal of Econometrics 220: 399–415. https://doi.org/10.1016/j.jeconom.2020.04.007.

Brown, Nicholas, and Joakim Westerlund. 2023. “Testing Factors in CCE.” Economics Letters 230: 111245. https://doi.org/10.1016/j.econlet.2023.111245.

Brown, Nicholas, Peter Schmidt, and Jeffrey M. Wooldridge. 2023. “Simple Alternatives to the Common Correlated Effects Model.” Technical Report. https://doi.org/10.13140/RG.2.2.12655.76969/1.

Campello, Murillo, Antonio F. Galvao, and Ted Juhl. 2019. “Testing for Slope Heterogeneity Bias in Panel Data Models.” Journal of Business & Economic Statistics 37: 749–60. https://doi.org/10.1080/07350015.2017.1421545.

Chudik, Alexander, and M. Hashem Pesaran. 2015. “Common Correlated Effects Estimation of Heterogeneous Dynamic Panel Data Models with Weakly Exogenous Regressors.” Journal of Econometrics 188: 393–420. https://doi.org/10.1016/j.jeconom.2015.03.007.

De Vos, Ignace, and Gerdie Everaert. 2021. “Bias-Corrected Common Correlated Effects Pooled Estimation in Dynamic Panels.” Journal of Business & Economic Statistics 39: 294–306. https://doi.org/10.1080/07350015.2019.1654879.

Hansen, Lars Peter. 1982. “Large Sample Properties of Generalized Method of Moments Estimators.” Econometrica 50: 1029–54. https://doi.org/10.2307/1912775.

Hayakawa, Kazuhiko. 2016. “Identification Problem of GMM Estimators for Short Panel Data Models with Interactive Fixed Effects.” Economics Letters 139: 22–6. https://doi.org/10.1016/j.econlet.2015.12.012.

Juhl, Ted, and Oleksandr Lugovskyy. 2014. “A Test for Slope Heterogeneity in Fixed Effects Models.” Econometric Reviews 33: 906–35. https://doi.org/10.1080/07474938.2013.806708.

Juodis, Artūras, and Vasilis Sarafidis. 2018. “Fixed T Dynamic Panel Data Estimators with Multifactor Errors.” Econometric Reviews 37: 893–929. https://doi.org/10.1080/00927872.2016.1178875.

Juodis, Artūras, and Vasilis Sarafidis. 2020. “A Linear Estimator for Factor-Augmented Fixed-T Panels with Endogenous Regressors.” Journal of Business & Economic Statistics 40 (1): 1–15. https://doi.org/10.1080/07350015.2020.1766469.

Karabiyik, Hande, Simon Reese, and Joakim Westerlund. 2017. “On the Role of the Rank Condition in CCE Estimation of Factor-Augmented Panel Regressions.” Journal of Econometrics 197 (1): 60–4. https://doi.org/10.1016/j.jeconom.2016.10.006.

Margaritella, Luca, and Joakim Westerlund. 2023. “Using Information Criteria to Select Averages in CCE.” The Econometrics Journal 26 (3): 405–21. https://doi.org/10.1093/ectj/utad009.

Moon, Hyungsik Roger, and Martin Weidner. 2015. “Linear Regression for Panel with Unknown Number of Factors as Interactive Fixed Effects.” Econometrica 83: 1543–79. https://doi.org/10.3982/ecta9382.

Neal, Timothy. 2015. “Estimating Heterogeneous Coefficients in Panel Data Models with Endogenous Regressors and Common Factors.” Technical Report.

Norkutė, Milda, Vasilis Sarafidis, Takashi Yamagata, and Guowei Cui. 2021. “Instrumental Variable Estimation of Dynamic Linear Panel Data Models with Defactored Regressors and a Multifactor Error Structure.” Journal of Econometrics 220: 416–46. https://doi.org/10.1016/j.jeconom.2020.04.008.

Papke, Leslie E. 2005. “The Effects of Spending on Test Pass Rates: Evidence from Michigan.” Journal of Public Economics 89: 821–39. https://doi.org/10.1016/j.jpubeco.2004.05.008.

Papke, Leslie E., and Jeffrey M. Wooldridge. 2008. “Panel Data Methods for Fractional Response Variables with an Application to Test Pass Rates.” Journal of Econometrics 145: 121–33. https://doi.org/10.1016/j.jeconom.2008.05.009.

Pesaran, M. Hashem. 2006. “Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure.” Econometrica 74: 967–1012. https://doi.org/10.1111/j.1468-0262.2006.00692.x.

Westerlund, Joakim. 2019. “On Estimation and Inference in Heterogeneous Panel Regressions with Interactive Effects.” Journal of Time Series Analysis 40: 852–7. https://doi.org/10.1111/jtsa.12432.

Westerlund, Joakim, and Yousef Kaddoura. 2022. “CCE in Heterogenous Fixed-T Panels.” The Econometrics Journal 25 (3): 719–38. https://doi.org/10.1093/ectj/utac012.

Westerlund, Joakim, Yana Petrova, and Milda Norkute. 2019. “CCE in Fixed-T Panels.” Journal of Applied Econometrics 34: 746–61. https://doi.org/10.1002/jae.2707.

Wooldridge, Jeffrey M. 2005. “Fixed-Effects and Related Estimators for Correlated Random-Coefficient and Treatment-Effect Panel Data Models.” The Review of Economics and Statistics 87: 385–90. https://doi.org/10.1162/0034653053970320.


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/jem-2023-0050).


Received: 2023-11-20
Accepted: 2024-09-06
Published Online: 2024-10-16

© 2024 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.
