
Shrinkage Estimation and Forecasting in Dynamic Regression Models Under Structural Instability

  • Ali Mehrabani, Shahnaz Parsaeian and Aman Ullah
Published/Copyright: June 25, 2024

Abstract

This paper introduces a Stein-like shrinkage method for estimating slope coefficients and forecasting in first-order dynamic regression models under structural breaks. The model allows for a unit root and non-stationary regressors. The proposed shrinkage estimator is a weighted average of a restricted estimator that ignores the break in the slope coefficients and an unrestricted estimator that uses only the observations within each regime. The restricted estimator is the most efficient but is inconsistent when there is a break, whereas the unrestricted estimator is consistent but not efficient. The proposed shrinkage estimator therefore balances the trade-off between the bias of the restricted estimator and its variance efficiency. The averaging weight is proportional to the weighted distance between the restricted and unrestricted estimators. We derive analytical large-sample approximations of the bias, mean squared error, and risk for the shrinkage estimator, the unrestricted estimator, and the restricted estimator, and we show that the risk of the shrinkage estimator is lower than that of the unrestricted estimator for any break size and break point. Moreover, we extend the results to the model with a unit root and non-stationary regressors. We evaluate the finite-sample performance of our proposed method via an extensive simulation study, and empirically in forecasting output growth.

JEL Classification: C13; C22; C53

1 Introduction

A sizeable strand of the literature in economics and finance focuses on autoregressive (AR) models. These models are heavily used in forecasting economic and financial variables and are frequently considered benchmarks in forecast competitions, as they are difficult to beat. Nevertheless, many time series in economics and finance exhibit parameter instability, which is now widely recognized as an important source of forecast failure, as documented by Stock and Watson (1996), Hansen (2001), Giacomini and Rossi (2009), Rossi and Sekhposyan (2010), Inoue and Rossi (2011), Clements and Hendry (2006, 2011), and Rossi (2013), inter alia, and there is increasing evidence that the parameters of AR models for many economic and financial time series are unstable and subject to structural breaks. For example, Mankiw and Miron (1986) and Mankiw, Miron, and Weil (1987) consider AR(1) models and find parameter instability in the short-term interest rate. Phillips, Wu, and Yu (2011) find parameter instability for 1990s NASDAQ stock prices. See also Garcia and Perron (1996) and Stock and Watson (1996), who document instability related to the autoregressive terms in a wide variety of economic time series. This suggests a need to study the forecasting performance of AR models when they are subject to structural breaks.

This paper considers an ARX model that contains structural breaks, and proposes a Stein-like shrinkage estimator that exploits observations in neighboring regimes.[1] Our proposed estimator is a weighted average of a restricted estimator (which uses the full sample of observations under the restriction of no breaks in the parameters) and an unrestricted estimator (which uses the observations within each regime). The restricted estimator is inconsistent when there is a break, although it is the most efficient. The unrestricted estimator, on the other hand, is consistent, but its consistency comes at the cost of efficiency. Hence, the proposed shrinkage estimator trades off the increased bias introduced by the restricted estimator against the reduction in error variance obtained from using the full sample of observations. The averaging weight depends on the weighted distance between the restricted and unrestricted estimators, which is similar to the James-Stein weight, cf. Stein (1956) and James and Stein (1961).[2] It therefore assigns appropriate weights to the two estimators by measuring the magnitude of the structural break. We derive analytical large-sample approximations of the bias, mean squared error (MSE), and risk for our proposed shrinkage estimator, the unrestricted estimator, and the restricted estimator.[3] We derive the condition under which the risk of the shrinkage estimator is lower than the risk of the unrestricted estimator for any break size and break point. We also show how the method can be used in out-of-sample forecasting. Furthermore, we extend the results to the model with a unit root and non-stationary regressors.

To the best of our knowledge, this is the first paper that analytically derives the asymptotic approximations of the bias and risk up to order T^{-1}, and of the MSE up to order T^{-2}, in ARX models under structural breaks, where T is the total number of observations. Furthermore, we provide new results for the model with a unit root and non-stationary regressors under structural breaks, which have not been considered before in the literature. For these reasons, our results differ from the recent work of Lee, Parsaeian, and Ullah (2022a, 2022b, 2022c), who consider different types of combined estimators to improve forecasts under structural breaks in a static stationary time series model without a lagged dependent variable, and who use different weights. In Lee, Parsaeian, and Ullah (2022a), the combination weight is set to a constant between zero and one. Lee, Parsaeian, and Ullah (2022b) consider a Stein-like combined estimator, but the optimal weights there differ from ours because of the presence of the lagged dependent variable here and the different theoretical frameworks. Lee, Parsaeian, and Ullah (2022c) consider yet another type of combined estimator, which assigns a full weight of one to the post-break sample observations and a weight between zero and one to the pre-break sample observations. In addition, there are three main differences that have not been discussed in the previous works by Lee, Parsaeian, and Ullah (2022a, 2022b, 2022c). First, this paper considers a dynamic ARX model and allows for unit roots (or integration of higher order) and non-stationary regressors. Second, in this paper the dominance property of the proposed Stein-like shrinkage estimator holds for any fixed deviation from the restrictions. This complements the "local asymptotic" argument discussed in Lee, Parsaeian, and Ullah (2022b). Third, we derive analytical large-sample approximations, using the method of Nagar (1959), of the bias and risk up to order T^{-1}, and of the MSE up to order T^{-2}, for the proposed shrinkage estimator, the unrestricted estimator, and the restricted estimator under structural breaks. This allows us to compare the performance of these estimators both theoretically and numerically. Because of the presence of the lagged dependent variable in the model considered in this paper, the theoretical results presented here are noticeably different from those in Lee, Parsaeian, and Ullah (2022a, 2022b, 2022c). Even in the special case of no lagged variable, our results differ from theirs, because the combined estimator in Lee, Parsaeian, and Ullah (2022a) has different weights, and Lee, Parsaeian, and Ullah (2022b, 2022c) either derive only local asymptotic results rather than large-sample approximations or use different weights in their combined estimators.

We present an extensive Monte Carlo simulation study to evaluate the finite-sample forecasting performance of the proposed shrinkage method. The results support our theoretical findings and show that our shrinkage estimator outperforms the unrestricted estimator in finite samples. In particular, our numerical results show the benefits of exploiting the neighboring observations in estimation and forecasting relative to using only the observations within each regime. Furthermore, we undertake an empirical analysis of forecasting output growth using 131 macroeconomic and financial time series to compare the forecasting performance of the shrinkage estimator with the unrestricted estimator, the restricted estimator, and a range of alternative methods in the literature. The empirical results suggest that our method outperforms the alternative methods in terms of mean squared forecast error (MSFE).

Analysis of AR models subject to structural breaks has also been considered by Clements and Hendry (1998, 1999) and Pesaran and Timmermann (2005), but they focus on the decomposition of forecast errors, while the focus of this paper is developing an estimator to deal with parameter instability. Specifically, Clements and Hendry (1998, 1999) analyze the forecast errors from AR models subject to structural change by assuming that the parameters of the AR models remain unchanged during the estimation period. Pesaran and Timmermann (2005) consider the small-sample properties of forecasts from AR models estimated from windows of different sizes. See also Banerjee and Urga (2006) for a comprehensive review of developments in the field of modelling structural breaks.

The analysis of approximating the moments of estimators in autoregressive models dates back to Bartlett (1946), who finds a first-order variance approximation in a Gaussian autoregressive process (see also Hurwicz 1950). White (1961) and Shenton and Johnson (1965) give approximations of the first two moments in the AR(1) model. Kendall (1954) and Marriott and Pope (1954) consider AR(1) models with intercepts, and find the approximate bias of the least-squares estimator of the lagged dependent variable coefficient. More recently, a number of papers have studied the small-sample bias of the ordinary least-squares estimator in single dynamic regression models; see, for example, Grubb and Symons (1987) and Kiviet and Phillips (1993). Kiviet and Phillips (1994) extend the analysis of Kiviet and Phillips (1993) to higher-order dynamic regression models. Kiviet and Phillips (2012) derive the higher-order approximate bias of the least-squares estimator of the slope coefficients in a stable ARX(1) model. Kiviet and Phillips (2005) extend these results to non-stable ARX(1) models, and examine the moments of the least-squares estimator in the single normal ARX(1) model with an arbitrary number of exogenous regressors when the true coefficient of the lagged dependent variable is unity.

The paper is organized as follows. Section 2 describes the model. For simplicity, we discuss the problem under a single break, which conveys the essential idea without complicating the notation; the generalization of the method to multiple breaks is straightforward. In Section 3, we introduce the estimators. We give the bias, MSE, and risk of the Stein-like shrinkage estimator using large-sample approximations in Section 4, while those of the unrestricted estimator are provided in the Supplementary Online Appendix B. We extend the results of Section 4 to a model where the slope coefficient of the lagged dependent variable is unity in Section 5. Monte Carlo results are given in Section 6. Results from our empirical example are given in Section 7. Conclusions are given in Section 8. Proofs and detailed calculations are provided in Appendix A.

Notation: Throughout the paper we adopt the following notation. $I_p$ and $0_{p \times q}$ denote the $p \times p$ identity matrix and the $p \times q$ matrix of zeros, respectively. $\mathrm{tr}(\cdot)$ denotes the trace, $\otimes$ denotes the Kronecker product, and $1(\cdot)$ denotes the indicator function. For an $m \times n$ real matrix $A = (a_{ij})$, we write the transpose as $A'$. We write $A = O(b)$ when the non-zero elements of $A$ are of order $b$, i.e. $a_{ij} = O(b)$. For a stochastic matrix $A$, we write $A = O_p(b)$ when the non-zero elements of $A$ are of order $b$, i.e. $a_{ij} = O_p(b)$.

2 The Model

Consider the following first-order linear dynamic regression model (ARX(1) model)[4] defined over the period $t = 1, 2, \ldots, T$, which is subject to a single structural break at time $T_1$:

(2.1) $y_t = \begin{cases} \lambda_1 y_{t-1} + x_t'\beta_1 + u_t, & \text{for } t \le T_1, \\ \lambda_2 y_{t-1} + x_t'\beta_2 + u_t, & \text{for } t > T_1, \end{cases}$

where $y_t$ is the dependent variable, $x_t = (x_{t,1}, \ldots, x_{t,k})'$ is a $k \times 1$ vector of exogenous regressors, and $u_t$ is the unobserved error term.[5] The $(k+1) \times 1$ vectors of the pre-break and post-break slope coefficients are denoted by $\alpha_1 = (\lambda_1, \beta_1')'$ and $\alpha_2 = (\lambda_2, \beta_2')'$, respectively, which are the parameters of interest. In matrix form, the model can be expressed as

(2.2) $y = Z\alpha + u,$

where $y = (y^{(1)\prime}, y^{(2)\prime})'$ is a $T \times 1$ vector of the dependent variable, with $y^{(1)} = (y_1, \ldots, y_{T_1})'$ and $y^{(2)} = (y_{T_1+1}, \ldots, y_T)'$. The $T \times 2(k+1)$ matrix of observations on the regressors is denoted by $Z = \mathrm{diag}(Z_1, Z_2)$, where $Z_i = (y_{-1}^{(i)}, X_i)$ for $i = 1, 2$, $y_{-1}^{(1)} = (y_0, \ldots, y_{T_1-1})'$, $y_{-1}^{(2)} = (y_{T_1}, \ldots, y_{T-1})'$, $X_1 = (x_1, \ldots, x_{T_1})'$, and $X_2 = (x_{T_1+1}, \ldots, x_T)'$. $u = (u_1, \ldots, u_T)'$ is a $T \times 1$ vector of disturbances, and $\alpha = (\alpha_1', \alpha_2')'$ is a $2(k+1) \times 1$ vector of the unknown slope coefficients.

Assumption 1.

(i) $|\lambda_i| < 1$ for $i = 1, 2$; (ii) the matrix $Z$ is such that $Z'Z = O_p(T)$; (iii) the $T \times 2(k+1)$ matrix $Z$ has rank $2(k+1)$ with probability one; (iv) the regressors in $X$ are strongly exogenous; (v) the disturbances follow $u \sim N(0, \Omega_u)$, where $\Omega_u = \mathrm{diag}\big(\sigma_1^2 I_{T_1}, \sigma_2^2 I_{T-T_1}\big)$ with $0 < \sigma_i^2 < \infty$; (vi) the start-up value follows $y_0 = \bar y_0 + d u_0 \sim N\big(\bar y_0, d^2 \sigma_1^2\big)$, with $0 \le d < \infty$; (vii) $y_0$ and $u$ are mutually independent.

We note that $u_0$ is the start-up error term, i.e. the error term at time 0. The case $d = 0$ represents a fixed start-up, and if $d \neq 0$ the start-up is random. Assumption 1(ii) excludes non-stationary regressors, including deterministic or stochastic trends, but the presence of such variables does not change the approximate bias, MSE, MSFE, and risk formulas in Section 4. Their inclusion only reduces the order of magnitude of the moments and the order of their remainder terms (for a similar discussion, see Kiviet and Phillips 2012). We demonstrate in Section 5 that our results remain applicable under a unit root (relaxing Assumption 1(i)) and non-stationary regressors (relaxing Assumption 1(ii)). Assumption 1(v) assumes that the error terms are normally distributed and excludes serial correlation in the errors. This assumption can be relaxed at the expense of extra terms in the bias and MSE of the estimators. However, since the model in (2.1) has a dynamic structure, it captures much of the serial correlation.
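To make the data-generating process in (2.1) concrete, the following sketch simulates the ARX(1) model with a single break under Assumption 1. The parameter values and the choice of iid standard normal exogenous regressors are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_arx1_break(T, T1, lam1, lam2, beta1, beta2, sigma1, sigma2,
                        y0=0.0, rng=None):
    """Simulate the ARX(1) model of (2.1) with a single break at T1.

    x_t is drawn as a k-vector of iid N(0, 1) regressors (an illustrative
    choice; any strongly exogenous process satisfying Assumption 1 works).
    Index t = 0, ..., T-1 here corresponds to t = 1, ..., T in the paper.
    """
    rng = np.random.default_rng(rng)
    k = len(beta1)
    X = rng.standard_normal((T, k))
    y = np.empty(T)
    y_prev = y0
    for t in range(T):
        if t < T1:   # pre-break regime (t <= T1)
            lam, beta, sig = lam1, beta1, sigma1
        else:        # post-break regime (t > T1)
            lam, beta, sig = lam2, beta2, sigma2
        y[t] = lam * y_prev + X[t] @ beta + sig * rng.standard_normal()
        y_prev = y[t]
    return y, X
```

With `sigma1 = sigma2 = 0` the recursion is deterministic, which makes it easy to check the regime switching by hand.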

Remark 1.

A main difference between the model considered in (2.1) and those considered in Lee, Parsaeian, and Ullah (2022a, 2022b, 2022c) is that the model considered here is a dynamic ARX model that allows for a unit root (or higher-order integration) and non-stationary regressors.

We analyze the model conditional on the observed matrix $X$ in Section 4. Therefore, in order to distinguish the fixed and zero-mean stochastic elements of the regressor matrix $Z_i$, $i = 1, 2$, we decompose $Z_i = \bar Z_i + \tilde Z_i$, where $\bar Z_i$ is defined as the expectation of $Z_i$ conditional on $X$ and $\bar y_0$. For notational simplicity, we denote $E(\cdot) \equiv E(\cdot \mid X, \bar y_0)$. Hence,

(2.3) $\bar Z_i = E(Z_i) = \big(E(y_{-1}^{(i)}), X_i\big) = \big(\bar y_{-1}^{(i)}, X_i\big), \text{ for } i = 1, 2,$

and

(2.4) $\tilde Z_i = Z_i - \bar Z_i = \big(\tilde y_{-1}^{(i)}, 0_{T_i \times k}\big) = \tilde y_{-1}^{(i)} \varsigma_1', \text{ for } i = 1, 2,$

where $\varsigma_1 = (1, 0, \ldots, 0)'$ is a $(k+1) \times 1$ vector, and $T_2 = T - T_1$ is the number of post-break sample observations. Define the $T_i \times T$ matrix $\Lambda_i = (F_i, \Delta_i)$ for $i = 1, 2$. Then, we define $\bar y_{-1}^{(i)} = \Lambda_i \bar y_{-1}^*$, where $\bar y_{-1}^* = \big(\bar y_0, x_1'\beta_1, \ldots, x_{T_1}'\beta_1, x_{T_1+1}'\beta_2, \ldots, x_{T-1}'\beta_2\big)'$, $F_1 = \big(1, \lambda_1, \ldots, \lambda_1^{T_1-1}\big)'$ is $T_1 \times 1$, $F_2 = \big(\lambda_1^{T_1}, \lambda_1^{T_1}\lambda_2, \ldots, \lambda_1^{T_1}\lambda_2^{T_2-1}\big)'$ is $T_2 \times 1$, and

$\Delta_1 = \begin{pmatrix} 0 & 0 & \cdots & \cdots & \cdots & 0 \\ 1 & 0 & \cdots & \cdots & \cdots & 0 \\ \lambda_1 & 1 & 0 & \cdots & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & & \vdots \\ \lambda_1^{T_1-2} & \lambda_1^{T_1-3} & \cdots & \lambda_1 & 1 & 0 \cdots 0 \end{pmatrix}, \qquad \Delta_2 = \begin{pmatrix} \lambda_1^{T_1-1} & \cdots & \lambda_1 & 1 & 0 & \cdots & 0 \\ \lambda_1^{T_1-1}\lambda_2 & \cdots & \lambda_1\lambda_2 & \lambda_2 & 1 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \ddots & \ddots & \vdots \\ \lambda_1^{T_1-1}\lambda_2^{T_2-1} & \cdots & \lambda_1\lambda_2^{T_2-1} & \lambda_2^{T_2-1} & \lambda_2^{T_2-2} & \cdots & 1 \end{pmatrix},$

which are $T_1 \times (T-1)$ and $T_2 \times (T-1)$, respectively. Moreover, define the $(T+1) \times 1$ random vector $\nu = (u_0, u')' = (u_0, u_1, \ldots, u_T)' \sim N(0, \Omega_\nu)$, where $\Omega_\nu = \mathrm{diag}\big(\sigma_1^2 I_{T_1+1}, \sigma_2^2 I_{T_2}\big)$, the $T \times (T+1)$ matrix $G = (G_1', G_2')'$ with the $T_i \times (T+1)$ blocks $G_i = (dF_i, C_i)$, and $C_i = (\Delta_i, 0_{T_i \times 1})$, for $i = 1, 2$. Then, it can easily be verified that $\tilde Z_i = G_i \nu \varsigma_1'$ for $i = 1, 2$. Therefore, we have $Z = \bar Z + \tilde Z$, where $\bar Z = \mathrm{diag}(\bar Z_1, \bar Z_2)$, and $\tilde Z = \mathrm{diag}(\tilde Z_1, \tilde Z_2) = \sum_{i=1}^{2} L_i G \nu e_{1,i}'$, where the $2(k+1) \times 1$ vectors are $e_{1,1} = (\varsigma_1', 0_{1\times(k+1)})'$ and $e_{1,2} = (0_{1\times(k+1)}, \varsigma_1')'$, and

$L_1 = \begin{pmatrix} I_{T_1} & 0_{T_1 \times T_2} \\ 0_{T_2 \times T_1} & 0_{T_2 \times T_2} \end{pmatrix}, \qquad L_2 = \begin{pmatrix} 0_{T_1 \times T_1} & 0_{T_1 \times T_2} \\ 0_{T_2 \times T_1} & I_{T_2} \end{pmatrix},$

are $T \times T$ selection matrices.

Remark 2.

In our analysis below, we assume that the break point is known. In practice, one has to estimate the break point from the data. Several methods in the literature can be used to estimate the break point, e.g. Bai and Perron (1998, 2003), who also show that the estimated break fraction converges to its true value at rate T or lower, which is sufficient to establish the consistency of the estimators. In the simulation study and the empirical example in Sections 6 and 7, we first estimate the break point and then apply our method.[6]

3 Estimation

Our goal is to estimate the vector of slope parameters, $\alpha$, in Equation (2.2). We consider three estimators of the slope parameters: (i) an unrestricted least-squares estimator, which estimates the slope parameters of each regime using only the observations within that regime; (ii) a restricted estimator, which shrinks the unrestricted estimator towards a restricted parameter space that ignores the break in the slope coefficients; and (iii) a Stein-like shrinkage estimator, which is a weighted average of the restricted and unrestricted estimators, with an averaging weight proportional to a weighted quadratic loss function.

3.1 Unrestricted Estimator

If we allow for the structural break in the slope coefficients, the standard estimator of $\alpha$ is the unrestricted least-squares estimator. The unrestricted estimator, denoted by $\hat\alpha$, is defined as

(3.1) $\hat\alpha = (Z'Z)^{-1} Z' y = \alpha + (Z'Z)^{-1} Z' u,$

or equivalently,

(3.2) $\hat\alpha = \begin{pmatrix} \hat\alpha_1 \\ \hat\alpha_2 \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} (Z_1'Z_1)^{-1} Z_1' u_1 \\ (Z_2'Z_2)^{-1} Z_2' u_2 \end{pmatrix}.$
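As a concrete illustration of the regime-wise least squares in (3.1)-(3.2), the sketch below stacks each regime's lagged dependent variable and exogenous regressors into $Z_i$ and estimates $\alpha_i = (\lambda_i, \beta_i')'$ by OLS within that regime; the function name and data layout are our own illustrative choices.

```python
import numpy as np

def unrestricted_estimator(y, X, y0, T1):
    """Regime-wise least squares (3.2): estimate (lambda_i, beta_i')'
    separately from the observations of each regime."""
    y_lag = np.concatenate(([y0], y[:-1]))        # y_{t-1} for t = 1, ..., T
    Z1 = np.column_stack([y_lag[:T1], X[:T1]])    # pre-break regressors Z_1
    Z2 = np.column_stack([y_lag[T1:], X[T1:]])    # post-break regressors Z_2
    a1, *_ = np.linalg.lstsq(Z1, y[:T1], rcond=None)
    a2, *_ = np.linalg.lstsq(Z2, y[T1:], rcond=None)
    return np.concatenate([a1, a2])               # alpha_hat = (a1', a2')'
```

On noiseless data generated from (2.1), this recovers the two regimes' coefficients exactly.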

3.2 Restricted Estimator

Because the break in the slope parameters may be believed to be small, the parameter values may be assumed to be close to the restricted parameter space $\Xi_0 = \{\alpha \in \mathbb{R}^{2(k+1)} : r(\alpha) = 0\}$, where $r(\alpha) = R\alpha : \mathbb{R}^{2(k+1)} \to \mathbb{R}^{k+1}$. Thus, we shrink $\hat\alpha$ towards the restriction space $\Xi_0$. For example, a restriction matrix $R$ that imposes no break in any of the slope parameters is $R = (I_{k+1}, -I_{k+1})$.[7] The restricted least-squares estimator is obtained as the solution to the following minimization problem:

(3.3) $\min_{\alpha} \; (y - Z\alpha)'(y - Z\alpha) \quad \text{subject to} \quad r(\alpha) = 0.$

Therefore, the restricted least-squares estimator, denoted by $\tilde\alpha$, can be formulated as

(3.4) $\tilde\alpha = \hat\alpha - (Z'Z)^{-1} R' \big[ R (Z'Z)^{-1} R' \big]^{-1} R \hat\alpha.$
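The closed form (3.4) can be checked numerically. The sketch below applies the correction term to an unrestricted estimate; for the no-break restriction $R = (I_{k+1}, -I_{k+1})$ it reproduces pooled full-sample least squares, which is an illustrative sanity check rather than the paper's code.

```python
import numpy as np

def restricted_estimator(alpha_hat, Z, R):
    """Restricted least squares (3.4):
    alpha_til = alpha_hat - (Z'Z)^{-1} R' [R (Z'Z)^{-1} R']^{-1} R alpha_hat."""
    Q = np.linalg.inv(Z.T @ Z)                    # (Z'Z)^{-1}
    A = Q @ R.T                                   # (Z'Z)^{-1} R'
    return alpha_hat - A @ np.linalg.solve(R @ A, R @ alpha_hat)
```

By construction the result satisfies $R\tilde\alpha = 0$, i.e. the pre- and post-break coefficient blocks coincide.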

3.3 Stein-Like Shrinkage Estimator

We use the restricted and unrestricted estimators to construct a Stein-like shrinkage estimator, and we show that the proposed Stein-like shrinkage estimator improves efficiency. The improved efficiency results from an appropriate trade-off between the bias due to possibly incorrect restrictions and the variance efficiency gains from imposing the restrictions.

Our proposed Stein-like shrinkage estimator of the slope coefficients, denoted by $\breve\alpha$, is a weighted average of the unrestricted and restricted estimators, defined as

(3.5) $\breve\alpha = \omega \hat\alpha + (1 - \omega) \tilde\alpha,$

where the weight takes the form

(3.6) $\omega = 1 - \frac{\tau}{D(\hat\alpha, \tilde\alpha)}, \qquad D(\hat\alpha, \tilde\alpha) = (\hat\alpha - \tilde\alpha)' Z'Z (\hat\alpha - \tilde\alpha),$

where $D(\hat\alpha, \tilde\alpha)$ measures the weighted distance between $\hat\alpha$ and $\tilde\alpha$, and $\tau$ is a positive shrinkage parameter that controls the degree of shrinkage. We defer the optimal choice of the shrinkage parameter to Section 4. One may consider the positive-part version of the weight, $(\omega)_+ = \omega \, 1(\omega \ge 0)$, which ensures that the weight is bounded between zero and one. Let $\breve\alpha_+$ denote the Stein-like shrinkage estimator with the positive-part weight. As shown by Hansen (2016), the risk of $\breve\alpha_+$ is strictly smaller than that of $\breve\alpha$. Hence, in the Monte Carlo simulations and the empirical study in Sections 6 and 7, we use the positive-part version of the weight.

The shrinkage estimator defined above shrinks the unrestricted estimator towards the restricted estimator by the ratio $\tau / D(\hat\alpha, \tilde\alpha)$. When the difference between these two estimators is small ($D(\hat\alpha, \tilde\alpha)$ is small, and $1 - \omega$ is large), the shrinkage estimator gives a larger weight to the restricted estimator, as it is the most efficient estimator. However, when the difference between the two estimators is substantial ($D(\hat\alpha, \tilde\alpha) > \tau$), the bias of the restricted estimator could outweigh its variance efficiency gain. In that case, the shrinkage estimator remains a weighted average of the restricted and unrestricted estimators, but gives the larger weight to the unrestricted estimator. Therefore, the shrinkage estimator prevails regardless of the break size.[8] We also show this theoretically in the following section.
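The combination in (3.5)-(3.6) with the positive-part weight can be sketched in a few lines; the function name and inputs are illustrative.

```python
import numpy as np

def stein_shrinkage(alpha_hat, alpha_til, Z, tau):
    """Stein-like shrinkage estimator (3.5) with the positive-part
    weight: omega = max(1 - tau / D, 0), with D as in (3.6)."""
    d = alpha_hat - alpha_til
    D = d @ (Z.T @ Z) @ d                 # weighted distance D(alpha_hat, alpha_til)
    omega = max(1.0 - tau / D, 0.0) if D > 0 else 0.0
    return omega * alpha_hat + (1.0 - omega) * alpha_til
```

With `tau = 0` the weight is one and the unrestricted estimator is returned; as `tau` grows past `D`, the weight is clipped at zero and the restricted estimator is returned.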

4 Large-Sample Approximation of the Shrinkage Estimator

We employ the large-sample approximation method developed by Nagar (1959) to analyze the bias, MSE, and risk of the shrinkage estimator (conditional on $X$) under Assumption 1. To find moment approximations using the Nagar approach, we begin by expressing the estimation error in terms of stochastic components of decreasing order of magnitude in the sample size.[9]

Theorem 1.

Under Assumption 1, the bias of the Stein-like shrinkage estimator up to order $T^{-1}$ is

(4.1) $\mathrm{Bias}(\breve\alpha) = E(\breve\alpha - \alpha) = \Theta - \frac{\tau}{\phi} \bar P \alpha,$

where $\phi = \alpha' \bar P' Q^{-1} \bar P \alpha = O(T)$, $\bar P = Q R' (R Q R')^{-1} R = O(1)$, $Q = [E(Z'Z)]^{-1} = O(T^{-1})$, and the first term above is the bias of the unrestricted estimator up to order $O(T^{-1})$, which is $\Theta = Q \sum_{i=1}^{2} \big[ \bar Z' L_i C \Omega_u \bar Z Q e_{1,i} + e_{1,i} \, \mathrm{tr}\big(Q \bar Z' L_i C \Omega_u \bar Z\big) + 2 e_{1,i} e_{1,i}' Q e_{1,i} \, \mathrm{tr}\big(L_i G \Omega_\nu G' L_i C \Omega_u\big) \big]$, with $C = (C_1', C_2')'$. The MSE of the Stein-like shrinkage estimator up to order $T^{-2}$ is

(4.2) $\mathrm{MSE}(\breve\alpha) = E\big[(\breve\alpha - \alpha)(\breve\alpha - \alpha)'\big] = \mathrm{MSE}(\hat\alpha) + \frac{\tau^2}{\phi^2} \bar P \alpha \alpha' \bar P' - \frac{\tau}{\phi} \big[ \bar P Q \Sigma + \Sigma Q \bar P' + \Theta \alpha' \bar P' + \bar P \alpha \Theta' + \Psi + \Psi' \big] + \frac{2\tau}{\phi^2} \big[ \bar P \alpha \alpha' \bar P' Q^{-1} (\bar P Q \Sigma + \Psi) + (\Sigma Q \bar P' + \Psi') Q^{-1} \bar P \alpha \alpha' \bar P' \big] + \frac{\tau}{\phi^2} \big[ \bar P \alpha \alpha' \bar P' \Phi + \Phi' \bar P \alpha \alpha' \bar P' \big],$

where $\Psi = (\bar P - I) Q \Phi$, $\Phi = \sum_{i=1}^{2} \big(e_{1,i}' \bar P \alpha\big) \bar Z' L_i C \Omega_u \bar Z Q + \sum_{i=1}^{2} e_{1,i} \alpha' \bar P' \bar Z' L_i C \Omega_u \bar Z Q + 2 \sum_{i=1}^{2} \sum_{j=1}^{2} \sigma_j^2 e_{1,i} e_{1,j}' Q \, \mathrm{tr}\big(G \Omega_\nu G' L_i C L_j\big) \big(e_{1,i}' \bar P \alpha\big)$, and $\mathrm{MSE}(\hat\alpha)$ is given in Equation (B.20) in the Supplementary Online Appendix B.

Further, for any fixed symmetric positive definite weight matrix $W$ of order $O(T)$, the risk of the Stein-like shrinkage estimator up to order $T^{-1}$ is

(4.3) $\mathrm{Risk}(\breve\alpha) = E\big[(\breve\alpha - \alpha)' W (\breve\alpha - \alpha)\big] = \mathrm{Risk}(\hat\alpha) + \frac{\tau^2}{\phi^2} \alpha' \bar P' W \bar P \alpha - \frac{2\tau}{\phi} \big[ \mathrm{tr}(W \bar P Q \Sigma) + \alpha' \bar P' W \Theta + \mathrm{tr}(W \Psi) \big] + \frac{2\tau}{\phi^2} \big[ 2 \alpha' \bar P' Q^{-1} \bar P Q \Sigma W \bar P \alpha + 2 \alpha' \bar P' Q^{-1} \Psi W \bar P \alpha + \alpha' \bar P' \Phi W \bar P \alpha \big],$

where $\Sigma = \mathrm{diag}\big(\sigma_1^2 I_{k+1}, \sigma_2^2 I_{k+1}\big)$, and $\mathrm{Risk}(\hat\alpha) = \mathrm{tr}\big(\mathrm{MSE}(\hat\alpha) W\big)$.

Proof. See Appendix A, page 29.

Theorem 1 gives the bias, MSE, and risk of the Stein-like shrinkage estimator. The bias and MSE of the unrestricted estimator are given in Lemma B.3 in the Supplementary Online Appendix. We note that the risk of the Stein-like shrinkage estimator is generalized to allow for any positive definite weight matrix, $W$, of order $O(T)$. Two natural choices of $W$ are $T I_{2(k+1)}$ and the matrix $W_f$ defined in Section 4.1, where the former provides an unweighted mean squared error and the latter gives the mean squared forecast error studied in the next section.

Remark 3.

The large sample approximations of the bias and MSE of the unrestricted estimator are provided in the Supplementary Online Appendix B. To the best of our knowledge, this is the first paper that provides these theoretical results under structural breaks.[10]

In the following Corollary we show the dominance conditions of the Stein-like shrinkage estimator relative to the unrestricted estimator. Let $\phi_W = \alpha' \bar P' W \bar P \alpha$, $\mu = \mathrm{tr}(W \bar P Q \Sigma) + \alpha' \bar P' W \Theta + \mathrm{tr}(W \Psi)$, and $\eta = 2 \alpha' \bar P' Q^{-1} \bar P Q \Sigma W \bar P \alpha + 2 \alpha' \bar P' Q^{-1} \Psi W \bar P \alpha + \alpha' \bar P' \Phi W \bar P \alpha$.

Corollary 1.1.

Under Assumption 1, if $\mu > \frac{1}{\phi}\eta$ and $0 < \tau \le \frac{2\phi}{\phi_W}\big(\mu - \frac{1}{\phi}\eta\big)$, then the risk of the Stein-like shrinkage estimator up to order $T^{-1}$ satisfies

(4.4) $\mathrm{Risk}(\breve\alpha) < \mathrm{Risk}(\hat\alpha).$

In addition, the optimal shrinkage parameter, denoted by $\tau_{\mathrm{opt}}$, that minimizes the risk of the Stein-like shrinkage estimator up to order $T^{-1}$ is

(4.5) $\tau_{\mathrm{opt}} = \frac{\phi}{\phi_W}\Big(\mu - \frac{1}{\phi}\eta\Big).$

Therefore, the risk of the optimal Stein-like shrinkage estimator up to order $T^{-1}$ is[11]

(4.6) $\mathrm{Risk}(\breve\alpha_{\mathrm{opt}}) = \mathrm{Risk}(\hat\alpha) - \frac{1}{\phi_W}\Big(\mu - \frac{1}{\phi}\eta\Big)^2.$

Proof. See Appendix A, page 33.

Corollary 1.1 shows that the proposed shrinkage estimator dominates the unrestricted estimator, in terms of having a smaller risk, when the shrinkage parameter satisfies the condition $0 < \tau \le \frac{2\phi}{\phi_W}\big(\mu - \frac{1}{\phi}\eta\big)$. In addition, as the choice of the shrinkage parameter is user-specified, its optimal value, and our ideal choice minimizing the risk of the Stein-like shrinkage estimator up to $O(T^{-1})$, is $\tau_{\mathrm{opt}}$. This implies that the Stein-like shrinkage estimator always performs at least as well as the unrestricted estimator, which is one of the common methods of estimating the coefficients under structural breaks.[12]

Since $\tau_{\mathrm{opt}}$ depends on $\sigma_1^2, \sigma_2^2, Q, \bar Z, C, G, \omega, \alpha$, and $\bar y_0$, which are unobserved, we consider the estimated optimal shrinkage parameter, denoted by $\hat\tau_{\mathrm{opt}}$, defined as

(4.7) $\hat\tau_{\mathrm{opt}} = \Big[\frac{\hat\phi}{\hat\phi_W}\Big(\hat\mu - \frac{1}{\hat\phi}\hat\eta\Big)\Big]_+.$

In Equation (4.7), $\hat\mu$, $\hat\eta$, $\hat\phi$, and $\hat\phi_W$ correspond to $\mu$, $\eta$, $\phi$, and $\phi_W$, respectively, after replacing $\sigma_i^2, \alpha, Q, \bar Z, C, G$ with their estimates, denoted by $\hat\sigma_i^2, \hat\alpha, \hat Q, \hat Z, \hat C, \hat G$, where $\hat\alpha$ and $\hat\sigma_i^2$ are the unrestricted estimators, $\hat Q = (Z'Z)^{-1}$, $\hat Z = \mathrm{diag}(\hat Z_1, \hat Z_2)$ with $\hat Z_i = (\hat F_i y_0 + \hat C_i X_i \hat\beta_i, X_i)$, and $\hat C$ and $\hat F$ correspond to $C = (C_1', C_2')'$ and $F = (F_1', F_2')'$ after replacing $\lambda_1$ and $\lambda_2$ by $\hat\lambda_1$ and $\hat\lambda_2$. Similarly, $\hat\Theta$, $\hat\Psi$, and $\hat P$ correspond to $\Theta$, $\Psi$, and $\bar P$ after replacing the unobserved parameters with their estimates. Since we define the Stein-like shrinkage estimator with a positive shrinkage parameter in (3.5), we set $\hat\tau_{\mathrm{opt}}$ to zero when the condition $\hat\mu > \frac{1}{\hat\phi}\hat\eta$ does not hold. In this case, the Stein-like shrinkage estimator assigns a weight of one to the unrestricted estimator and a weight of zero to the restricted estimator.

In the following corollary, we show that the estimated optimal shrinkage parameter, $\hat\tau_{\mathrm{opt}}$, is an unbiased estimator of the infeasible optimal shrinkage parameter $\tau_{\mathrm{opt}}$ up to order $T^{-1}$. Hence, when the sample size is large enough, the risk of the Stein-like shrinkage estimator using the estimated optimal shrinkage parameter is smaller than the risk of the unrestricted estimator.

Corollary 1.2.

Under Assumption 1, if $\mu > \frac{1}{\phi}\eta$, we have

(4.8) $E(\hat\tau_{\mathrm{opt}}) = \tau_{\mathrm{opt}} + O(T^{-1}).$

Proof. See Appendix A, page 33.

Remark 4.

The risk of the Stein-like shrinkage estimator presented in Theorem 1 is noticeably different from the risks of the estimators derived in Lee, Parsaeian, and Ullah (2022a, 2022b, 2022c). In this paper, we derive the large-sample approximation of the risk for any break size, while Lee, Parsaeian, and Ullah (2022b) derive the asymptotic risk of their combined estimator under a local-to-zero asymptotic framework. Also, we consider a dynamic regression model, while the models considered in Lee, Parsaeian, and Ullah (2022a, 2022c) are static stationary models. Furthermore, we allow for non-stationary regressors and unit roots (discussed in Section 5), which have not been considered in Lee, Parsaeian, and Ullah (2022a, 2022b, 2022c).

4.1 Forecasting Under Structural Instability

In this section, we explain how the proposed shrinkage estimator can be used for out-of-sample forecasting. The true parameters that enter the forecasting period are the coefficients of the most recent regime. Thus, we define a selection matrix $S = [0_{(k+1)\times(k+1)}, I_{k+1}]$ to select the post-break slope parameters. Pre-multiplying the shrinkage estimator by the selection matrix gives $S\breve\alpha = \omega S\hat\alpha + (1-\omega) S\tilde\alpha$, where, for example, $\breve\alpha_2 = S\breve\alpha$ denotes the estimated post-break slope parameters of the shrinkage estimator.
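The selection step and the resulting one-step-ahead forecast can be sketched as follows, assuming the estimated coefficient vector is ordered as $(\breve\alpha_1', \breve\alpha_2')'$; the helper below is illustrative.

```python
import numpy as np

def forecast_one_step(alpha_breve, y_T, x_next, k):
    """One-step-ahead forecast: select the post-break block S alpha_breve
    and apply it to z_{2,T+1} = (y_T, x_{T+1}')'."""
    S = np.hstack([np.zeros((k + 1, k + 1)), np.eye(k + 1)])  # selection matrix
    alpha2 = S @ alpha_breve              # estimated post-break coefficients
    z_next = np.concatenate(([y_T], x_next))
    return z_next @ alpha2
```

Only the post-break block of the coefficient vector affects the forecast; the pre-break block is discarded by $S$.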

We define the one-step-ahead mean squared forecast error (MSFE) of the shrinkage estimator as

(4.9) $\mathrm{MSFE}(\breve\alpha_2) = E\big[(\breve\alpha - \alpha)' W_f (\breve\alpha - \alpha)\big],$

where $W_f = T S' z_{2,T+1} z_{2,T+1}' S$ with $z_{2,T+1} = (y_T, x_{T+1}')'$. Since $W_f$ contains lagged dependent variables and is random, we give the MSFE in the following theorem.

Theorem 2.

Under Assumption 1, the MSFE of the Stein-like shrinkage estimator up to order $T^{-1}$ is

(4.10) $\mathrm{MSFE}(\breve\alpha_2) = \mathrm{MSFE}(\hat\alpha_2) + \frac{\tau^2}{\phi^2} \alpha' \bar P' \bar W_f \bar P \alpha - \frac{2\tau}{\phi} \big[ \mathrm{tr}(\bar W_f \bar P Q \Sigma) + \alpha' \bar P' \bar W_f \Theta + \mathrm{tr}(\bar W_f \Psi) + \alpha' \bar P' \Upsilon_1 + \alpha' \bar P' \Upsilon_3 + \mathrm{tr}(\bar P \Upsilon_2) + \mathrm{tr}\big((\bar P - I_{2(k+1)}) Q \Upsilon_4\big) \big] + \frac{2\tau}{\phi^2} \big[ 2 \alpha' \bar P' Q^{-1} \bar P Q \Sigma \bar W_f \bar P \alpha + 2 \alpha' \bar P' Q^{-1} \Psi \bar W_f \bar P \alpha + \alpha' \bar P' \Phi \bar W_f \bar P \alpha + \alpha' \bar P' \Upsilon_4 \bar P \alpha + 2 \alpha' \bar P' Q^{-1} \bar P \Upsilon_2 \bar P \alpha + 2 \alpha' \bar P' Q^{-1} (\bar P - I_{2(k+1)}) Q \Upsilon_4 \bar P \alpha \big],$

where $\bar W_f = T\, E\big[S' z_{2,T+1} z_{2,T+1}' S\big]$, and the expressions for $\Upsilon_1$–$\Upsilon_4$ are given in Equations (A.19)–(A.22).

Proof. See Appendix A, page 34.

The following Corollary shows the conditions under which the MSFE of the Stein-like shrinkage estimator is less than the MSFE of the post-break unrestricted estimator ($\hat\alpha_2 = S\hat\alpha$). Before stating the Corollary, we define some notation. Let $\phi_f = \alpha' \bar P' \bar W_f \bar P \alpha$, $\mu_f = \mathrm{tr}(\bar W_f \bar P Q \Sigma) + \alpha' \bar P' \bar W_f \Theta + \mathrm{tr}(\bar W_f \Psi) + \alpha' \bar P' \Upsilon_1 + \alpha' \bar P' \Upsilon_3 + \mathrm{tr}(\bar P \Upsilon_2) + \mathrm{tr}\big((\bar P - I_{2(k+1)}) Q \Upsilon_4\big)$, and $\eta_f = 2 \alpha' \bar P' Q^{-1} \bar P Q \Sigma \bar W_f \bar P \alpha + 2 \alpha' \bar P' Q^{-1} \Psi \bar W_f \bar P \alpha + \alpha' \bar P' \Phi \bar W_f \bar P \alpha + \alpha' \bar P' \Upsilon_4 \bar P \alpha + 2 \alpha' \bar P' Q^{-1} \bar P \Upsilon_2 \bar P \alpha + 2 \alpha' \bar P' Q^{-1} (\bar P - I_{2(k+1)}) Q \Upsilon_4 \bar P \alpha$.

Corollary 2.1.

Under Assumption 1, if $\mu_f > \frac{1}{\phi}\eta_f$ and $0 < \tau \le \frac{2\phi}{\phi_f}\big(\mu_f - \frac{1}{\phi}\eta_f\big)$, then the MSFE of the Stein-like shrinkage estimator up to order $T^{-1}$ satisfies

(4.11) $\mathrm{MSFE}(\breve\alpha_2) < \mathrm{MSFE}(\hat\alpha_2).$

In addition, the optimal shrinkage parameter, denoted by $\tau_{\mathrm{opt}}^f$, that minimizes the MSFE of the Stein-like shrinkage estimator up to order $T^{-1}$ is

(4.12) $\tau_{\mathrm{opt}}^f = \frac{\phi}{\phi_f}\Big(\mu_f - \frac{1}{\phi}\eta_f\Big).$

Proof. See Appendix A, page 37.

We note that the optimal shrinkage parameter is data-dependent. Therefore, when it is used for forecast evaluation, it needs to be recalculated for each estimation window of the sample (rolling or recursive window).

4.2 Non-Stationary Regressors

Our analysis can also allow for a unit root and non-stationary regressors: the $y_t$ process may contain a unit root (be integrated of order one, I(1)) or be integrated of higher order in both regimes. In this section, we demonstrate the results of Theorem 1 while relaxing Assumption 1(ii) on the stationarity of the exogenous regressors. In Section 5, we extend the results of Theorem 1 by relaxing both Assumption 1(i) on stability and Assumption 1(ii) on the stationarity of the exogenous regressors.

Following Kiviet and Phillips (2005), we rescale the regressors and the coefficients so that all elements of the estimation error vector are of the same stochastic order of magnitude. We assume that, for $j = 1, \ldots, k+1$, the series of real positive constants $g_j$ is given such that $\dot Z' \dot Z = O_p(T)$, where $\dot Z = Z N$ and $N = I_2 \otimes \mathrm{diag}\big(T^{-g_1}, \ldots, T^{-g_{k+1}}\big)$. In practice, one needs to determine the orders of integration. The augmented Dickey-Fuller (ADF) test statistic is a commonly used tool for determining the order of integration of a time series; the basic idea is to test for the level of differencing at which the series becomes stationary. For further discussion and alternative approaches, we refer the reader to Chapter 14 of Gourieroux and Monfort (1997) and to Smeekes and Wijler (2020).
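As a rough illustration of choosing the order of integration by unit-root pretesting, the sketch below computes a basic Dickey-Fuller t-statistic (intercept only, no augmentation lags) and differences the series until the unit root is rejected at the conventional asymptotic 5% critical value of about -2.86. This is a simplified stand-in: in practice one would use a full ADF implementation with lag selection, such as `adfuller` in statsmodels.

```python
import numpy as np

def df_tstat(y):
    """t-statistic of rho in the Dickey-Fuller regression
    dy_t = c + rho * y_{t-1} + e_t (basic, no-lag version)."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    e = dy - X @ beta
    s2 = e @ e / (len(dy) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

def integration_order(y, crit=-2.86, max_d=2):
    """Difference until the DF t-stat rejects a unit root at the
    (asymptotic) 5% critical value; return the implied order d."""
    for d in range(max_d + 1):
        if df_tstat(y) < crit:
            return d
        y = np.diff(y)
    return max_d + 1   # order not identified within max_d differences
```

For a stationary series the test rejects at $d = 0$; for a random walk it typically fails to reject in levels and rejects after one difference.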

Remark 5.

Non-Stationary Regressors: In the model of Equation (2.1), if Assumption 1(i) holds but the lth column of exogenous regressors is a linear trend or an I(1) process while others are stationary, then we have g 1 = 1, and g l+1 = 1, while g j = 0 for j = 2, …, l, l + 2, …, k + 1.

Remark 6.

Unit Root, and Non-Stationary Regressors: In the unit root model of Equation (2.1) (i.e. λ 1 = λ 2 = 1), if the exogenous regressors are stationary, then we have g 1 = 1, and g j = 0, for j = 2, …, k + 1. When the lth column of X is a linear trend or an I(1) process, then g 1 = 2 because of the unit root, g l+1 = 1, and g j = 0 for j = 2, …, l, l + 2, …, k + 1.

Using the rescaled regressors, we write the model in (2.2) as

(4.13) y = Z N ( N 1 α ) + u ,

where N −1 α, are the rescaled coefficients. Moreover, the rescaled unrestricted estimator can be formulated as

(4.14) N 1 ( α ̂ α ) = ( Z ̇ Z ̇ ) 1 Z ̇ u = N 1 ( Z Z ) 1 Z u ,

the rescaled restricted estimator can be formulated as

(4.15) N 1 ( α ̃ α ) = N 1 ( α ̂ α ) ( Z ̇ Z ̇ ) 1 R R ( Z ̇ Z ̇ ) 1 R 1 R N 1 α ̂ = N 1 ( α ̂ α ) N 1 ( Z Z ) 1 R R ( Z Z ) 1 R 1 R α ̂ ,

and the rescaled Stein-like shrinkage estimator is

(4.16) N 1 ( α ̆ α ) = ω N 1 ( α ̂ α ) + ( 1 ω ) N 1 ( α ̃ α ) .

Therefore, in order to find the bias and MSE of the Stein-like shrinkage estimator, we need to pre-multiply the bias of the estimator in Theorem 1 by N, and pre- and post-multiply the MSE of the estimator by N.
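In code, this back-transformation is a single rescaling step. The sketch below uses our own notation; the sign of the exponents is inferred from the requirement that Ż′Ż = O p (T), so that N = I 2 ⊗ diag(T −g 1 , …, T −g k+1 ):

```python
import numpy as np

def unscale_bias_mse(bias_dot, mse_dot, g, T):
    """Map the bias/MSE of the rescaled estimator back to the original
    coefficients: Bias = N @ bias_dot and MSE = N @ mse_dot @ N, where
    N = I_2 kron diag(T**-g_1, ..., T**-g_{k+1}). Illustrative only."""
    N = np.kron(np.eye(2), np.diag(T ** -np.asarray(g, dtype=float)))
    return N @ bias_dot, N @ mse_dot @ N
```

For example, for the unit-root model with stationary exogenous regressors (Remark 6), g = (1, 0, …, 0), so the element corresponding to the autoregressive coefficient is scaled by T −1 while the remaining elements are unchanged.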

Theorem 3.

The bias and MSE formulas of the Stein-like shrinkage estimator in Theorem 1 also apply when the exogenous regressors contain non-stationary components. However, the jth element of the bias is approximated up to order T −(1+g j ) , and the (j, l)th element of the MSE is approximated up to order T −(2+g j +g l ) .

Proof. The proof follows from that of Theorem 1 and is omitted for brevity.

5 Unit Root

In this section, we extend the results of Theorem 1 by relaxing Assumption 1(i) and (ii) on the stability and the stationarity of the exogenous regressors. We assume that for j = 1, …, k + 1, the series of real positive constants g j is given, such that Z ̇ Z ̇ = O p ( T ) , where Z ̇ = Z N , and N = I 2 ⊗ diag ( T −g 1 , …, T −g k+1 ) .

Theorem 4.

Under Assumption 1(iii)–(vii), when the coefficient of the lagged dependent variable is equal to unity, the bias of the rescaled Stein-like shrinkage estimator up to order T −1 is

(5.1) Bias ( α ̇ ̆ ) = E ( α ̇ ̆ α ̇ ) = Θ ̇ τ ϕ ̇ P ̇ ̄ α ̇ ,

where ϕ ̇ = α ̇ P ̇ ̄ Q ̇ 1 P ̇ ̄ α ̇ = O ( T ) , P ̇ ̄ = Q ̇ R ( R Q ̇ R ) 1 R = O ( 1 ) , Q ̇ = [ E ( Z ̇ ) E ( Z ̇ ) ] 1 = O ( T 1 ) , and the first term above is the bias of the unrestricted estimator up to order T −1 which is Θ ̇ = Q ̇ i = 1 2 Z ̇ ̄ L i C ̇ Ω u Z ̇ ̄ Q ̇ N e 1 , i + N e 1 , i tr Q Z ̇ ̄ L i C ̇ Ω u Z ̇ ̄ , and C ̇ has zeroes on and above its main diagonal and components unity below. Also, the MSE of the rescaled Stein-like shrinkage estimator up to order T −2 is

(5.2) MSE ( α ̇ ̆ ) = E ( α ̇ ̆ α ̇ ) ( α ̇ ̆ α ̇ ) = MSE ( α ̇ ̂ ) + τ 2 ϕ ̇ 2 P ̇ ̄ α ̇ α ̇ P ̇ ̄ τ ϕ ̇ P ̇ ̄ Q ̇ Σ + Σ Q ̇ P ̇ ̄ + Θ ̇ α ̇ P ̇ + P ̇ ̄ α ̇ Θ ̇ + Ψ ̇ + Ψ ̇ + 2 τ ϕ ̇ 2 P ̇ ̄ α ̇ α ̇ P ̇ ̄ Q ̇ 1 P ̇ ̄ Q ̇ Σ + Ψ ̇ + Σ Q ̇ P ̇ ̄ + Ψ ̇ Q ̇ 1 P ̇ ̄ α ̇ α ̇ P ̇ ̄ + τ ϕ ̇ 2 P ̇ ̄ α ̇ α ̇ P ̇ ̄ Φ ̇ + Φ ̇ P ̇ ̄ α ̇ α ̇ P ̇ ̄ ,

where Ψ ̇ and Φ ̇ are given in Equations (A.35), (A.36), and MSE ( α ̇ ̂ ) is given in Equation (C.5) in the Supplementary Online Appendix C.

Further, for the fixed symmetric positive definite weight matrix W of order O(T), the risk of the Stein-like shrinkage estimator up to order T −1 is

(5.3) Risk  ( α ̇ ̆ ) = E ( α ̇ ̆ α ̇ ) W ( α ̇ ̆ α ̇ ) = Risk  ( α ̇ ̂ ) + τ 2 ϕ ̇ 2 α ̇ P ̇ ̄ W P ̇ ̄ α ̇ 2 τ ϕ ̇ tr ( W P ̇ ̄ Q ̇ Σ ) + α ̇ P ̇ ̄ W Θ ̇ + tr ( W Ψ ̇ ) + 2 τ ϕ ̇ 2 2 α ̇ P ̇ ̄ Q ̇ 1 P ̇ ̄ Q ̇ Σ W P ̇ ̄ α ̇ + 2 α ̇ P ̇ ̄ Q ̇ 1 Ψ ̇ W P ̇ ̄ α ̇ + α ̇ P ̇ ̄ Φ ̇ W P ̇ ̄ α ̇ ,

where Σ = diag σ 1 2 I k + 1 , σ 2 2 I k + 1 , and Risk  ( α ̇ ̂ ) = tr ( MSE ( α ̇ ̂ ) W ) .

Proof. See Appendix A, page 37.

Remark 7.

We note that the bias and MSE of the Stein-like shrinkage estimator can be obtained using the bias and MSE of the rescaled Stein-like shrinkage estimator as follows:

Bias ( α ̆ ) = E ( α ̆ α ) = N Bias ( α ̇ ̆ ) ,

MSE ( α ̆ ) = E ( α ̆ α ) ( α ̆ α ) = N MSE ( α ̇ ̆ ) N ,

where the jth element of the bias is approximated up to order T −(1+g j ) , and the (j, l)th element of the MSE is approximated up to order T −(2+g j +g l ) .

Similar to Section 4.1, we define the one-step-ahead mean squared forecast error (MSFE) of the shrinkage estimator as

(5.4) MSFE ( α ̆ 2 ) = E ( α ̆ α ) W f ( α ̆ α ) = E ( α ̇ ̆ α ̇ ) W ̇ f ( α ̇ ̆ α ̇ ) ,

where W ̇ f = N W f N = O p ( T ) . In the following theorem, we give the MSFE of the Stein-like shrinkage estimator when the model contains a unit root.

Theorem 5.

Under Assumption 1(iii)–(vii), when the coefficient of the lagged dependent variable is equal to unity, the MSFE of the Stein-like shrinkage estimator up to order T −1 is

(5.5) MSFE ( α ̆ 2 ) = MSFE ( α ̂ 2 ) + τ 2 ϕ 2 α P ̄ W ̄ f P ̄ α 2 τ ϕ tr ( W ̄ f P ̄ Q Σ ) + α P ̄ W ̄ f Θ + tr ( W ̄ f Ψ ) + α P ̄ ϒ ̄ + 2 τ ϕ 2 2 α P ̄ Q 1 P ̄ Q Σ W ̄ f P ̄ α + 2 α P ̄ Q 1 Ψ W ̄ f P ̄ α + α P ̄ Φ W ̄ f P ̄ α ,

where W ̄ f = T E S z 2 , T + 1 E ( z 2 , T + 1 S ) , and the expression for ϒ ̄ = N 1 ϒ ̇ with ϒ ̇ given in Equation (A.40).

Proof.

See Appendix A, page 41.

6 Monte Carlo Simulation

In this section, we examine the finite-sample performance of our proposed Stein-like shrinkage estimator, comparing its forecasting performance with that of the unrestricted estimator and the restricted estimator. We consider the following data generating process (DGP)

(6.1) y t = a 1 + λ 1 y t−1 + x t β 1 + σ 1 ϵ t for t ≤ T 1 , and y t = a 2 + λ 2 y t−1 + x t β 2 + σ 2 ϵ t for t > T 1 ,

where a 1 = (1 − λ 1 ) is the intercept in the first regime, x t follows a multivariate normal distribution with mean zero and unit variance, and ϵ t ∼ i.i.d. N(0, 1).[13] β 1 is a k × 1 unit vector of slope parameters of the exogenous regressors. We consider T = 100, b 1 ≡ T 1 /T ∈ {0.2, 0.4, 0.6, 0.8}, and k ∈ {3, 5, 8}, where the results for k ∈ {3, 8} are available in the Supplementary Online Appendix D. The break size in the slope coefficients of the exogenous regressors, δι k ≡ β 1 − β 2 , takes values of δ ∈ {0, 0.25, 0.5, 0.75, 1}, where ι k is a k × 1 vector of ones, and the intercept in the second regime is a 2 = (1 + δ)(1 − λ 2 ).[14] For the initial observation, y 0 , we consider a fixed start-up by setting d = 0, and y ̄ 0 = a 1 /(1 − λ 1 ) .
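The DGP above can be simulated directly. The sketch below is our own helper, implementing the fixed start-up and the convention β 2 = β 1 − δι k described in the text:

```python
import numpy as np

def simulate_break_dgp(T=100, b1=0.4, k=5, lam1=0.9, lam2=0.9,
                       delta=0.5, sigma1=1.0, sigma2=1.0, seed=0):
    """Simulate Equation (6.1): a break at T1 = b1*T in the intercept,
    the AR coefficient, the exogenous slopes, and the error variance.
    a1 = 1 - lam1, a2 = (1 + delta)(1 - lam2), beta2 = beta1 - delta*1_k,
    with the fixed start-up y_0 = a1 / (1 - lam1) (0 under a unit root)."""
    rng = np.random.default_rng(seed)
    T1 = int(b1 * T)
    a1, a2 = 1.0 - lam1, (1.0 + delta) * (1.0 - lam2)
    beta1 = np.ones(k)
    beta2 = beta1 - delta
    x = rng.standard_normal((T, k))
    eps = rng.standard_normal(T)
    y = np.empty(T)
    y_prev = a1 / (1.0 - lam1) if lam1 != 1.0 else 0.0
    for t in range(T):
        if t < T1:                    # pre-break regime
            y[t] = a1 + lam1 * y_prev + x[t] @ beta1 + sigma1 * eps[t]
        else:                         # post-break regime
            y[t] = a2 + lam2 * y_prev + x[t] @ beta2 + sigma2 * eps[t]
        y_prev = y[t]
    return y, x, T1
```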

For the break in the autoregressive slope coefficients, we study the experiments summarized in Table 1. Experiment 1 considers no break in the autoregressive parameters. Small, moderate, and large breaks in the autoregressive slope coefficients, in either direction, are considered in Experiments 2–7. Higher post-break volatility (σ 2 > σ 1 ) and lower post-break volatility (σ 2 < σ 1 ) are considered in Experiments 8 and 9. Experiments 10–12 consider a unit root process (λ 1 = λ 2 = 1) with no break in the error variance, higher post-break volatility, and lower post-break volatility, respectively. In all simulations, we estimate the unknown parameters, including the break point and the break sizes in the slope coefficients and in the error variance.

Table 1:

Break specifications.

Experiments λ 1 λ 2 σ 1 σ 2
#1: No break 0.9 0.9 1 1
#2: Small break in λ 0.8 0.9 1 1
#3: Small break in λ (decline) 0.9 0.8 1 1
#4: Moderate break in λ 0.3 0.6 1 1
#5: Moderate break in λ (decline) 0.6 0.3 1 1
#6: Large break in λ 0.2 0.8 1 1
#7: Large break in λ (decline) 0.8 0.2 1 1
#8: Higher post-break volatility 0.9 0.9 0.5 1
#9: Lower post-break volatility 0.9 0.9 1 0.5
#10: Unit root 1 1 1 1
#11: Unit root with higher post-break volatility 1 1 0.5 1
#12: Unit root with lower post-break volatility 1 1 1 0.5

The results of 5,000 Monte Carlo simulations are reported in Table 2. The first column in each table represents the experiment number based on Table 1, the second column shows the break size in the slope coefficients (δ), and the rest report the relative mean squared forecast error (RMSFE) of the post-break slope coefficients of the shrinkage estimator, and the restricted estimator for different break points. Thus, the RMSFE of the shrinkage estimator, and the restricted estimator are denoted by RMSFE ( α ̆ 2 ) = MSFE ( α ̆ 2 ) / MSFE ( α ̂ 2 ) , and RMSFE ( α ̃ 2 ) = MSFE ( α ̃ 2 ) / MSFE ( α ̂ 2 ) , respectively.

Table 2:

Simulation results with k = 5.

Exp. b 1: 0.2 0.4 0.6 0.8
δ RMSFE ( α ̆ 2 ) RMSFE ( α ̃ 2 ) RMSFE ( α ̆ 2 ) RMSFE ( α ̃ 2 ) RMSFE ( α ̆ 2 ) RMSFE ( α ̃ 2 ) RMSFE ( α ̆ 2 ) RMSFE ( α ̃ 2 )
#1 0.000 0.794 0.439 0.796 0.445 0.797 0.447 0.788 0.443
0.250 0.816 0.529 0.841 0.647 0.828 0.650 0.839 0.665
0.500 0.887 1.045 0.902 1.776 0.876 1.818 0.855 1.222
0.750 0.966 2.225 0.966 4.248 0.943 4.704 0.907 2.814
1.000 0.988 3.602 0.984 7.454 0.971 8.720 0.948 5.231
#2 0.000 0.802 0.460 0.799 0.435 0.790 0.392 0.788 0.357
0.250 0.824 0.577 0.836 0.675 0.818 0.651 0.827 0.609
0.500 0.890 1.125 0.907 1.921 0.879 1.938 0.865 1.278
0.750 0.964 2.310 0.967 4.401 0.945 4.803 0.916 2.900
1.000 0.989 3.744 0.984 7.615 0.972 8.864 0.951 5.356
#3 0.000 0.819 0.505 0.814 0.499 0.786 0.438 0.765 0.396
0.250 0.847 0.628 0.851 0.753 0.827 0.740 0.817 0.663
0.500 0.904 1.172 0.924 2.140 0.892 2.237 0.856 1.440
0.750 0.970 2.337 0.971 4.600 0.952 5.350 0.914 3.268
1.000 0.988 3.707 0.985 7.906 0.975 9.589 0.952 5.932
#4 0.000 0.862 0.644 0.858 0.646 0.837 0.549 0.824 0.472
0.250 0.879 0.802 0.878 0.974 0.852 0.893 0.837 0.684
0.500 0.931 1.424 0.938 2.390 0.911 2.543 0.882 1.637
0.750 0.975 2.531 0.971 4.818 0.954 5.596 0.928 3.556
1.000 0.989 3.948 0.984 8.214 0.974 10.038 0.955 6.404
#5 0.000 0.858 0.682 0.853 0.746 0.811 0.629 0.779 0.520
0.250 0.887 0.904 0.889 1.264 0.854 1.241 0.818 0.903
0.500 0.943 1.633 0.949 2.966 0.923 3.344 0.880 2.164
0.750 0.979 2.783 0.977 5.679 0.962 6.835 0.932 4.493
1.000 0.989 4.237 0.987 9.376 0.978 11.665 0.960 7.700
#6 0.000 0.944 1.114 0.959 1.313 0.955 1.194 0.933 0.753
0.250 0.953 1.326 0.959 1.689 0.956 1.617 0.941 1.002
0.500 0.973 1.952 0.969 3.038 0.963 3.251 0.953 1.991
0.750 0.986 2.944 0.979 5.365 0.972 6.172 0.964 3.776
1.000 0.992 4.312 0.986 8.670 0.980 10.370 0.972 6.342
#7 0.000 0.964 2.279 0.964 3.096 0.928 2.529 0.849 1.267
0.250 0.981 2.670 0.980 4.032 0.956 3.738 0.898 2.042
0.500 0.989 3.412 0.989 5.946 0.976 6.274 0.939 3.688
0.750 0.994 4.541 0.993 8.895 0.986 10.178 0.963 6.238
1.000 0.996 6.074 0.996 12.909 0.992 15.417 0.977 9.744
#8 0.000 0.749 0.339 0.745 0.287 0.732 0.220 0.717 0.197
0.250 0.783 0.404 0.802 0.436 0.809 0.460 0.820 0.534
0.500 0.877 0.888 0.913 1.626 0.898 1.758 0.870 1.232
0.750 0.973 2.108 0.978 4.110 0.960 4.601 0.925 2.933
1.000 0.992 3.476 0.990 7.304 0.980 8.664 0.958 5.396
#9 0.000 0.931 0.865 0.963 0.959 0.952 0.914 0.905 0.761
0.250 0.965 1.250 0.969 1.560 0.954 1.446 0.933 1.021
0.500 0.995 3.354 0.997 6.920 0.988 7.162 0.954 2.892
0.750 1.000 7.611 1.000 16.127 0.997 18.475 0.986 8.864
1.000 1.001 12.939 1.001 28.858 0.998 34.304 0.993 17.941
#10 0.000 0.875 0.577 0.873 0.569 0.879 0.578 0.874 0.574
0.250 0.886 0.664 0.885 0.754 0.882 0.802 0.907 0.833
0.500 0.922 1.249 0.933 2.277 0.918 2.529 0.912 1.689
0.750 0.974 2.660 0.982 5.566 0.970 6.706 0.960 4.385
1.000 0.991 4.375 0.995 9.757 0.987 12.200 0.988 8.386
#11 0.000 0.857 0.495 0.853 0.404 0.830 0.324 0.820 0.304
0.250 0.859 0.520 0.861 0.556 0.868 0.624 0.890 0.726
0.500 0.918 1.136 0.943 2.232 0.929 2.587 0.915 1.792
0.750 0.984 2.571 0.985 5.441 0.974 6.666 0.954 4.653
1.000 0.995 4.262 0.993 9.659 0.987 12.150 0.976 8.595
#12 0.000 0.991 0.908 0.997 0.976 0.993 0.950 0.986 0.837
0.250 0.995 1.287 0.999 1.702 0.998 1.669 0.992 1.168
0.500 1.000 3.850 1.001 8.888 1.003 10.062 1.000 4.404
0.750 1.000 9.250 1.000 21.014 1.002 25.997 1.003 14.672
1.000 1.000 15.912 1.000 37.792 1.000 47.839 1.002 29.821
  1. Note: This table reports the results of the RMSFE where the benchmark model is the unrestricted estimator. The first column shows the experiment numbers which represent the break specifications based on Table 1, and the second column is the break size in the slope coefficients. In the heading of the table, RMSFE ( α ̆ 2 ) = MSFE ( α ̆ 2 ) / MSFE ( α ̂ 2 ) shows the RMSFE of the shrinkage estimator, and RMSFE ( α ̃ 2 ) = MSFE ( α ̃ 2 ) / MSFE ( α ̂ 2 ) is the RMSFE of the restricted estimator.

The Monte Carlo results support our theoretical findings presented in Section 4. The results show that the RMSFE of our proposed shrinkage estimator is uniformly less than or equal to that of the unrestricted estimator over different break sizes and break points. This shows the superiority of the shrinkage estimator relative to the unrestricted estimator. For small break sizes in the slope coefficients (small δ), the restricted estimator performs better than the unrestricted estimator. This is expected, because under small breaks, the bias of the restricted estimator will be dominated by its variance efficiency. In this case, the shrinkage estimator tends to gain more from the efficiency of the restricted estimator by assigning a larger weight to this estimator, and therefore remains one of the best choices. When there is a large break in the slope coefficients, the shrinkage estimator performs much better than the restricted estimator, and remains close to the unrestricted estimator. This happens because under a large break, the restricted estimator has a large bias, so the shrinkage estimator assigns more weight to the unrestricted estimator.

When the break occurs towards the end of the sample (e.g. b 1 = 0.8), there are only a few observations in the post-break sample, which results in a poor performance of the unrestricted estimator as it uses fewer observations. Consequently, in this case, RMSFE ( α ̃ 2 ) tends to be smaller than when the break occurs towards the beginning of the sample (e.g. b 1 = 0.2). However, since the shrinkage estimator is a weighted average of the restricted and unrestricted estimators, it continues to perform well.

When the pre-break sample is less volatile than the post-break sample (σ 1 < σ 2 ), the shrinkage estimator gains from exploiting the less volatile pre-break observations. Therefore, the shrinkage estimator provides a better estimate of the post-break slope coefficients and outperforms the unrestricted estimator. On the other hand, when σ 1 > σ 2 , the gain from exploiting the pre-break observations decreases as the pre-break sample is more volatile. In this case, the shrinkage estimator gives a larger weight to the unrestricted estimator. We also find that the shrinkage estimator performs better as the number of regressors increases. Similar results hold when the dependent variable follows a unit root process in Experiments 10–12.

Furthermore, the results show that under a high degree of dependence or persistence (e.g. Experiments #2 and #3, in which there is also a small break in the autoregressive slope coefficient λ), the gain from using the Stein-like shrinkage estimator over the unrestricted estimator is around 10 %–23.5 % for a small to moderate break in the slope coefficients of the exogenous regressors (δ ≤ 0.5); that is, the MSFE decreases by around 10 %–23.5 % when using the Stein-like shrinkage estimator rather than the unrestricted estimator. On the other hand, when there is a large break in the slope coefficients of the exogenous regressors (δ > 0.5), this gain is typically less than 5 %. It is worth emphasizing, however, that the Stein-like shrinkage estimator never under-performs the unrestricted estimator. We see a similar pattern when the coefficient in one of the regimes is persistent (e.g. Experiments #6 and #7, in which there is also a large break in the autoregressive slope coefficient λ). When there is a unit root (Experiments #10, #11, and #12), the gain from using the Stein-like shrinkage estimator over the unrestricted estimator is around 7 %–15 % for a small to moderate break in the slope coefficients of the exogenous regressors. Under these experiments, when there is a large break in the slope coefficients of the exogenous regressors, the Stein-like shrinkage estimator performs equivalently to the unrestricted estimator. This is expected, since for a large break size the bias of the restricted estimator outweighs its variance efficiency; the Stein-like shrinkage estimator therefore assigns a weight of one to the unrestricted estimator and a weight of zero to the restricted estimator. In general, we find that the shrinkage estimator performs well across all break sizes and break points.

7 Empirical Analysis

In this section, we present an empirical application that highlights the utility of the proposed shrinkage estimator in forecasting. In particular, we forecast output growth using 131 macroeconomic and financial time series from the St. Louis Federal Reserve (FRED-MD) database. The 131 series are split into eight groups: output and income, labor market, consumption and orders, orders and inventories, money and credit, interest rates and exchange rates, prices, and the stock market.

We use monthly data from January 1959 to March 2020. The data are described by McCracken and Ng (2016), who suggest various transformations to render the series stationary and to deal with missing values. Since the number of macroeconomic and financial variables is large, we reduce the dimension by estimating static factors via principal component analysis adapted to allow for missing values; see McCracken and Ng (2016). We select the number of significant factors using a generalization of the Mallows C p criterion for large dimensional panels developed in Bai and Ng (2002). The criterion selects eight factors, which can be interpreted as real activity/employment, term spreads, inflation, housing, interest rate, stock market, and output and inventories factors.

We adopt the following h-step ahead forecasting equation

(7.1) y t + h = λ h y t + f ̂ t β h + u t + h , t = 1 , , T ,

where the dependent variable is output growth over the next h months, that is, y t+h = (1,200/h) ln(IP t+h /IP t ), IP t denotes the index of industrial production in levels, and f ̂ t is the vector of eight estimated factors at time t.

To evaluate the performance of our proposed estimator, we compute h-step-ahead forecasts using the shrinkage estimator and compare them with those from existing methods: the unrestricted estimator (labeled “Unrest.” in the tables), the restricted estimator (“Rest.”), the method proposed by Pesaran, Pick, and Pranovich (2013) (“PPP”), the methods used in Pesaran and Timmermann (2007), namely “Troff”, “Pooled”, “WA”, and “CV”, and the average window forecast proposed by Pesaran and Pick (2011) (“AveW”). The forecasts employ a recursive (or expanding) estimation window. Each time we expand the estimation window, we apply the Schwarz Bayesian Information Criterion (BIC) to select among the nine predictors, excluding the lagged dependent variable (so the model remains dynamic). We estimate the break point using the Bai and Perron (1998, 2003) method, setting the significance level at 5 % and the trimming rate at 0.2.
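The expanding-window mechanics of Equation (7.1) can be sketched as below. This is a deliberately simplified illustration (our own names): plain OLS, no break-date estimation, no BIC predictor selection, and no shrinkage step:

```python
import numpy as np

def recursive_forecasts(y, F, h=1, min_window=120):
    """Expanding-window h-step-ahead forecasts from the regression
    y_{t+h} = c + lam_h * y_t + f_t' beta_h + u_{t+h}, estimated by OLS
    on data through time t; returns the out-of-sample MSFE."""
    T = len(y)
    fcasts, actuals = [], []
    for t in range(min_window, T - h):
        # regression pairs (y_{s+h}; 1, y_s, f_s) for s = 0, ..., t-h
        Z = np.column_stack([np.ones(t - h + 1), y[:t - h + 1], F[:t - h + 1]])
        target = y[h:t + 1]
        coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
        z_now = np.concatenate([[1.0, y[t]], F[t]])
        fcasts.append(z_now @ coef)
        actuals.append(y[t + h])
    err = np.array(actuals) - np.array(fcasts)
    return np.mean(err ** 2)   # MSFE
```

In the paper's exercise, the OLS step would be replaced by the shrinkage estimator with the break date and shrinkage parameter re-estimated in each window.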

Table 3 reports the MSFEs for the different methods. The first column shows the forecast horizon, h = 1, 6, and 12. For comparison, we report the MSFE for the different methods with estimated break dates under the label estimated break dates in columns 2–9 of Table 3. In addition, since the “CV”, “WA”, and “Pooled” methods can also be implemented without estimating the break date, we report their MSFEs treating the break date as unknown, under the label unknown break dates in columns 10–13 of Table 3. We consider different out-of-sample forecast periods (Panels A–D in Table 3), ranging from 1980:01-2020:03 to 2005:01-2020:03, to assess the sensitivity of the different methods to this choice.

Table 3:

Empirical results for forecasting output growth.

h Estimated break dates Unknown break dates
Shrinkage Unrest. Rest. PPP Troff CV WA Pooled CV WA Pooled AveW
Panel A: 1980:01-2020:03
1 7.248* 7.454 7.384 7.257 7.254 7.308 7.281 7.281 7.352 7.246 7.318 7.318
6 8.123** 8.321 9.286 8.126 8.131 8.412 8.993 8.980 8.768 8.544 8.468 8.467
12 9.126** 9.398 9.734 9.199 9.481 10.061 9.763 9.763 9.833 9.481 9.621 9.622
Panel B: 1990:01-2020:03
1 7.134** 7.539 7.410 7.340 7.339 7.408 7.391 7.392 7.415 7.342 7.365 7.365
6 7.002** 7.300 8.664 7.106 7.113 7.491 8.27 8.253 7.787 7.663 7.539 7.539
12 8.190** 8.588 8.913 8.389 8.723 9.405 8.979 8.989 8.911 8.549 8.551 8.552
Panel C: 2000:01-2020:03
1 8.953** 9.162 8.960 8.968 8.962 9.060 9.036 9.036 9.068 8.956 9.048 9.048
6 8.287* 8.484 10.530 8.293 8.304 8.867 9.985 9.960 9.924 9.462 9.400 9.400
12 9.380** 9.735 9.576 9.536 10.035 10.771 9.807 9.861 10.773 9.393 10.135 10.136
Panel D: 2005:01-2020:03
1 10.090** 10.302 10.098 10.112 10.102 10.231 10.137 10.138 10.302 10.091 10.232 10.232
6 8.627* 8.830 9.066 8.627* 8.657 8.789 8.973 8.977 10.454 8.792 9.302 9.302
12 10.726** 11.138 11.027 10.939 11.651 12.705 11.407 11.489 12.760 10.903 12.030 12.032
  1. Note: This table reports the MSFE for different forecasting methods. h in the first column shows the forecast horizon. In the heading of the table, Shrinkage shows the results for our proposed Stein-like shrinkage estimator; Unrest. shows the results for the unrestricted estimator; Rest. shows the results for the restricted estimator; PPP is the method proposed by Pesaran et al. (2013); Troff, CV, WA, and Pooled are the methods used in Pesaran and Timmermann (2007); and AveW is the method proposed by Pesaran and Pick (2011) with T(1 − w min) + 1 windows and w min = 0.1. Panels A–D report the MSFE results for different out-of-sample forecast periods. Asterisks denote forecasts that are significantly better than the post-break (unrestricted) forecasts according to the Diebold–Mariano test statistic. The 5 % and 10 % significance levels are denoted by ** and *, respectively.

We test for equal forecast accuracy of the different methods relative to the unrestricted forecast using the Diebold and Mariano (1995) test. Table 3 also shows the results of the Diebold and Mariano test statistic, indicated by asterisks. The 5 % and 10 % significance levels are denoted by ** and *, respectively. Based on the results, the shrinkage forecasts outperform the unrestricted forecasts in terms of MSFE across forecast horizons and across out-of-sample forecast periods, and the improvement is statistically significant at the 5 % or 10 % level. The forecast improvement ranges from 2 to 5.3 percent. The alternative methods often outperform the unrestricted forecast as well; however, their outperformance is generally not statistically significant, which may reflect the high variability of these forecasts. Moreover, in some cases the alternative methods under-perform the unrestricted forecast.[15]
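For one-step forecasts, the Diebold-Mariano statistic reduces to a t-test on the squared-loss differential. The minimal sketch below (our own helper) uses the plain sample variance; for h > 1, serially correlated forecast errors call for a HAC variance estimator instead:

```python
import numpy as np

def diebold_mariano(e1, e2):
    """DM statistic for equal MSFE: d_t = e1_t^2 - e2_t^2, and the
    statistic mean(d) / se(mean(d)) is asymptotically N(0, 1).
    Negative values favor the first forecast (smaller squared errors)."""
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2
    n = len(d)
    return d.mean() / np.sqrt(d.var(ddof=1) / n)
```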

8 Conclusions

We introduce a method of estimation and forecasting in ARX(1) models under structural breaks. The theoretical results can be generalized to models with higher order lags, although doing so would not yield qualitatively different results. The proposed method has four main advantages relative to other model averaging and shrinkage estimation methods in the literature. First, our method allows for a unit root and non-stationary exogenous regressors, which is important because it covers a large class of empirical applications in economics. Second, the dominance and optimality of the shrinkage estimator is not limited to the MSE and holds for any weighted quadratic loss function whose weight matrix is positive definite and symmetric. This allows one to use the proposed Stein-like shrinkage method for out-of-sample forecasting by choosing an appropriate weight. Third, the averaging weight is proportional to the reciprocal of the weighted distance between the restricted and unrestricted estimators, which yields a shrinkage estimator with a uniformly lower MSE than the unrestricted estimator. Lastly, we provide large-sample analytical approximations of the bias, MSE, and risk of the Stein-like shrinkage estimator, the unrestricted estimator, and the restricted estimator. We also evaluate the performance of our estimator in Monte Carlo simulations and in an empirical application of forecasting output growth using macroeconomic and financial variables, and show that the Stein-like shrinkage estimator performs well relative to alternative methods.


Corresponding author: Shahnaz Parsaeian, Department of Economics, University of Kansas, Lawrence, USA, E-mail: 

Appendix A

Proof of Theorem 1

From the proof of Lemma B.3 in the Supplementary Online Appendix, we have

(A.1) α ̂ α = ( Z Z ) 1 Z u = ξ 1 2 + ξ 1 + ξ 3 2 + O p ( T 2 ) ,

where ξ 1 2 , ξ −1, and ξ 3 2 are defined below, and the subscripts denote the orders of magnitude in probability,

ξ 1 2 = Q Z ̄ ( 0 , I T ) ν + i = 1 2 ν H i ν e 1 , i = O p T 1 2 , ξ 1 = Q ( A + B ) Z ̄ ( 0 , I T ) ν + i = 1 2 ν H i ν e 1 , i = O p ( T 1 ) , ξ 3 2 = Q ( A + B ) 2 Z ̄ ( 0 , I T ) ν + i = 1 2 ν H i ν e 1 , i = O p T 3 2 .

Also, let W D Z Z = W ̄ D + W ̃ D , where W ̄ D = E ( Z Z ) = Z ̄ Z ̄ + E ( Z ̃ Z ̃ ) = Z ̄ Z ̄ + i = 1 2 tr G L i G Ω ν e 1 , i e 1 , i = O p ( T ) , and W ̃ D = Z ̄ Z ̃ + Z ̃ Z ̄ + Z ̃ Z ̃ E ( Z ̃ Z ̃ ) = ( A + B ) Q 1 = O p ( T 1 2 ) , where A and B are given in Lemma B.2 in the Supplementary Online Appendix.

Using Equation (A.1) in the inverse of Equation (3.6), we have

(A.2) 1 D ( α ̂ , α ̃ ) = ( α ̂ α ̃ ) W D ( α ̂ α ̃ ) 1 = α ̂ P W D P α ̂ 1 = α + ξ 1 2 + O p ( T 1 ) P ̄ + P ̃ 1 2 + O p ( T 1 ) ( W ̄ D + W ̃ D ) × P ̄ + P ̃ 1 2 + O p ( T 1 ) α + ξ 1 2 + O p ( T 1 ) 1 = ϕ + ϕ ̃ 1 2 + 2 α P ̄ W ̄ D P ̄ ξ 1 2 + 2 α P ̄ W ̄ D P ̃ 1 2 α + O p ( 1 ) 1 = 1 ϕ 1 + 1 ϕ ϕ ̃ 1 2 + 2 ϕ α P ̄ W ̄ D P ̄ ξ 1 2 + 2 ϕ α P ̄ W ̄ D P ̃ 1 2 α + O p ( T 1 ) 1 = 1 ϕ 1 1 ϕ ϕ ̃ 1 2 2 ϕ α P ̄ W ̄ D P ̄ ξ 1 2 2 ϕ α P ̄ W ̄ D P ̃ 1 2 α + O p ( T 2 ) 1 ϕ O p ( T 1 ) 1 ϕ 2 D 1 2 O p T 3 2 + O p ( T 2 ) ,

where D 1 2 = ϕ ̃ 1 2 + 2 α P ̄ W ̄ D P ̄ ξ 1 2 + α P ̄ W ̄ D P ̃ 1 2 α = O p ( T 1 2 ) , ϕ = α P ̄ W ̄ D P ̄ α = O ( T ) , ϕ ̃ 1 2 = α P ̄ W ̃ D P ̄ α = O ( T 1 / 2 ) , and the last equality above holds by using the standard geometric expansion. The terms with order O p (T −2) and smaller are dropped, because they will not enter in the calculation of the bias and MSE of the Stein-like shrinkage estimator up to the orders of interest.

Employing Equations (A.1) and (A.2) in (3.5), we obtain

(A.3) α ̆ α = ( α ̂ α ) τ 1 ϕ 1 ϕ 2 D 1 2 + O p ( T 2 ) P ̄ + P ̃ 1 2 + O p ( T 1 ) α ̂ = ζ 1 2 + ζ 1 + ζ 3 2 + O p ( T 2 ) ,

where ζ 1 2 , ζ 1 and ζ 3 2 are defined below

ζ 1 2 = ξ 1 2 = O p T 1 2 , ζ 1 = ξ 1 τ ϕ P ̄ α = O p ( T 1 ) , ζ 3 2 = ξ 3 2 τ ϕ P ̄ ξ 1 2 τ ϕ P ̃ 1 2 α + τ ϕ 2 D 1 2 P ̄ α = O p T 3 2 .

The bias of the Stein-like shrinkage estimator using the approximations in Equation (A.3) up to order O(T −1) is

(A.4) E ( α ̆ α ) = E ζ 1 2 + ζ 1 = E ξ 1 2 + E ( ξ 1 ) τ ϕ P ̄ α = Θ τ ϕ P ̄ α ,

where the last equality holds because of (B.24) and (B.25) in Lemma B.3.

The MSE of the Stein-like shrinkage estimator up to order O(T −2) is

(A.5) E ( α ̆ α ) ( α ̆ α ) = E Γ 1 + Γ 3 2 + Γ 2 ,

where Γ 1 , Γ 3 2 and Γ−2 are

Γ 1 = ζ 1 2 ζ 1 2 , Γ 3 2 = ζ 1 2 ζ 1 + ζ 1 ζ 1 2 , Γ 2 = ζ 1 2 ζ 3 2 + ζ 3 2 ζ 1 2 + ζ 1 ζ 1 ,

and we derive their expectations in the rest of the proof using Lemmas B.1 and B.3.

(A.6) E ( Γ 1 ) = E ζ 1 2 ζ 1 2 = E ξ 1 2 ξ 1 2 = Q Σ ,

where the last equality holds because of (B.27) in Lemma B.3.

(A.7) E Γ 3 2 = E ζ 1 2 ζ 1 + E ζ 1 ζ 1 2 = E ξ 1 ξ 1 2 + E ξ 1 2 ξ 1 ,

because

(A.8) E ζ 1 ζ 1 2 = E ξ 1 τ ϕ P ̄ α ξ 1 2 = E ξ 1 ξ 1 2 τ ϕ P ̄ α E ξ 1 2 = E ξ 1 ξ 1 2 ,

where the last equality holds by (B.24), and E ξ 1 ξ 1 2 is given in (B.28). Also, we have

(A.9) E ( Γ 2 ) = E ζ 1 2 ζ 3 2 + E ζ 3 2 ζ 1 2 + E ζ 1 ζ 1 = E ξ 3 2 ξ 1 2 + E ξ 1 2 ξ 3 2 + E ξ 1 ξ 1 + τ 2 ϕ 2 P ̄ α α P ̄ τ ϕ Θ α P ̄ + P ̄ α Θ τ ϕ P ̄ Q Σ + Σ Q P ̄ + E P ̃ 1 2 α ξ 1 2 + E P ̃ 1 2 α ξ 1 2 + 2 τ ϕ 2 P ̄ α α P ̄ W ̄ D P ̄ Q Σ + Σ Q P ̄ W ̄ D P ̄ α α P ̄ + P ̄ α α P ̄ W ̄ D E P ̃ 1 2 α ξ 1 2 + E P ̃ 1 2 α ξ 1 2 W ̄ D P ̄ α α P ̄ + τ ϕ 2 P ̄ α α P ̄ E W ̃ D P ̄ α ξ 1 2 + E ξ 1 2 α P ̄ W ̃ D P ̄ α α P ̄ ,

where

(A.10) E ζ 1 ζ 1 = E ξ 1 ξ 1 τ ϕ E ξ 1 α P ̄ τ ϕ E P ̄ α ξ 1 + τ 2 ϕ 2 P ̄ α α P ̄ = E ξ 1 ξ 1 τ ϕ E ( ξ 1 ) α P ̄ + P ̄ α E ξ 1 + τ 2 ϕ 2 P ̄ α α P ̄ ,

E ( ξ 1 ) = Θ , by (B.25), and E ξ 1 ξ 1 is given in (B.29). Also,

(A.11) E ζ 3 2 ζ 1 2 = E ξ 3 2 ξ 1 2 τ ϕ P ̄ E ξ 1 2 ξ 1 2 τ ϕ E P ̃ 1 2 α ξ 1 2 + τ ϕ 2 E D 1 2 P ̄ α ξ 1 2 = E ξ 3 2 ξ 1 2 τ ϕ P ̄ Q Σ + E P ̃ 1 2 α ξ 1 2 + 2 τ ϕ 2 P ̄ α α P ̄ W ̄ D P ̄ Q Σ + E P ̃ 1 2 α ξ 1 2 + τ ϕ 2 P ̄ α α P ̄ E W ̃ D P ̄ α ξ 1 2 ,

where E ξ 3 2 ξ 1 2 is given in (B.30), and the last equality above holds by using

(A.12) E D 1 2 P ̄ α ξ 1 2 = 2 P ̄ α α P ̄ W ̄ D E P ̄ ξ 1 2 + P ̃ 1 2 α ξ 1 2 + P ̄ α α P ̄ E W ̃ D P ̄ α ξ 1 2 = 2 P ̄ α α P ̄ W ̄ D P ̄ Q Σ + 2 P ̄ α α P ̄ W ̄ D E P ̃ 1 2 α ξ 1 2 + P ̄ α α P ̄ E W ̃ D P ̄ α ξ 1 2 ,

and using Equation (B.27). Moreover, we find that

(A.13) E P ̃ 1 2 α ξ 1 2 = E P ̄ I Q ( A + B ) Q 1 P ̄ α ν ( 0 , I T ) Z ̄ + i = 1 2 ν H i ν e 1 , i Q = P ̄ I Q i = 1 2 e 1 , i P ̄ α Z ̄ L i C Ω u Z ̄ Q + i = 1 2 e 1 , i α P ̄ Z ̄ L i C Ω u Z ̄ Q + 2 i = 1 2 j = 1 2 σ j 2 e 1 , i e 1 , j Q tr G Ω ν G L i C L j e 1 , i P ̄ α Ψ ,

and

(A.14) E W ̃ D P ̄ α ξ 1 2 = E l = 1 2 Z ̄ l L l G ν e 1 , l + e 1 , l ν G L l Z ̄ l + l = 1 2 ν G L l G ν tr G L l G Ω ν e 1 , l e 1 , l P ̄ α ν ( 0 , I T ) Z ̄ + i = 1 2 ν H i ν e 1 , i Q = l = 1 2 e 1 , l P ̄ α Z ̄ L l C Ω u Z ̄ Q + e 1 , l α P ̄ Z ̄ L l C Ω u Z ̄ Q + 2 i = 1 2 l = 1 2 e 1 , l P ̄ α e 1 , l e 1 , i Q tr C L l G Ω ν G L i Ω u Φ .

By employing the results of Equations (A.6), (A.7) and (A.9), in (A.5), we obtain the MSE of the Stein-like shrinkage estimator up to order O(T −2), as

(A.15) MSE ( α ̆ ) = MSE ( α ̂ ) + τ 2 ϕ 2 P ̄ α α P ̄ τ ϕ P ̄ Q Σ + Σ Q P ̄ + Θ α P + P ̄ α Θ + Ψ + Ψ + 2 τ ϕ 2 P ̄ α α P ̄ W ̄ D P ̄ Q Σ + Ψ + Σ Q P ̄ + Ψ W ̄ D P ̄ α α P ̄ + τ ϕ 2 P ̄ α α P ̄ Φ + Φ P ̄ α α P ̄ .

Further, the risk of the Stein-like shrinkage estimator up to order O(T −1), can be written as

(A.16) Risk  ( α ̆ ) = E ( α ̆ α ) W ( α ̆ α ) = tr W E ( α ̆ α ) ( α ̆ α ) = tr W MSE ( α ̆ ) = Risk  ( α ̂ ) + τ 2 ϕ 2 α P ̄ W P ̄ α 2 τ ϕ tr ( W P ̄ Q Σ ) + α P ̄ W Θ + tr ( W Ψ ) + 4 τ ϕ 2 α P ̄ W ̄ D P ̄ Q Σ W P ̄ α + α P ̄ W ̄ D Ψ W P ̄ α + 2 τ ϕ 2 α P ̄ Φ W P ̄ α .

This completes the proof of Theorem 1. ■

Proof of Corollary 1.1

We note that from Theorem 1, we have

Risk ( α ̆ ) = Risk ( α ̂ ) + (ϕ W /ϕ 2 ) [ τ 2 − 2τ (ϕ/ϕ W ) ( μ − (1/ϕ) η ) ] < Risk ( α ̂ ) ,

where the last inequality holds since μ > (1/ϕ) η , and 0 < τ ≤ (2ϕ/ϕ W ) ( μ − (1/ϕ) η ) .

In addition, since the shrinkage parameter τ enters the risk of the Stein-like shrinkage estimator through a quadratic expression, there is a unique choice that minimizes the risk, namely τ opt = (ϕ/ϕ W ) ( μ − (1/ϕ) η ) . ■
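Written out, the quadratic minimization in the proof above is standard (our rearrangement of the expression in the display):

```latex
\mathrm{Risk}(\breve{\alpha}) - \mathrm{Risk}(\hat{\alpha})
  = \frac{\phi_W}{\phi^{2}}\,\tau^{2}
    - \frac{2\tau}{\phi}\Bigl(\mu - \tfrac{1}{\phi}\eta\Bigr),
\qquad
\frac{\partial}{\partial\tau} = 0
\;\Longrightarrow\;
\tau_{\mathrm{opt}} = \frac{\phi}{\phi_W}\Bigl(\mu - \tfrac{1}{\phi}\eta\Bigr).
```

Because the quadratic is negative exactly on the interval (0, 2τ opt ), any τ in that range delivers a risk reduction, with the largest reduction attained at τ opt .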

Proof of Corollary 1.2

From Lemma B.2 in the Supplementary Online Appendix and the proof of Theorem 1, we have the following

ϕ ̂ = ϕ + ϕ ̂ 1 2 + O p ( 1 ) , where  ϕ ̂ 1 2 = 2 α P ̃ 1 2 W ̄ D P ̄ α + α P ̄ W ̃ D P ̄ α + 2 ξ 1 2 P ̄ W ̄ D P ̄ α ,

ϕ ̂ W = ϕ W + ϕ ̂ W 1 2 + O p ( 1 ) , where  ϕ ̂ W 1 2 = 2 α P ̃ 1 2 W P ̄ α + 2 ξ 1 2 P ̄ W P ̄ α .

Therefore, we have

(A.17) τ ̂ opt = 1 ϕ W ϕ + ϕ ̂ 1 2 ϕ ϕ ̂ W 1 2 ϕ W + O p ( 1 ) μ ̂ 1 ϕ 1 ϕ ̂ 1 2 ϕ + O p ( T 1 ) η ̂ .

Using the proof of Lemmas B.2 and B.3 in the Supplementary Online Appendix, it can be easily verified that E ( ϕ ̂ 1 2 ) = E ( ϕ ̂ W 1 2 ) = 0 . To complete the proof, we need to show that ( i ) μ ̂ = μ + μ ̂ 1 2 + O p ( T 1 ) where E ( μ ̂ 1 2 ) = 0 and ( i i ) η ̂ = η + η ̂ 1 2 + O p ( T 1 ) where E ( η ̂ 1 2 ) = 0 .

Let $\hat{\mu}_{1} = \operatorname{tr}\big(WP(Z'Z)^{-1}\hat{\Sigma}\big)$ be the first term of $\hat{\mu}$. Then, we have

$$
\hat{\mu}_{1} = \operatorname{tr}(W\bar{P}Q\Sigma) + \operatorname{tr}\big(W\tilde{P}_{1/2}Q\Sigma + W\bar{P}Q\Sigma_{1/2} + W\bar{P}Q_{3/2}\Sigma\big) + O_{p}(T^{-1}),
$$

where $Q_{3/2} = Q(A + B)$, and $\Sigma_{1/2} = \hat{\Sigma} - \Sigma$. It can be easily verified that the expected value of $\hat{\mu}_{1}$ up to order $T^{-1}$ is $E(\hat{\mu}_{1}) = \operatorname{tr}(W\bar{P}Q\Sigma) + O(T^{-1})$.

Let $\hat{\mu}_{2} = \hat{\alpha}'PW\hat{\Theta}$ be the second term of $\hat{\mu}$. Using Theorem 3.1 in Kiviet and Phillips (2003), we have

$$
\hat{\mu}_{2} = \alpha'\bar{P}W\Theta + \alpha'\tilde{P}_{1/2}'W\Theta + \alpha'\bar{P}W\Theta_{3/2} + \xi_{1/2}'\bar{P}W\Theta + O_{p}(T^{-1}),
$$

where $\Theta_{3/2} = \hat{\Theta} - \Theta$. It can be easily verified that $E(\hat{\mu}_{2}) = \alpha'\bar{P}W\Theta + O(T^{-1})$.

Let $\hat{\mu}_{3} = \operatorname{tr}(W\hat{\Psi})$ be the third term of $\hat{\mu}$. Using the results in Appendix D of Kiviet and Phillips (2003), it readily follows that $E(\hat{\mu}_{3}) = \operatorname{tr}(W\Psi) + O(T^{-1})$. This completes the proof of (i). Similarly, using the results in Appendix D of Kiviet and Phillips (2003), (ii) can be easily verified. ■

Proof of Theorem 2

Note that $z^{S}_{2,T+1} = (0, y_{T}, x_{T+1}')' \equiv z_{T+1}$, and $z_{T+1} = \bar{z}_{T+1} + \tilde{z}_{T+1}$. Similar to the arguments in Section 2, we write $\bar{z}_{T+1} = E(z_{T+1}) = (0, \bar{y}_{1,T}, x_{T}')'$, $\bar{y}_{1,T} = (f_{T}, c_{T}')\,\bar{y}^{*}_{1,T}$, where $f_{T} = \lambda_{1}^{T_{1}}\lambda_{2}^{T_{2}}$, $c_{T} = \big(\lambda_{1}^{T_{1}-1}\lambda_{2}^{T_{2}}, \ldots, \lambda_{1}\lambda_{2}^{T_{2}}, \lambda_{2}^{T_{2}}, \lambda_{2}^{T_{2}-1}, \ldots, \lambda_{2}, 1\big)'$, $\bar{y}^{*}_{1,T} = \big(\bar{y}_{0}, x_{1}'\beta_{1}, \ldots, x_{T_{1}}'\beta_{1}, x_{T_{1}+1}'\beta_{2}, \ldots, x_{T}'\beta_{2}\big)'$, and $\tilde{z}_{T+1} = (g'\nu)\,e_{1,2}$, with $g = (d f_{T}, c_{T}')'$. Let $W_{f} = T\,z_{T+1}z_{T+1}' = \bar{W}_{f} + \tilde{W}_{f}$, where $\bar{W}_{f} = T\,E(z_{T+1}z_{T+1}') = T\,\bar{z}_{T+1}\bar{z}_{T+1}' + T\,E(\tilde{z}_{T+1}\tilde{z}_{T+1}') = O_{p}(T)$, and $\tilde{W}_{f} = T\,\bar{z}_{T+1}\nu'g\,e_{1,2}' + T\,e_{1,2}g'\nu\,\bar{z}_{T+1}' + T\big(\nu'gg'\nu - \operatorname{tr}(gg'\Omega_{\nu})\big)e_{1,2}e_{1,2}' = O_{p}(T)$. The one-step-ahead MSFE of the Stein-like shrinkage estimator up to order $O(T^{-1})$ is

$$
\begin{aligned}
\mathrm{MSFE}(\breve{\alpha}_{2}) &= E\big[(\breve{\alpha} - \alpha)'W_{f}(\breve{\alpha} - \alpha)\big] = \operatorname{tr}\big(\bar{W}_{f}\,\mathrm{MSE}(\breve{\alpha})\big) + \operatorname{tr}\Big(E\big(\Gamma_{1}\tilde{W}_{f}\big) + E\big(\Gamma_{3/2}\tilde{W}_{f}\big) + E\big(\Gamma_{2}\tilde{W}_{f}\big)\Big) \\
&= \mathrm{MSFE}(\hat{\alpha}_{2}) + \frac{\tau^{2}}{\phi^{2}}\,\alpha'\bar{P}\bar{W}_{f}\bar{P}\alpha - \frac{2\tau}{\phi}\Big[\operatorname{tr}(\bar{W}_{f}\bar{P}Q\Sigma) + \alpha'\bar{P}\bar{W}_{f}\Theta + \operatorname{tr}(\bar{W}_{f}\Psi)\Big] \\
&\quad + \frac{4\tau}{\phi^{2}}\Big[\alpha'\bar{P}\bar{W}_{D}\bar{P}Q\Sigma\bar{W}_{f}\bar{P}\alpha + \alpha'\bar{P}\bar{W}_{D}\Psi\bar{W}_{f}\bar{P}\alpha\Big] + \frac{2\tau}{\phi^{2}}\,\alpha'\bar{P}\Phi\bar{W}_{f}\bar{P}\alpha \\
&\quad - \frac{2\tau}{\phi}\operatorname{tr}\Big(E\big(\tilde{W}_{f}\xi_{1/2}\big)\alpha'\bar{P}\Big) - \frac{2\tau}{\phi}\operatorname{tr}\Big(E(\tilde{W}_{f}\xi_{1})\alpha'\bar{P}\Big) - \frac{2\tau}{\phi}\operatorname{tr}\Big(\bar{P}\,E\big(\xi_{1/2}\xi_{1/2}'\tilde{W}_{f}\big)\Big) \\
&\quad - \frac{2\tau}{\phi}\operatorname{tr}\Big(E\big(\tilde{P}_{1/2}\alpha\,\xi_{1/2}'\tilde{W}_{f}\big)\Big) + \frac{2\tau}{\phi^{2}}\operatorname{tr}\Big(E\big(D_{1/2}\bar{P}\alpha\,\xi_{1/2}'\tilde{W}_{f}\big)\Big) \\
&= \mathrm{MSFE}(\hat{\alpha}_{2}) + \frac{\tau^{2}}{\phi^{2}}\,\alpha'\bar{P}\bar{W}_{f}\bar{P}\alpha - \frac{2\tau}{\phi}\Big[\operatorname{tr}(\bar{W}_{f}\bar{P}Q\Sigma) + \alpha'\bar{P}\bar{W}_{f}\Theta + \operatorname{tr}(\bar{W}_{f}\Psi)\Big] \\
&\quad + \frac{4\tau}{\phi^{2}}\Big[\alpha'\bar{P}\bar{W}_{D}\bar{P}Q\Sigma\bar{W}_{f}\bar{P}\alpha + \alpha'\bar{P}\bar{W}_{D}\Psi\bar{W}_{f}\bar{P}\alpha\Big] + \frac{2\tau}{\phi^{2}}\,\alpha'\bar{P}\Phi\bar{W}_{f}\bar{P}\alpha \\
&\quad - \frac{2\tau}{\phi}\operatorname{tr}\big(\Upsilon_{1}\alpha'\bar{P}\big) - \frac{2\tau}{\phi}\operatorname{tr}\big(\Upsilon_{3}\alpha'\bar{P}\big) - \frac{2\tau}{\phi}\operatorname{tr}\big(\bar{P}\Upsilon_{2}\big) - \frac{2\tau}{\phi}\operatorname{tr}\Big(\big(\bar{P} - I_{2(k+1)}\big)Q\Upsilon_{4}\Big) \\
&\quad + \frac{2\tau}{\phi^{2}}\operatorname{tr}\Big(\bar{P}\alpha\alpha'\bar{P}\,\Upsilon_{4} + 2\,\bar{P}\alpha\alpha'\bar{P}\bar{W}_{D}\bar{P}\,\Upsilon_{2} + 2\,\bar{P}\alpha\alpha'\bar{P}\bar{W}_{D}\big(\bar{P} - I_{2(k+1)}\big)Q\Upsilon_{4}\Big).
\end{aligned}\tag{A.18}
$$

Now, we give the expressions for the expectations of the above equation in the rest of the proof.

$$
\begin{aligned}
E\big(\tilde{W}_{f}\xi_{1/2}\big) &= T\,E\Big\{\Big[\bar{z}_{T+1}\nu'g\,e_{1,2}' + e_{1,2}g'\nu\,\bar{z}_{T+1}' + \big(\nu'gg'\nu - \operatorname{tr}(gg'\Omega_{\nu})\big)e_{1,2}e_{1,2}'\Big]\,Q\Big[\bar{Z}'(0, I_{T})'\nu + \sum_{i=1}^{2}\nu'H_{i}\nu\,e_{1,i}\Big]\Big\} \\
&= T\Big[\bar{z}_{T+1}\,c_{T}'\Omega_{u}\bar{Z}Qe_{1,2} + e_{1,2}\operatorname{tr}\big(Q\bar{Z}'\Omega_{u}c_{T}\bar{z}_{T+1}'\big) + 2\big(e_{1,2}'Qe_{1,2}\big)e_{1,2}\operatorname{tr}\big(c_{T}g'\Omega_{\nu}GL_{2}\Omega_{u}\big)\Big] \equiv \Upsilon_{1}.
\end{aligned}\tag{A.19}
$$

$$
\begin{aligned}
E\big(\xi_{1/2}\xi_{1/2}'\tilde{W}_{f}\big) &= T\,E\Big\{Q\Big[\bar{Z}'(0, I_{T})'\nu + \sum_{i=1}^{2}\nu'H_{i}\nu\,e_{1,i}\Big]\Big[\nu'(0, I_{T})\bar{Z} + \sum_{i=1}^{2}\nu'H_{i}\nu\,e_{1,i}'\Big]Q \\
&\qquad\times\Big[\bar{z}_{T+1}\nu'g\,e_{1,2}' + e_{1,2}g'\nu\,\bar{z}_{T+1}' + \big(\nu'gg'\nu - \operatorname{tr}(gg'\Omega_{\nu})\big)e_{1,2}e_{1,2}'\Big]\Big\} \\
&= T\Big[2\,Q\bar{Z}'\Omega_{u}c_{T}c_{T}'\Omega_{u}\bar{Z}Q\,e_{1,2}e_{1,2}' + \sum_{i=1}^{2}Q\bar{Z}'\Omega_{u}\big(CL_{i}\Omega_{u}c_{T} + \Omega_{u}L_{i}G\Omega_{\nu}g\big)\bar{z}_{T+1}'Qe_{1,i}\,e_{1,2}' \\
&\qquad + \big(e_{1,2}'Qe_{1,2}\big)Q\bar{Z}'\Omega_{u}\big(CL_{2}\Omega_{u}c_{T} + \Omega_{u}L_{2}G\Omega_{\nu}g\big)\bar{z}_{T+1}' \\
&\qquad + \sum_{i=1}^{2}Qe_{1,i}e_{1,2}'\Big[\operatorname{tr}\big(\bar{Z}Q\bar{z}_{T+1}g'\Omega_{\nu}GL_{i}\Omega_{u}\big) + \operatorname{tr}\big(\bar{Z}Q\bar{z}_{T+1}c_{T}'\Omega_{u}L_{i}C\Omega_{u}\big)\Big] \\
&\qquad + \sum_{i=1}^{2}Qe_{1,i}\,e_{1,2}'Q\bar{Z}'\Omega_{u}\big(CL_{i}\Omega_{u}c_{T} + \Omega_{u}L_{i}G\Omega_{\nu}g\big)\bar{z}_{T+1}' \\
&\qquad + 2\sum_{i=1}^{2}\big(e_{1,2}'Qe_{1,2}\big)Qe_{1,i}e_{1,2}'\Big[\operatorname{tr}\big(g'\Omega_{\nu}GL_{2}\Omega_{u}CL_{i}\Omega_{u}c_{T}\big) + \operatorname{tr}\big(c_{T}'\Omega_{u}L_{2}G\Omega_{\nu}GL_{i}\Omega_{u}c_{T}\big) \\
&\qquad\qquad + \operatorname{tr}\big(g'\Omega_{\nu}GL_{i}\Omega_{u}CL_{2}\Omega_{u}c_{T}\big) + \operatorname{tr}\big(gg'\Omega_{\nu}GL_{i}\Omega_{u}L_{2}G\Omega_{\nu}\big)\Big]\Big] \equiv \Upsilon_{2}.
\end{aligned}\tag{A.20}
$$

$$
\begin{aligned}
E(\tilde{W}_{f}\xi_{1}) &= T\,E\Big\{\Big[\bar{z}_{T+1}\nu'g\,e_{1,2}' + e_{1,2}g'\nu\,\bar{z}_{T+1}' + \big(\nu'gg'\nu - \operatorname{tr}(gg'\Omega_{\nu})\big)e_{1,2}e_{1,2}'\Big]\,Q(A + B)\Big[\bar{Z}'(0, I_{T})'\nu + \sum_{i=1}^{2}\nu'H_{i}\nu\,e_{1,i}\Big]\Big\} \\
&= T\Big[\sum_{i=1}^{2}\big(e_{1,i}'Qe_{1,i}\big)\bar{z}_{T+1}\big(g'\Omega_{\nu}GL_{i}\Omega_{u}C + c_{T}'\Omega_{u}L_{i}G\Omega_{\nu}G\big)L_{i}\bar{Z}Qe_{1,2} \\
&\qquad + \sum_{i=1}^{2}\big(e_{1,i}'Qe_{1,i}\big)e_{1,2}\Big[\operatorname{tr}\big(\bar{z}_{T+1}'Q\bar{Z}'L_{i}G\Omega_{\nu}GL_{i}\Omega_{u}c_{T}\big) + \operatorname{tr}\big(\bar{z}_{T+1}'Q\bar{Z}'L_{i}C\Omega_{u}L_{i}G\Omega_{\nu}g\big)\Big] \\
&\qquad + \sum_{i=1}^{2}\big(e_{1,2}'Qe_{1,2}\big)\bar{z}_{T+1}\big(g'\Omega_{\nu}GL_{i}\Omega_{u}C + c_{T}'\Omega_{u}L_{i}G\Omega_{\nu}G\big)L_{2}\bar{Z}Qe_{1,i} \\
&\qquad + \sum_{i=1}^{2}\sum_{j=1}^{2}e_{1,2}\big(e_{1,i}'Q\bar{z}_{T+1}\big)\big(g'\Omega_{\nu}GL_{j}\Omega_{u}C + c_{T}'\Omega_{u}L_{j}G\Omega_{\nu}G\big)L_{i}\bar{Z}Qe_{1,j} \\
&\qquad + 2\sum_{i=1}^{2}e_{1,2}\big(e_{1,2}'Q\bar{Z}'L_{i}G\Omega_{\nu}g\big)\big(c_{T}'\Omega_{u}\bar{Z}Qe_{1,i}\big) + 2\big(e_{1,2}'Qe_{1,2}\big)e_{1,2}\operatorname{tr}\big(c_{T}g'\Omega_{\nu}GL_{2}\bar{Z}Q\bar{Z}'\Omega_{u}\big) \\
&\qquad + 2\big(e_{1,2}'Qe_{1,2}\big)\bar{z}_{T+1}\,g'\Omega_{\nu}GL_{2}C\Omega_{u}\bar{Z}Qe_{1,2} + 2\sum_{i=1}^{2}e_{1,2}\big(e_{1,i}'Q\bar{Z}'\Omega_{u}CL_{i}G\Omega_{\nu}g\big)\big(\bar{z}_{T+1}'Qe_{1,i}\big) \\
&\qquad + 4\big(e_{1,2}'Qe_{1,2}\big)^{2}e_{1,2}\Big[\operatorname{tr}\big(c_{T}g'\Omega_{\nu}GL_{2}G\Omega_{\nu}GL_{2}\Omega_{u}\big) + \operatorname{tr}\big(gg'\Omega_{\nu}GL_{2}C\Omega_{u}L_{2}G\Omega_{\nu}\big)\Big]\Big] \equiv \Upsilon_{3}.
\end{aligned}\tag{A.21}
$$

$$
\begin{aligned}
E\big[(A + B)Q^{-1}\bar{P}\alpha\,\xi_{1/2}'\tilde{W}_{f}\big] &= T\,E\Big\{(A + B)Q^{-1}\bar{P}\alpha\Big[\nu'(0, I_{T})\bar{Z} + \sum_{i=1}^{2}\nu'H_{i}\nu\,e_{1,i}'\Big]Q \\
&\qquad\times\Big[\bar{z}_{T+1}\nu'g\,e_{1,2}' + e_{1,2}g'\nu\,\bar{z}_{T+1}' + \big(\nu'gg'\nu - \operatorname{tr}(gg'\Omega_{\nu})\big)e_{1,2}e_{1,2}'\Big]\Big\} \\
&= T\Big[2\sum_{i=1}^{2}\big(e_{1,i}'\bar{P}\alpha\big)\bar{Z}'L_{i}G\Omega_{\nu}g\,c_{T}'\Omega_{u}\bar{Z}Qe_{1,2}\,e_{1,2}' + 2\sum_{i=1}^{2}e_{1,i}\,\alpha'\bar{P}\,\bar{Z}'L_{i}G\Omega_{\nu}g\,c_{T}'\Omega_{u}\bar{Z}Qe_{1,2}\,e_{1,2}' \\
&\qquad + \sum_{i=1}^{2}\sum_{j=1}^{2}\big(e_{1,i}'\bar{P}\alpha\big)\bar{Z}'L_{i}\big(G\Omega_{\nu}GL_{j}\Omega_{u}c_{T} + C\Omega_{u}L_{j}G\Omega_{\nu}g\big)\bar{z}_{T+1}'Qe_{1,j}\,e_{1,2}' \\
&\qquad + \sum_{i=1}^{2}\sum_{j=1}^{2}\big(e_{1,i}'\bar{P}\alpha\big)\big(e_{1,j}'Qe_{1,2}\big)\bar{Z}'L_{i}\big(G\Omega_{\nu}GL_{j}\Omega_{u}c_{T} + C\Omega_{u}L_{j}G\Omega_{\nu}g\big)\bar{z}_{T+1}' \\
&\qquad + \sum_{i=1}^{2}\sum_{j=1}^{2}e_{1,i}e_{1,2}'\big(e_{1,j}'Q\bar{z}_{T+1}\big)\big(g'\Omega_{\nu}GL_{j}\Omega_{u}C + c_{T}'\Omega_{u}L_{j}G\Omega_{\nu}G\big)L_{i}\bar{Z}\,\bar{P}\alpha \\
&\qquad + \sum_{i=1}^{2}\big(e_{1,2}'Qe_{1,2}\big)e_{1,i}\,\alpha'\bar{P}\,\bar{Z}'L_{i}\big(G\Omega_{\nu}GL_{2}\Omega_{u}c_{T} + C\Omega_{u}L_{2}G\Omega_{\nu}g\big)\bar{z}_{T+1}' \\
&\qquad + 2\sum_{i=1}^{2}\big(e_{1,i}'\bar{P}\alpha\big)e_{1,i}e_{1,2}'\operatorname{tr}\big(GL_{i}C\Omega_{u}\bar{Z}Q\bar{z}_{T+1}g'\Omega_{\nu}\big) + 2\sum_{i=1}^{2}\big(e_{1,i}'\bar{P}\alpha\big)e_{1,i}\big(e_{1,2}'Q\bar{Z}'\Omega_{u}CL_{i}G\Omega_{\nu}g\big)\bar{z}_{T+1}' \\
&\qquad + 4\sum_{i=1}^{2}\big(e_{1,i}'\bar{P}\alpha\big)\big(e_{1,2}'Qe_{1,2}\big)e_{1,i}e_{1,2}'\Big[\operatorname{tr}\big(c_{T}g'\Omega_{\nu}GL_{i}G\Omega_{\nu}GL_{2}\Omega_{u}\big) + \operatorname{tr}\big(gg'\Omega_{\nu}GL_{i}C\Omega_{u}L_{2}G\Omega_{\nu}\big)\Big]\Big] \equiv \Upsilon_{4}.
\end{aligned}\tag{A.22}
$$

This completes the proof of Theorem 2. ■

Proof of Corollary 2.1

The proof follows readily from the proof of Corollary 1.1, so it is omitted. ■

Proof of Theorem 4

From the proof of Lemma C.2 in the Supplementary Online Appendix, we have

$$
\hat{\dot{\alpha}} - \dot{\alpha} = (\dot{Z}'\dot{Z})^{-1}\dot{Z}'u = \dot{\xi}_{1/2} + \dot{\xi}_{1} + \dot{\xi}_{3/2} + O_{p}(T^{-2}), \tag{A.23}
$$

where $\dot{\xi}_{1/2}$, $\dot{\xi}_{1}$, and $\dot{\xi}_{3/2}$ are defined below, and the suffixes show the order of magnitude in probability:

$$
\begin{aligned}
\dot{\xi}_{1/2} &= \dot{Q}\bar{\dot{Z}}'(0, I_{T})'\nu = O_{p}(T^{-1/2}), \\
\dot{\xi}_{1} &= \dot{Q}\sum_{i=1}^{2}\nu'\dot{H}_{i}\nu\,Ne_{1,i} - \dot{Q}\dot{A}\,\bar{\dot{Z}}'(0, I_{T})'\nu = O_{p}(T^{-1}), \\
\dot{\xi}_{3/2} &= -\dot{Q}\dot{A}\sum_{i=1}^{2}\nu'\dot{H}_{i}\nu\,Ne_{1,i} - \dot{Q}\dot{B}\,\bar{\dot{Z}}'(0, I_{T})'\nu + \dot{Q}\dot{A}^{2}\,\bar{\dot{Z}}'(0, I_{T})'\nu = O_{p}(T^{-3/2}),
\end{aligned}
$$

where $\dot{H}_{i} = \dot{G}L_{i}(0, I_{T})$. Also, let $\dot{W}_{D} \equiv \dot{Z}'\dot{Z} = \bar{\dot{W}}_{D} + \tilde{\dot{W}}_{D} + \tilde{\dot{Z}}'\tilde{\dot{Z}}$, where $\bar{\dot{W}}_{D} = E(\dot{Z})'E(\dot{Z}) = \bar{\dot{Z}}'\bar{\dot{Z}} = O_{p}(T)$, $\tilde{\dot{W}}_{D} = \bar{\dot{Z}}'\tilde{\dot{Z}} + \tilde{\dot{Z}}'\bar{\dot{Z}} = \dot{A}\dot{Q}^{-1} = O_{p}(T^{1/2})$, and $\tilde{\dot{Z}}'\tilde{\dot{Z}} = O_{p}(1)$, with $\dot{A}$ defined in Lemma C.1 in the Supplementary Online Appendix.

Using Equation (A.23) in (3.6), we have

$$
\begin{aligned}
\frac{1}{D(\hat{\dot{\alpha}}, \tilde{\dot{\alpha}})} &= \Big[(\hat{\dot{\alpha}} - \tilde{\dot{\alpha}})'\dot{W}_{D}(\hat{\dot{\alpha}} - \tilde{\dot{\alpha}})\Big]^{-1} = \Big[\hat{\dot{\alpha}}'\dot{P}'\dot{W}_{D}\dot{P}\hat{\dot{\alpha}}\Big]^{-1} \\
&= \frac{1}{\dot{\phi}}\left[1 - \frac{1}{\dot{\phi}}\tilde{\dot{\phi}}_{1/2} - \frac{2}{\dot{\phi}}\,\dot{\alpha}'\bar{\dot{P}}'\bar{\dot{W}}_{D}\bar{\dot{P}}\dot{\xi}_{1/2} - \frac{2}{\dot{\phi}}\,\dot{\alpha}'\bar{\dot{P}}'\bar{\dot{W}}_{D}\tilde{\dot{P}}_{1/2}\dot{\alpha} + O_{p}(T^{-2})\right] \\
&= \underbrace{\frac{1}{\dot{\phi}}}_{O_{p}(T^{-1})} - \underbrace{\frac{1}{\dot{\phi}^{2}}\dot{D}_{1/2}}_{O_{p}(T^{-3/2})} + O_{p}(T^{-2}),
\end{aligned}\tag{A.24}
$$

where $\dot{D}_{1/2} = \tilde{\dot{\phi}}_{1/2} + 2\dot{\alpha}'\bar{\dot{P}}'\bar{\dot{W}}_{D}\bar{\dot{P}}\dot{\xi}_{1/2} + 2\dot{\alpha}'\bar{\dot{P}}'\bar{\dot{W}}_{D}\tilde{\dot{P}}_{1/2}\dot{\alpha} = O_{p}(T^{1/2})$, $\dot{\phi} = \dot{\alpha}'\bar{\dot{P}}'\bar{\dot{W}}_{D}\bar{\dot{P}}\dot{\alpha} = O(T)$, $\tilde{\dot{\phi}}_{1/2} = \dot{\alpha}'\bar{\dot{P}}'\tilde{\dot{W}}_{D}\bar{\dot{P}}\dot{\alpha} = O_{p}(T^{1/2})$, and the last equality above holds by using the standard geometric expansion. The terms of order $O_{p}(T^{-2})$ and smaller are dropped, because they do not enter the calculation of the bias and MSE of the Stein-like shrinkage estimator up to the orders of interest.
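For completeness, the "standard geometric expansion" invoked here is the scalar Neumann-series step applied to the quadratic form: since $\dot{\phi} = O(T)$ and $\dot{D}_{1/2} = O_p(T^{1/2})$,

```latex
\left(\dot{\phi} + \dot{D}_{1/2} + O_p(1)\right)^{-1}
= \frac{1}{\dot{\phi}}\left(1 + \frac{\dot{D}_{1/2} + O_p(1)}{\dot{\phi}}\right)^{-1}
= \frac{1}{\dot{\phi}} - \frac{\dot{D}_{1/2}}{\dot{\phi}^{2}} + O_p(T^{-2}),
```

where the neglected square term is of order $\dot{D}_{1/2}^{2}/\dot{\phi}^{3} = O_p(T^{-2})$ and is therefore absorbed into the remainder.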

Employing Equations (A.23) and (A.24) in (4.16), we obtain

$$
\breve{\dot{\alpha}} - \dot{\alpha} = (\hat{\dot{\alpha}} - \dot{\alpha}) - \tau\left[\frac{1}{\dot{\phi}} - \frac{1}{\dot{\phi}^{2}}\dot{D}_{1/2} + O_{p}(T^{-2})\right]\left[\bar{\dot{P}} + \tilde{\dot{P}}_{1/2} + O_{p}(T^{-1})\right]\hat{\dot{\alpha}} = \dot{\zeta}_{1/2} + \dot{\zeta}_{1} + \dot{\zeta}_{3/2} + O_{p}(T^{-2}), \tag{A.25}
$$

where $\dot{\zeta}_{1/2}$, $\dot{\zeta}_{1}$, and $\dot{\zeta}_{3/2}$ are defined below:

$$
\dot{\zeta}_{1/2} = \dot{\xi}_{1/2} = O_{p}(T^{-1/2}), \qquad \dot{\zeta}_{1} = \dot{\xi}_{1} - \frac{\tau}{\dot{\phi}}\bar{\dot{P}}\dot{\alpha} = O_{p}(T^{-1}), \qquad \dot{\zeta}_{3/2} = \dot{\xi}_{3/2} - \frac{\tau}{\dot{\phi}}\bar{\dot{P}}\dot{\xi}_{1/2} - \frac{\tau}{\dot{\phi}}\tilde{\dot{P}}_{1/2}\dot{\alpha} + \frac{\tau}{\dot{\phi}^{2}}\dot{D}_{1/2}\bar{\dot{P}}\dot{\alpha} = O_{p}(T^{-3/2}).
$$

The bias of the Stein-like shrinkage estimator, using the approximations in Equation (A.25), up to order $O(T^{-1})$ is

$$
E(\breve{\dot{\alpha}} - \dot{\alpha}) = E\big(\dot{\zeta}_{1/2} + \dot{\zeta}_{1}\big) = E\big(\dot{\xi}_{1/2}\big) + E(\dot{\xi}_{1}) - \frac{\tau}{\dot{\phi}}\bar{\dot{P}}\dot{\alpha} = \dot{\Theta} - \frac{\tau}{\dot{\phi}}\bar{\dot{P}}\dot{\alpha}, \tag{A.26}
$$

where the last equality holds because of (C.8) and (C.9) in Lemma C.2 of the Supplementary Online Appendix.

The MSE of the Stein-like shrinkage estimator up to order $O(T^{-2})$ is

$$
E\big[(\breve{\dot{\alpha}} - \dot{\alpha})(\breve{\dot{\alpha}} - \dot{\alpha})'\big] = E\big[\dot{\Gamma}_{1} + \dot{\Gamma}_{3/2} + \dot{\Gamma}_{2}\big], \tag{A.27}
$$

where $\dot{\Gamma}_{1}$, $\dot{\Gamma}_{3/2}$, and $\dot{\Gamma}_{2}$ are

$$
\dot{\Gamma}_{1} = \dot{\zeta}_{1/2}\dot{\zeta}_{1/2}', \qquad \dot{\Gamma}_{3/2} = \dot{\zeta}_{1/2}\dot{\zeta}_{1}' + \dot{\zeta}_{1}\dot{\zeta}_{1/2}', \qquad \dot{\Gamma}_{2} = \dot{\zeta}_{1/2}\dot{\zeta}_{3/2}' + \dot{\zeta}_{3/2}\dot{\zeta}_{1/2}' + \dot{\zeta}_{1}\dot{\zeta}_{1}',
$$

and we derive their expectations in the rest of the proof using Lemmas B.1 and D.2.

$$
E(\dot{\Gamma}_{1}) = E\big(\dot{\zeta}_{1/2}\dot{\zeta}_{1/2}'\big) = E\big(\dot{\xi}_{1/2}\dot{\xi}_{1/2}'\big) = \dot{Q}\Sigma, \tag{A.28}
$$

where the last equality holds because of (D.12) in Lemma D.2.

$$
E\big(\dot{\Gamma}_{3/2}\big) = E\big(\dot{\zeta}_{1/2}\dot{\zeta}_{1}'\big) + E\big(\dot{\zeta}_{1}\dot{\zeta}_{1/2}'\big) = E\big(\dot{\xi}_{1/2}\dot{\xi}_{1}'\big) + E\big(\dot{\xi}_{1}\dot{\xi}_{1/2}'\big), \tag{A.29}
$$

because

$$
E\big(\dot{\zeta}_{1}\dot{\zeta}_{1/2}'\big) = E\Big[\Big(\dot{\xi}_{1} - \frac{\tau}{\dot{\phi}}\bar{\dot{P}}\dot{\alpha}\Big)\dot{\xi}_{1/2}'\Big] = E\big(\dot{\xi}_{1}\dot{\xi}_{1/2}'\big) - \frac{\tau}{\dot{\phi}}\bar{\dot{P}}\dot{\alpha}\,E\big(\dot{\xi}_{1/2}'\big) = E\big(\dot{\xi}_{1}\dot{\xi}_{1/2}'\big), \tag{A.30}
$$

where the last equality holds by (D.8), and $E\big(\dot{\xi}_{1}\dot{\xi}_{1/2}'\big)$ is given in (D.12). Also, we have

$$
\begin{aligned}
E(\dot{\Gamma}_{2}) &= E\big(\dot{\zeta}_{1/2}\dot{\zeta}_{3/2}'\big) + E\big(\dot{\zeta}_{3/2}\dot{\zeta}_{1/2}'\big) + E\big(\dot{\zeta}_{1}\dot{\zeta}_{1}'\big) \\
&= E\big(\dot{\xi}_{3/2}\dot{\xi}_{1/2}'\big) + E\big(\dot{\xi}_{1/2}\dot{\xi}_{3/2}'\big) + E\big(\dot{\xi}_{1}\dot{\xi}_{1}'\big) + \frac{\tau^{2}}{\dot{\phi}^{2}}\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}} - \frac{\tau}{\dot{\phi}}\Big[\dot{\Theta}\dot{\alpha}'\bar{\dot{P}} + \bar{\dot{P}}\dot{\alpha}\dot{\Theta}'\Big] \\
&\quad - \frac{\tau}{\dot{\phi}}\Big[\bar{\dot{P}}\dot{Q}\Sigma + \Sigma\dot{Q}\bar{\dot{P}} + E\big(\tilde{\dot{P}}_{1/2}\dot{\alpha}\dot{\xi}_{1/2}'\big) + E\big(\tilde{\dot{P}}_{1/2}\dot{\alpha}\dot{\xi}_{1/2}'\big)'\Big] \\
&\quad + \frac{2\tau}{\dot{\phi}^{2}}\Big[\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\bar{\dot{P}}\dot{Q}\Sigma + \Sigma\dot{Q}\bar{\dot{P}}\bar{\dot{W}}_{D}\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}} + \bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\,E\big(\tilde{\dot{P}}_{1/2}\dot{\alpha}\dot{\xi}_{1/2}'\big) + E\big(\tilde{\dot{P}}_{1/2}\dot{\alpha}\dot{\xi}_{1/2}'\big)'\bar{\dot{W}}_{D}\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\Big] \\
&\quad + \frac{\tau}{\dot{\phi}^{2}}\Big[\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\,E\big(\tilde{\dot{W}}_{D}\bar{\dot{P}}\dot{\alpha}\dot{\xi}_{1/2}'\big) + E\big(\dot{\xi}_{1/2}\dot{\alpha}'\bar{\dot{P}}\tilde{\dot{W}}_{D}\big)\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\Big],
\end{aligned}\tag{A.31}
$$

where

$$
E\big(\dot{\zeta}_{1}\dot{\zeta}_{1}'\big) = E\big(\dot{\xi}_{1}\dot{\xi}_{1}'\big) - \frac{\tau}{\dot{\phi}}E(\dot{\xi}_{1})\dot{\alpha}'\bar{\dot{P}} - \frac{\tau}{\dot{\phi}}\bar{\dot{P}}\dot{\alpha}\,E\big(\dot{\xi}_{1}'\big) + \frac{\tau^{2}}{\dot{\phi}^{2}}\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}} = E\big(\dot{\xi}_{1}\dot{\xi}_{1}'\big) - \frac{\tau}{\dot{\phi}}\Big[E(\dot{\xi}_{1})\dot{\alpha}'\bar{\dot{P}} + \bar{\dot{P}}\dot{\alpha}\,E\big(\dot{\xi}_{1}'\big)\Big] + \frac{\tau^{2}}{\dot{\phi}^{2}}\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}, \tag{A.32}
$$

$E(\dot{\xi}_{1}) = \dot{\Theta}$ by (D.9), and $E\big(\dot{\xi}_{1}\dot{\xi}_{1}'\big)$ is given in (D.13). Also,

$$
\begin{aligned}
E\big(\dot{\zeta}_{3/2}\dot{\zeta}_{1/2}'\big) &= E\big(\dot{\xi}_{3/2}\dot{\xi}_{1/2}'\big) - \frac{\tau}{\dot{\phi}}\bar{\dot{P}}\,E\big(\dot{\xi}_{1/2}\dot{\xi}_{1/2}'\big) - \frac{\tau}{\dot{\phi}}E\big(\tilde{\dot{P}}_{1/2}\dot{\alpha}\dot{\xi}_{1/2}'\big) + \frac{\tau}{\dot{\phi}^{2}}E\big(\dot{D}_{1/2}\bar{\dot{P}}\dot{\alpha}\dot{\xi}_{1/2}'\big) \\
&= E\big(\dot{\xi}_{3/2}\dot{\xi}_{1/2}'\big) - \frac{\tau}{\dot{\phi}}\Big[\bar{\dot{P}}\dot{Q}\Sigma + E\big(\tilde{\dot{P}}_{1/2}\dot{\alpha}\dot{\xi}_{1/2}'\big)\Big] + \frac{2\tau}{\dot{\phi}^{2}}\Big[\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\Big(\bar{\dot{P}}\dot{Q}\Sigma + E\big(\tilde{\dot{P}}_{1/2}\dot{\alpha}\dot{\xi}_{1/2}'\big)\Big)\Big] + \frac{\tau}{\dot{\phi}^{2}}\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\,E\big(\tilde{\dot{W}}_{D}\bar{\dot{P}}\dot{\alpha}\dot{\xi}_{1/2}'\big),
\end{aligned}\tag{A.33}
$$

where $E\big(\dot{\xi}_{3/2}\dot{\xi}_{1/2}'\big)$ is given in (D.14), and the last equality above holds by using

$$
\begin{aligned}
E\big(\dot{D}_{1/2}\bar{\dot{P}}\dot{\alpha}\dot{\xi}_{1/2}'\big) &= 2\,\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\,E\Big[\bar{\dot{P}}\dot{\xi}_{1/2}\dot{\xi}_{1/2}' + \tilde{\dot{P}}_{1/2}\dot{\alpha}\dot{\xi}_{1/2}'\Big] + \bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\,E\big(\tilde{\dot{W}}_{D}\bar{\dot{P}}\dot{\alpha}\dot{\xi}_{1/2}'\big) \\
&= 2\,\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\bar{\dot{P}}\dot{Q}\Sigma + 2\,\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\,E\big(\tilde{\dot{P}}_{1/2}\dot{\alpha}\dot{\xi}_{1/2}'\big) + \bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\,E\big(\tilde{\dot{W}}_{D}\bar{\dot{P}}\dot{\alpha}\dot{\xi}_{1/2}'\big),
\end{aligned}\tag{A.34}
$$

and using Equation (D.11). Moreover, we find that

$$
\begin{aligned}
E\big(\tilde{\dot{P}}_{1/2}\dot{\alpha}\dot{\xi}_{1/2}'\big) &= E\Big[\big(\bar{\dot{P}} - I\big)\dot{Q}\dot{A}\dot{Q}^{-1}\bar{\dot{P}}\dot{\alpha}\,\nu'(0, I_{T})\bar{\dot{Z}}\dot{Q}\Big] \\
&= \big(\bar{\dot{P}} - I\big)\dot{Q}\Big[\sum_{i=1}^{2}\big(e_{1,i}'N\bar{\dot{P}}\dot{\alpha}\big)\bar{\dot{Z}}'L_{i}\dot{C}\Omega_{u}\bar{\dot{Z}}\dot{Q} + \sum_{i=1}^{2}Ne_{1,i}\,\dot{\alpha}'\bar{\dot{P}}\,\bar{\dot{Z}}'L_{i}\dot{C}\Omega_{u}\bar{\dot{Z}}\dot{Q}\Big] \equiv \dot{\Psi},
\end{aligned}\tag{A.35}
$$

and

$$
\begin{aligned}
E\big(\tilde{\dot{W}}_{D}\bar{\dot{P}}\dot{\alpha}\dot{\xi}_{1/2}'\big) &= E\Big[\sum_{l=1}^{2}\Big(\bar{\dot{Z}}_{l}'L_{l}\dot{G}\nu\,e_{1,l}'N + Ne_{1,l}\nu'\dot{G}'L_{l}'\bar{\dot{Z}}_{l}\Big)\bar{\dot{P}}\dot{\alpha}\Big(\nu'(0, I_{T})\bar{\dot{Z}} + \sum_{i=1}^{2}\nu'\dot{H}_{i}\nu\,e_{1,i}'N\Big)\dot{Q}\Big] \\
&= \sum_{l=1}^{2}\Big[\big(e_{1,l}'N\bar{\dot{P}}\dot{\alpha}\big)\bar{\dot{Z}}'L_{l}\dot{C}\Omega_{u}\bar{\dot{Z}}\dot{Q} + Ne_{1,l}\,\dot{\alpha}'\bar{\dot{P}}\,\bar{\dot{Z}}'L_{l}\dot{C}\Omega_{u}\bar{\dot{Z}}\dot{Q}\Big] \equiv \dot{\Phi}.
\end{aligned}\tag{A.36}
$$

By employing the results of Equations (A.28), (A.29), and (A.31) in (A.27), we obtain the MSE of the Stein-like shrinkage estimator up to order $O(T^{-2})$ as

$$
\begin{aligned}
\mathrm{MSE}(\breve{\dot{\alpha}}) &= \mathrm{MSE}(\hat{\dot{\alpha}}) + \frac{\tau^{2}}{\dot{\phi}^{2}}\,\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}} - \frac{\tau}{\dot{\phi}}\Big[\bar{\dot{P}}\dot{Q}\Sigma + \Sigma\dot{Q}\bar{\dot{P}} + \dot{\Theta}\dot{\alpha}'\bar{\dot{P}} + \bar{\dot{P}}\dot{\alpha}\dot{\Theta}' + \dot{\Psi} + \dot{\Psi}'\Big] \\
&\quad + \frac{2\tau}{\dot{\phi}^{2}}\Big[\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\big(\bar{\dot{P}}\dot{Q}\Sigma + \dot{\Psi}\big) + \big(\Sigma\dot{Q}\bar{\dot{P}} + \dot{\Psi}'\big)\bar{\dot{W}}_{D}\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\Big] + \frac{\tau}{\dot{\phi}^{2}}\Big[\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\,\dot{\Phi} + \dot{\Phi}'\,\bar{\dot{P}}\dot{\alpha}\dot{\alpha}'\bar{\dot{P}}\Big].
\end{aligned}\tag{A.37}
$$

Further, the risk of the Stein-like shrinkage estimator up to order $O(T^{-1})$ can be written as

$$
\begin{aligned}
\mathrm{Risk}(\breve{\dot{\alpha}}) &= E\big[(\breve{\dot{\alpha}} - \dot{\alpha})'W(\breve{\dot{\alpha}} - \dot{\alpha})\big] = \operatorname{tr}\big(W\,E\big[(\breve{\dot{\alpha}} - \dot{\alpha})(\breve{\dot{\alpha}} - \dot{\alpha})'\big]\big) = \operatorname{tr}\big(W\,\mathrm{MSE}(\breve{\dot{\alpha}})\big) \\
&= \mathrm{Risk}(\hat{\dot{\alpha}}) + \frac{\tau^{2}}{\dot{\phi}^{2}}\,\dot{\alpha}'\bar{\dot{P}}W\bar{\dot{P}}\dot{\alpha} - \frac{2\tau}{\dot{\phi}}\Big[\operatorname{tr}\big(W\bar{\dot{P}}\dot{Q}\Sigma\big) + \dot{\alpha}'\bar{\dot{P}}W\dot{\Theta} + \operatorname{tr}\big(W\dot{\Psi}\big)\Big] \\
&\quad + \frac{4\tau}{\dot{\phi}^{2}}\Big[\dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\bar{\dot{P}}\dot{Q}\Sigma W\bar{\dot{P}}\dot{\alpha} + \dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\dot{\Psi}W\bar{\dot{P}}\dot{\alpha}\Big] + \frac{2\tau}{\dot{\phi}^{2}}\,\dot{\alpha}'\bar{\dot{P}}\dot{\Phi}W\bar{\dot{P}}\dot{\alpha}.
\end{aligned}\tag{A.38}
$$

This completes the proof of Theorem 4. ■

Proof of Theorem 5

Let $\dot{z}_{T+1} = Nz_{T+1}$. Similar to the arguments in the proof of Lemma D.1, we write $\dot{z}_{T+1} = \bar{\dot{z}}_{T+1} + \tilde{\dot{z}}_{T+1}$, where $\bar{\dot{z}}_{T+1} = E(\dot{z}_{T+1}) = N\,E(z_{T+1})$, and $\tilde{\dot{z}}_{T+1} = \dot{z}_{T+1} - \bar{\dot{z}}_{T+1} = (\dot{g}'\nu)\,Ne_{1,2}$, with $\dot{g}$ defined as $g$ after replacing $\lambda_{1} = 1$ and $\lambda_{2} = 1$. Let $\dot{W}_{f} = T\,\dot{z}_{T+1}\dot{z}_{T+1}' = \bar{\dot{W}}_{f} + \tilde{\dot{W}}_{f}$, where $\bar{\dot{W}}_{f} = T\,E(\dot{z}_{T+1})E(\dot{z}_{T+1})' = T\,\bar{\dot{z}}_{T+1}\bar{\dot{z}}_{T+1}' = O_{p}(T)$, and $\tilde{\dot{W}}_{f} = T\,\bar{\dot{z}}_{T+1}\nu'\dot{g}\,e_{1,2}'N + T\,Ne_{1,2}\dot{g}'\nu\,\bar{\dot{z}}_{T+1}' + T\big(\nu'\dot{g}\dot{g}'\nu - \operatorname{tr}(\dot{g}\dot{g}'\Omega_{\nu})\big)Ne_{1,2}e_{1,2}'N = O_{p}(T^{1/2}) + O_{p}(1)$.

The one-step-ahead MSFE of the Stein-like shrinkage estimator up to order $O(T^{-1})$ is

$$
\begin{aligned}
\mathrm{MSFE}(\breve{\dot{\alpha}}_{2}) &= E\big[(\breve{\dot{\alpha}} - \dot{\alpha})'\dot{W}_{f}(\breve{\dot{\alpha}} - \dot{\alpha})\big] = \operatorname{tr}\big(\bar{\dot{W}}_{f}\,\mathrm{MSE}(\breve{\dot{\alpha}})\big) + \operatorname{tr}\Big(E\big(\dot{\Gamma}_{1}\tilde{\dot{W}}_{f}\big) + E\big(\dot{\Gamma}_{3/2}\tilde{\dot{W}}_{f}\big) + E\big(\dot{\Gamma}_{2}\tilde{\dot{W}}_{f}\big)\Big) \\
&= \mathrm{MSFE}(\hat{\dot{\alpha}}_{2}) + \frac{\tau^{2}}{\dot{\phi}^{2}}\,\dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{f}\bar{\dot{P}}\dot{\alpha} - \frac{2\tau}{\dot{\phi}}\Big[\operatorname{tr}\big(\bar{\dot{W}}_{f}\bar{\dot{P}}\dot{Q}\Sigma\big) + \dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{f}\dot{\Theta} + \operatorname{tr}\big(\bar{\dot{W}}_{f}\dot{\Psi}\big)\Big] \\
&\quad + \frac{4\tau}{\dot{\phi}^{2}}\Big[\dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\bar{\dot{P}}\dot{Q}\Sigma\bar{\dot{W}}_{f}\bar{\dot{P}}\dot{\alpha} + \dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\dot{\Psi}\bar{\dot{W}}_{f}\bar{\dot{P}}\dot{\alpha}\Big] + \frac{2\tau}{\dot{\phi}^{2}}\,\dot{\alpha}'\bar{\dot{P}}\dot{\Phi}\bar{\dot{W}}_{f}\bar{\dot{P}}\dot{\alpha} - \frac{2\tau}{\dot{\phi}}\operatorname{tr}\Big(E\big(\tilde{\dot{W}}_{f}\dot{\xi}_{1/2}\big)\dot{\alpha}'\bar{\dot{P}}\Big) \\
&= \mathrm{MSFE}(\hat{\dot{\alpha}}_{2}) + \frac{\tau^{2}}{\dot{\phi}^{2}}\,\dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{f}\bar{\dot{P}}\dot{\alpha} - \frac{2\tau}{\dot{\phi}}\Big[\operatorname{tr}\big(\bar{\dot{W}}_{f}\bar{\dot{P}}\dot{Q}\Sigma\big) + \dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{f}\dot{\Theta} + \operatorname{tr}\big(\bar{\dot{W}}_{f}\dot{\Psi}\big)\Big] \\
&\quad + \frac{4\tau}{\dot{\phi}^{2}}\Big[\dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\bar{\dot{P}}\dot{Q}\Sigma\bar{\dot{W}}_{f}\bar{\dot{P}}\dot{\alpha} + \dot{\alpha}'\bar{\dot{P}}\bar{\dot{W}}_{D}\dot{\Psi}\bar{\dot{W}}_{f}\bar{\dot{P}}\dot{\alpha}\Big] + \frac{2\tau}{\dot{\phi}^{2}}\,\dot{\alpha}'\bar{\dot{P}}\dot{\Phi}\bar{\dot{W}}_{f}\bar{\dot{P}}\dot{\alpha} - \frac{2\tau}{\dot{\phi}}\operatorname{tr}\big(\dot{\Upsilon}\dot{\alpha}'\bar{\dot{P}}\big),
\end{aligned}\tag{A.39}
$$

where

$$
\begin{aligned}
\dot{\Upsilon} \equiv E\big(\tilde{\dot{W}}_{f}\dot{\xi}_{1/2}\big) &= T\,E\Big[\Big(\bar{\dot{z}}_{T+1}\nu'\dot{g}\,e_{1,2}'N + Ne_{1,2}\dot{g}'\nu\,\bar{\dot{z}}_{T+1}'\Big)\dot{Q}\bar{\dot{Z}}'(0, I_{T})'\nu\Big] + o_{p}(1) \\
&= T\Big[\bar{\dot{z}}_{T+1}\,\dot{c}_{T}'\Omega_{u}\bar{\dot{Z}}\dot{Q}Ne_{1,2} + Ne_{1,2}\operatorname{tr}\big(\dot{Q}\bar{\dot{Z}}'\Omega_{u}\dot{c}_{T}\bar{\dot{z}}_{T+1}'\big)\Big] + o_{p}(1).
\end{aligned}\tag{A.40}
$$

This completes the proof of Theorem 5. ■
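As a numerical companion to the forecasting results, the following Monte Carlo sketch compares one-step-ahead squared forecast errors of the post-break (unrestricted) forecast and a shrinkage-combined forecast in a simple AR(1) with one break. The names, the identity weighting, and the fixed `tau` are our own illustrative choices, not the paper's feasible procedure, which uses the estimated $\hat{\tau}^{\mathrm{opt}}$ and the forecast weighting matrix $W_f$.

```python
import random

def ols_ar1(y):
    """Least-squares slope of y_t on y_{t-1} (no intercept)."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den

def simulate(T1, T2, rho1, rho2, rng):
    """AR(1) with a slope break at t = T1, plus one extra point to forecast."""
    y = [0.0]
    for t in range(1, T1 + T2 + 2):
        rho = rho1 if t <= T1 else rho2
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y

rng = random.Random(1)
T1, T2, rho1, rho2, tau = 50, 50, 0.2, 0.8, 0.5
se_unres, se_shrink = 0.0, 0.0
R = 500                                     # number of Monte Carlo replications
for _ in range(R):
    y = simulate(T1, T2, rho1, rho2, rng)
    hist, target = y[:-1], y[-1]            # estimation sample and forecast target
    a2_hat = ols_ar1(hist[T1:])             # post-break (unrestricted) slope
    a_tilde = ols_ar1(hist)                 # full-sample (restricted) slope
    D = 2 * (a2_hat - a_tilde) ** 2         # crude weighted distance (identity W)
    w = min(tau / D, 1.0)                   # shrinkage weight on the restricted fit
    a2_breve = w * a_tilde + (1 - w) * a2_hat
    se_unres += (target - a2_hat * hist[-1]) ** 2
    se_shrink += (target - a2_breve * hist[-1]) ** 2
print(se_unres / R, se_shrink / R)
```

On average across replications, the combined forecast tends to do no worse than the purely post-break forecast, mirroring the risk comparison of the theorems, although with this fixed `tau` the gain in any single run is not guaranteed.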

References

Bai, J., and S. Ng. 2002. "Determining the Number of Factors in Approximate Factor Models." Econometrica 70: 191–221. https://doi.org/10.1111/1468-0262.00273.

Bai, J., and P. Perron. 1998. "Estimating and Testing Linear Models with Multiple Structural Changes." Econometrica 66: 47–78. https://doi.org/10.2307/2998540.

Bai, J., and P. Perron. 2003. "Computation and Analysis of Multiple Structural Change Models." Journal of Applied Econometrics 18: 1–22. https://doi.org/10.1002/jae.659.

Banerjee, A., and G. Urga. 2006. "Modelling Structural Breaks, Long Memory and Stock Market Volatility: An Overview." Journal of Econometrics 129: 1–34. https://doi.org/10.1016/j.jeconom.2004.09.001.

Bartlett, M. S. 1946. "On the Theoretical Specification and Sampling Properties of Autocorrelated Time-Series." Journal of the Royal Statistical Society 8: 27–41. https://doi.org/10.2307/2983611.

Clark, T. E., and M. W. McCracken. 2010. "Averaging Forecasts from VARs with Uncertain Instabilities." Journal of Applied Econometrics 25: 5–29. https://doi.org/10.1002/jae.1127.

Clark, T. E., and K. D. West. 2007. "Approximately Normal Tests for Equal Predictive Accuracy in Nested Models." Journal of Econometrics 138: 291–311. https://doi.org/10.1016/j.jeconom.2006.05.023.

Clemen, R. T. 1989. "Combining Forecasts: A Review and Annotated Bibliography." International Journal of Forecasting 5: 559–81. https://doi.org/10.1016/0169-2070(89)90012-5.

Clements, M. P., and D. F. Hendry. 1998. Forecasting Economic Time Series. Cambridge, England: Cambridge University Press.

Clements, M. P., and D. F. Hendry. 1999. Forecasting Non-Stationary Economic Time Series. Cambridge, MA: The MIT Press.

Clements, M. P., and D. F. Hendry. 2006. "Forecasting with Breaks." In Handbook of Economic Forecasting, Vol. 1, edited by G. Elliott, C. W. J. Granger, and A. Timmermann, 605–58. North-Holland: Elsevier Science. https://doi.org/10.1016/S1574-0706(05)01012-8.

Clements, M. P., and D. F. Hendry. 2011. The Oxford Handbook of Economic Forecasting. Oxford: Oxford University Press. https://doi.org/10.1093/oxfordhb/9780195398649.001.0001.

Diebold, F. X., and R. S. Mariano. 1995. "Comparing Predictive Accuracy." Journal of Business & Economic Statistics 13: 253–63. https://doi.org/10.2307/1392185.

Garcia, R., and P. Perron. 1996. "An Analysis of the Real Interest Rate Under Regime Shifts." The Review of Economics and Statistics 78: 111–25. https://doi.org/10.2307/2109851.

Giacomini, R., and B. Rossi. 2009. "Detecting and Predicting Forecast Breakdowns." The Review of Economic Studies 76: 669–705. https://doi.org/10.1111/j.1467-937x.2009.00545.x.

Gourieroux, C., and A. Monfort. 1997. Time Series and Dynamic Models. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511628597.

Grubb, D., and J. Symons. 1987. "Bias in Regressions with a Lagged Dependent Variable." Econometric Theory 3: 371–86. https://doi.org/10.1017/s0266466600010458.

Hansen, B. E. 2001. "The New Econometrics of Structural Change: Dating Breaks in U.S. Labor Productivity." The Journal of Economic Perspectives 15: 117–28. https://doi.org/10.1257/jep.15.4.117.

Hansen, B. E. 2009. "Averaging Estimators for Regressions with a Possible Structural Break." Econometric Theory 25: 1498–514. https://doi.org/10.1017/s0266466609990235.

Hansen, B. E. 2016. "Efficient Shrinkage in Parametric Models." Journal of Econometrics 190: 115–32. https://doi.org/10.1016/j.jeconom.2015.09.003.

Hansen, B. E. 2017. "Stein-Like 2SLS Estimator." Econometric Reviews 36: 840–52. https://doi.org/10.1080/07474938.2017.1307579.

Hurwicz, L. 1950. "Least Squares Bias in Time Series." In Statistical Inference in Dynamic Economic Models, edited by T. Koopmans, 365–83. New York: Wiley.

Inoue, A., and B. Rossi. 2011. "Identifying the Sources of Instabilities in Macroeconomic Fluctuations." The Review of Economics and Statistics 164: 158–72. https://doi.org/10.1162/REST_a_00130.

Inoue, A., L. Jin, and B. Rossi. 2017. "Rolling Window Selection for Out-of-Sample Forecasting with Time-Varying Parameters." Journal of Econometrics 196: 55–67. https://doi.org/10.1016/j.jeconom.2016.03.006.

James, W., and C. M. Stein. 1961. "Estimation with Quadratic Loss." In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 361–80.

Kendall, M. G. 1954. "Note on the Bias in the Estimation of Autocorrelation." Biometrika 41: 403–4. https://doi.org/10.1093/biomet/41.3-4.403.

Kiviet, J. F., and G. D. A. Phillips. 1993. "Alternative Bias Approximations in Regressions with a Lagged Dependent Variable." Econometric Theory 9: 62–80. https://doi.org/10.1017/s0266466600007337.

Kiviet, J. F., and G. D. A. Phillips. 1994. "Bias Assessment and Reduction in Linear Error-Correction Models." Journal of Econometrics 63: 215–43. https://doi.org/10.1016/0304-4076(93)01566-5.

Kiviet, J. F., and G. D. A. Phillips. 2003. "Improved Coefficient and Variance Estimation in Stable First-Order Dynamic Regression Models." Working paper.

Kiviet, J. F., and G. D. A. Phillips. 2005. "Moment Approximation for Least-Squares Estimators in Dynamic Regression Models with a Unit Root." The Econometrics Journal 8: 115–42. https://doi.org/10.1111/j.1368-423x.2005.00156.x.

Kiviet, J. F., and G. D. A. Phillips. 2012. "Higher-Order Asymptotic Expansions of the Least-Squares Estimation Bias in First-Order Dynamic Regression Models." Computational Statistics & Data Analysis 56: 3705–29. https://doi.org/10.1016/j.csda.2010.07.013.

Lee, T., S. Parsaeian, and A. Ullah. 2022a. "Efficient Combined Estimation Under Structural Breaks." In Advances in Econometrics, Vol. 43, edited by A. Chudik, C. Hsiao, and A. Timmermann. Leeds: Emerald Publishing Limited. https://doi.org/10.1108/S0731-90532021000043A007.

Lee, T., S. Parsaeian, and A. Ullah. 2022b. "Optimal Forecast under Structural Breaks." Journal of Applied Econometrics 37: 965–87. https://doi.org/10.1002/jae.2908.

Lee, T., S. Parsaeian, and A. Ullah. 2022c. "Forecasting under Structural Breaks Using Improved Weighted Estimation." Oxford Bulletin of Economics & Statistics 84: 1485–501. https://doi.org/10.1111/obes.12512.

Maasoumi, E. 1978. "A Modified Stein-like Estimator for the Reduced Form Coefficients of Simultaneous Equations." Econometrica 46: 695–703. https://doi.org/10.2307/1914241.

Mankiw, N. G., and J. A. Miron. 1986. "The Changing Behavior of the Term Structure of Interest Rates." Quarterly Journal of Economics 101: 211–28. https://doi.org/10.2307/1891113.

Mankiw, N. G., J. A. Miron, and D. N. Weil. 1987. "The Adjustment of Expectations to a Change in Regime: A Study of the Founding of the Federal Reserve." The American Economic Review 77: 358–74.

Marriott, F. H. C., and J. A. Pope. 1954. "Bias in the Estimation of Autocorrelations." Biometrika 41: 393–403.

McCracken, M. W., and S. Ng. 2016. "FRED-MD: A Monthly Database for Macroeconomic Research." Journal of Business & Economic Statistics 34: 574–89. https://doi.org/10.1080/07350015.2015.1086655.

Mehrabani, A., and A. Ullah. 2020. "Improved Average Estimation in Seemingly Unrelated Regressions." Econometrics 8: 15. https://doi.org/10.3390/econometrics8020015.

Nagar, A. L. 1959. "The Bias and Moment Matrix of the General K-Class Estimators of the Parameters in Simultaneous Equations." Econometrica 27: 575–95. https://doi.org/10.2307/1909352.

Newbold, P., and D. I. Harvey. 2002. "Forecast Combination and Encompassing." In A Companion to Economic Forecasting, edited by M. P. Clements, and D. F. Hendry. Oxford: Blackwell.

Pesaran, M. H., and A. Pick. 2011. "Forecast Combination Across Estimation Windows." Journal of Business & Economic Statistics 29: 307–18. https://doi.org/10.1198/jbes.2010.09018.

Pesaran, M. H., and A. Timmermann. 2002. "Market Timing and Return Prediction Under Model Instability." Journal of Empirical Finance 9: 495–510. https://doi.org/10.1016/s0927-5398(02)00007-5.

Pesaran, M. H., and A. Timmermann. 2005. "Small Sample Properties of Forecasts from Auto-Regressive Models Under Structural Breaks." Journal of Econometrics 129: 183–217. https://doi.org/10.1016/j.jeconom.2004.09.007.

Pesaran, M. H., and A. Timmermann. 2007. "Selection of Estimation Window in the Presence of Breaks." Journal of Econometrics 137: 134–61. https://doi.org/10.1016/j.jeconom.2006.03.010.

Pesaran, M. H., A. Pick, and M. Pranovich. 2013. "Optimal Forecasts in the Presence of Structural Breaks." Journal of Econometrics 177: 134–52. https://doi.org/10.1016/j.jeconom.2013.04.002.

Phillips, P. C. B., Y. Wu, and J. Yu. 2011. "Explosive Behavior in the 1990s Nasdaq: When Did Exuberance Escalate Asset Values?" International Economic Review 52: 201–26. https://doi.org/10.1111/j.1468-2354.2010.00625.x.

Rossi, B. 2013. "Advances in Forecasting under Instability." In Handbook of Economic Forecasting, Vol. 2, Part B, edited by G. Elliott, and A. Timmermann, 1203–324. Amsterdam: Elsevier. https://doi.org/10.1016/B978-0-444-62731-5.00021-X.

Rossi, B., and T. Sekhposyan. 2010. "Have Economic Models' Forecasting Performance for US Output Growth and Inflation Changed over Time, and When?" International Journal of Forecasting 26: 808–35. https://doi.org/10.1016/j.ijforecast.2009.08.004.

Shenton, L. R., and W. L. Johnson. 1965. "Moments of a Serial Correlation Coefficient." Journal of the Royal Statistical Society: Series B 27: 308–18. https://doi.org/10.1111/j.2517-6161.1965.tb01498.x.

Smeekes, S., and E. Wijler. 2020. "Unit Roots and Cointegration." In Macroeconomic Forecasting in the Era of Big Data (Advanced Studies in Theoretical and Applied Econometrics), Vol. 52, edited by P. Fuleky. New York: Springer. https://doi.org/10.1007/978-3-030-31150-6_17.

Stein, C. M. 1956. "Inadmissibility of the Usual Estimator for the Mean of a Multivariate Normal Distribution." In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 197–206.

Stock, J. H., and M. W. Watson. 1996. "Evidence on Structural Instability in Macroeconomic Time Series Relations." Journal of Business & Economic Statistics 14: 11–30. https://doi.org/10.2307/1392096.

Stock, J. H., and M. W. Watson. 2004. "Combination Forecasts of Output Growth in a Seven-Country Data Set." Journal of Forecasting 23: 405–30. https://doi.org/10.1002/for.928.

Timmermann, A. 2006. "Forecast Combinations." In Handbook of Economic Forecasting, edited by G. Elliott, C. W. J. Granger, and A. Timmermann, 135–96. Amsterdam: Elsevier. https://doi.org/10.1016/S1574-0706(05)01004-9.

White, J. S. 1961. "Asymptotic Expansions for the Mean and Variance of the Serial Correlation Coefficient." Biometrika 48: 85–94. https://doi.org/10.2307/2333133.


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/jem-2023-0036).


Received: 2023-08-28
Accepted: 2024-06-10
Published Online: 2024-06-25

© 2024 Walter de Gruyter GmbH, Berlin/Boston
