Selecting between causal and noncausal models with quantile autoregressions

Alain Hecq; Li Sun

doi:10.1515/snde-2019-0044

Article Open Access

Published/Copyright: September 19, 2020

From the journal Studies in Nonlinear Dynamics & Econometrics Volume 25 Issue 5

Abstract

We propose a model selection criterion to detect purely causal from purely noncausal models in the framework of quantile autoregressions (QAR). We also present asymptotics for the i.i.d. case with regularly varying distributed innovations in QAR. This new modelling perspective is appealing for investigating the presence of bubbles in economic and financial time series, and is an alternative to approximate maximum likelihood methods. We illustrate our analysis using hyperinflation episodes of Latin American countries.

Keywords: causal and noncausal time series; financial bubbles; model selection criterion; quantile autoregressions; regularly varying variables

1 Motivation

Mixed causal and noncausal time series models have recently been used (i) to obtain a stationary solution to explosive autoregressive processes, (ii) to improve forecast accuracy, (iii) to model expectation mechanisms implied by economic theory, (iv) to interpret non-fundamental shocks resulting from the asymmetric information between economic agents and econometricians, (v) to generate non-linear features from simple linear models with non-Gaussian disturbances, and (vi) to test for time reversibility. When the distribution of innovations is known, a non-Gaussian likelihood approach can be used to discriminate between lag and lead polynomials of the dependent variable. For instance, the R package MARX developed by Hecq, Lieb, and Telg (2017a) estimates univariate mixed models under the assumption of a Student's t-distribution with v degrees of freedom (see also Lanne and Saikkonen 2011, 2013), with the Cauchy distribution as the special case v = 1. Gouriéroux and Zakoian (2016) favour the latter distribution to derive analytical results. Gouriéroux and Zakoian (2015) and Fries and Zakoian (2017) gain additional flexibility by using the family of alpha-stable distributions, which allows for skewness. However, all the aforementioned results require the estimation of a parametric distributional form. In this article we take another route.

The objective of this paper is to select between causal and noncausal models without using parametric distributional assumptions. To achieve that, we adopt a quantile regression (QR) framework and apply quantile autoregressions (QCAR hereafter) (Koenker and Xiao 2006) to candidate models. Although we obviously still require non-Gaussian innovations, we do not make any parametric distributional assumption about them. Using quantile regressions, we consider a statistic called the sum of rescaled absolute residuals (SRAR hereafter) to measure model performance and reveal properties of time series. Remarkably, we find that the SRAR does not always favour one model uniformly across quantiles. This issue is common for time series with asymmetrically distributed innovations; it causes confusion in model selection and calls for a more robust statistic. We therefore propose to aggregate the SRAR information over quantiles. It is worth mentioning that when the coefficients of the underlying model are constant and the error term is symmetric and i.i.d., the aggregate SRAR criterion is equivalent to selecting between forward and backward conditional mean models (as termed by Gourieroux and Zakoian (2017)). However, the aggregate SRAR is a measure based on the whole dynamics of the underlying process and is no longer dominated by the conditional mean information. This characteristic makes the aggregate SRAR criterion robust in model selection even in more general situations, such as asymmetrically distributed innovations. A further remark is that our method is restricted to purely causal or purely noncausal autoregressions without other explanatory variables. It can therefore be applied to questions such as the asset pricing of exchange rates, where the current exchange rate is associated with future exchange rates. However, it cannot be applied to questions such as the Taylor (1993) rule, which associates the dynamics of the nominal interest rate with the dynamics of some endogenous variables (e.g. inflation).

The rest of this paper is organized as follows. Section 2 introduces mixed causal and noncausal models and our research background. In Section 3, we propose quantile autoregressions in reverse time, called quantile noncausal autoregressions (QNCAR), along with a generalized asymptotic theorem in a stable law for both QCAR and QNCAR. Section 4 brings out a common issue in model selection through SRAR comparison. The use of the aggregate SRAR over all quantiles as a new model selection criterion is then proposed, and the shape of SRAR curves is analysed. Furthermore, we illustrate our analysis using hyperinflation episodes of four Latin American countries in Section 6. Section 7 concludes.

2 Causal and noncausal time series models

Brockwell and Davis introduce in their textbooks (Brockwell and Davis 2016; Brockwell, Davis, and Fienberg 1991) a univariate noncausal specification as a way to rewrite an autoregressive process with explosive roots as a process in reverse time with roots outside the unit circle. This noncausal process possesses a stable forward-looking solution, whereas the explosive autoregressive process in direct time does not. This approach can be generalized to allow for both lead and lag polynomials, which yields the so-called mixed causal-noncausal univariate autoregressive process for $y_t$, denoted MAR(r, s):

(1) $$\pi(L)\,\phi(L^{-1})\,y_t = \varepsilon_t,$$

where $\pi(L) = 1 - \pi_1 L - \dots - \pi_r L^r$ and $\phi(L^{-1}) = 1 - \phi_1 L^{-1} - \dots - \phi_s L^{-s}$. L is the usual backshift operator that creates lags when raised to positive powers and leads when raised to negative powers, i.e. $L^j y_t = y_{t-j}$ and $L^{-j} y_t = y_{t+j}$. The roots of both polynomials are assumed to lie strictly outside the unit circle, that is, $\pi(z) = 0$ and $\phi(z) = 0$ only for $|z| > 1$, and therefore

(2) $$y_t = \pi(L)^{-1}\phi(L^{-1})^{-1}\varepsilon_t = \sum_{i=-\infty}^{\infty} a_i\,\varepsilon_{t-i}$$

has an infinite two-sided moving average representation. We also have that $E(|\varepsilon_t|^{\delta}) < \infty$ for $\delta > 0$^[1] and the Laurent expansion parameters are such that $\sum_{i=-\infty}^{\infty} |a_i|^{\delta} < \infty$. The representation (2) is sometimes clearer than (1) to motivate the terminology "causal/noncausal". Indeed, those terms refer to the fact that $y_t$ depends on a causal component $\sum_{i=0}^{\infty} a_i\,\varepsilon_{t-i}$ and a noncausal component $\sum_{i=-\infty}^{-1} a_i\,\varepsilon_{t-i}$. With this in mind, an autoregressive process with explosive roots is naturally defined as noncausal.

Note that in (1), the process $y_t$ is a purely causal MAR(r, 0), also known as the conventional causal AR(r) process, when $\phi_1 = \dots = \phi_s = 0$,

(3) $$\pi(L)\,y_t = \varepsilon_t,$$

while the process is a purely noncausal MAR(0, s)

(4) $$\phi(L^{-1})\,y_t = \varepsilon_t,$$

when $\pi_1 = \dots = \pi_r = 0$.

A crucial point of this literature is that the innovation terms $\varepsilon_t$ must be i.i.d. non-Gaussian to ensure that a causal specification can be distinguished from a noncausal one (Breidt et al. 1991). This departure from Gaussianity is not in itself a drawback, as a large part of macroeconomic and financial time series display nonlinear and non-normal features.

We have already discussed in Section 1 the reasons for looking at models with a lead component. Our main motivation in this paper lies in the fact that MAR(r, s) models with non-Gaussian disturbances are able to replicate non-linear features (e.g. bubbles, asymmetric cycles) that were previously obtained mostly from highly nonlinear models. As an example, we simulate in Figure 1 an MAR(1,1) of the form $(1 - 0.8L)(1 - 0.6L^{-1})y_t = \varepsilon_t$ with $\varepsilon_t \overset{d}{\sim} t(3)$ for 200 observations.^[2] One can observe asymmetric cycles and multiple bubbles.^[3]

Figure 1:

Simulation of a MAR(1,1) model, T = 200.
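
A simulation of this kind can be sketched as follows. This is only an illustrative sketch and not the code behind Figure 1: the function names `student_t` and `simulate_mar11`, the burn-in length and the seed are assumptions. The noncausal factor is filled backwards in time and the causal factor forwards, so that the simulated path satisfies (1−0.8L)(1−0.6L⁻¹)y_t = ε_t.

```python
import math
import random


def student_t(nu, rng):
    """Draw one Student's t(nu) variate as Z / sqrt(chi2_nu / nu)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return z / math.sqrt(chi2 / nu)


def simulate_mar11(pi1, phi1, n, nu=3.0, burn=200, seed=0):
    """Simulate (1 - pi1 L)(1 - phi1 L^{-1}) y_t = eps_t with t(nu) innovations."""
    rng = random.Random(seed)
    total = n + 2 * burn
    eps = [student_t(nu, rng) for _ in range(total)]
    # Noncausal part: u_t = phi1 * u_{t+1} + eps_t, filled backwards in time.
    u = [0.0] * total
    for t in range(total - 2, -1, -1):
        u[t] = phi1 * u[t + 1] + eps[t]
    # Causal part: y_t = pi1 * y_{t-1} + u_t, filled forwards in time.
    y = [0.0] * total
    for t in range(1, total):
        y[t] = pi1 * y[t - 1] + u[t]
    # Drop the burn-in at both ends to reduce the effect of the zero end values.
    return y[burn:burn + n], eps[burn:burn + n]


y, eps = simulate_mar11(0.8, 0.6, n=200)
```

Discarding observations at both ends is a design choice of this sketch: the backward recursion starts from a zero terminal value and the forward recursion from a zero initial value, so both ends of the raw path are affected by the initialisation.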

Once a distribution or a group of distributions is chosen, the parameters in π(L)ϕ(L ⁻¹) can be estimated. Assuming for instance a non-standardized t-distribution for the innovation process, the parameters of mixed causal-noncausal autoregressive models of the form (1) can be consistently estimated by the approximate maximum likelihood (AML) method (Hecq, Lieb, and Telg 2016). Let (ε ₁,…,ε _T) be a sequence of i.i.d. zero mean t-distributed random variables, then its joint probability density function can be characterized as

$$f_{\varepsilon}(\varepsilon_1,\dots,\varepsilon_T \mid \sigma,\nu) = \prod_{t=1}^{T} \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}\,\sigma}\left(1 + \frac{1}{\nu}\left(\frac{\varepsilon_t}{\sigma}\right)^{2}\right)^{-\frac{\nu+1}{2}},$$

where Γ(⋅) denotes the gamma function. The corresponding (approximate) log-likelihood function conditional on the observed data $y = (y_1, \dots, y_T)$ can be formulated as

(5) $$l_y(\phi,\varphi,\lambda,\alpha \mid y) = (T-p)\left[\ln\Gamma\left(\tfrac{\nu+1}{2}\right) - \ln\sqrt{\nu\pi} - \ln\Gamma\left(\tfrac{\nu}{2}\right) - \ln\sigma\right] - \frac{\nu+1}{2}\sum_{t=r+1}^{T-s}\ln\left(1 + \frac{1}{\nu}\left(\frac{\pi(L)\phi(L^{-1})y_t - \alpha}{\sigma}\right)^{2}\right),$$

where p = r + s and $\varepsilon_t = \pi(L)\phi(L^{-1})y_t - \alpha$ is replaced by a nonlinear function of the parameters when expanding the product of polynomials. The distributional parameters are collected in $\lambda = [\sigma,\nu]'$, with σ representing the scale parameter and ν the degrees of freedom. α denotes an intercept that can be introduced in model (1). Thus, the AML estimator corresponds to the solution $\hat\theta_{ML} = \arg\max_{\theta\in\Theta} l_y(\theta \mid y)$, with $\theta = [\phi',\varphi',\lambda']'$ and Θ a permissible parameter space containing the true value of θ, say $\theta_0$, as an interior point. Since an analytical solution of the score function is not directly available, gradient-based numerical procedures can be used to find $\hat\theta_{ML}$. If ν > 2, and hence $E(|\varepsilon_t|^{2}) < \infty$, the AML estimator is $\sqrt{T}$-consistent and asymptotically normal. Lanne and Saikkonen (2011) also show that a consistent estimator of the limiting covariance matrix is obtained from the standardized Hessian of the log-likelihood. For the estimation of the parameters and the standard innovations as well as for the selection of mixed causal-noncausal models we can also follow the procedure proposed by Hecq, Lieb, and Telg (2016).
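
To make the objective in (5) concrete, the following sketch evaluates the approximate t log-likelihood of an MAR(r, s) at given parameter values. It is not the MARX implementation; the function name `mar_t_loglik` and its argument layout are assumptions, and the normalising constant uses the Student's t density with the factor √(νπ).

```python
import math


def mar_t_loglik(y, pi, phi, sigma, nu, alpha=0.0):
    """Approximate t log-likelihood (5) of an MAR(r, s) at given parameter values.

    pi and phi are lists of lag and lead coefficients (hypothetical layout).
    """
    r, s, T = len(pi), len(phi), len(y)
    # u_t = pi(L) y_t, available once r lags exist (0-based index t >= r).
    u = [y[t] - sum(pi[i] * y[t - 1 - i] for i in range(r)) for t in range(r, T)]
    # eps_t = phi(L^{-1}) u_t - alpha, available while all s leads exist.
    eps = [u[k] - sum(phi[j] * u[k + 1 + j] for j in range(s)) - alpha
           for k in range(len(u) - s)]
    const = (math.lgamma((nu + 1) / 2) - 0.5 * math.log(nu * math.pi)
             - math.lgamma(nu / 2) - math.log(sigma))
    tail = sum(math.log(1.0 + (e / sigma) ** 2 / nu) for e in eps)
    # len(eps) equals T - p with p = r + s.
    return len(eps) * const - 0.5 * (nu + 1) * tail
```

In practice this function would be handed to a numerical optimiser over (π, ϕ, σ, ν, α); the sketch only evaluates the criterion.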

However, AML estimation relies on a parametric form for the innovation term in (1), which makes the method insufficiently flexible to accommodate the uncommon and complex distributions encountered in practice. To be more practical and avoid strong distributional assumptions on the innovations, we adopt quantile regression methods in the next section and discuss some of their properties there. This paper focuses only on purely causal and purely noncausal models.

3 QCAR & QNCAR

Koenker and Xiao (2006) introduced a quantile autoregressive model of order p, denoted QAR(p), which is formulated as follows:

(6) $$y_t = \theta_0(u_t) + \theta_1(u_t)\,y_{t-1} + \dots + \theta_p(u_t)\,y_{t-p}, \quad t = p+1,\dots,T,$$

where $\{u_t\}$ is a sequence of i.i.d. standard uniform random variables. In order to emphasize the causal characteristic of this kind of autoregressive model, we refer to (6) as QCAR(p) hereafter. Provided that the right-hand side of (6) is monotone increasing in $u_t$, the τ-th conditional quantile function of $y_t$ can be written as

(7) $$Q_{y_t}(\tau \mid y_{t-1},\dots,y_{t-p}) = \theta_0(\tau) + \theta_1(\tau)\,y_{t-1} + \dots + \theta_p(\tau)\,y_{t-p}.$$

If an observed time series $\{y_t\}_{t=1}^{T}$ can be written as a QCAR(p) process, its parameters as in (7) can be obtained from the following minimization problem:

(8) $$\hat\theta(\tau) = \arg\min_{\theta\in\mathbb{R}^{p+1}} \sum_{t=1}^{T} \rho_\tau(y_t - x_t'\theta),$$

where $\rho_\tau(u) := u(\tau - I(u < 0))$ is called the check function, $x_t' := [1, y_{t-1}, \dots, y_{t-p}]$, and $\theta' := [\theta_0, \theta_1, \dots, \theta_p]$. We define the sum of rescaled absolute residuals (SRAR) for each pair $(\tau, \theta)$ as

(9) $$\mathrm{SRAR}(\tau,\theta) := \sum_{t=1}^{T} \rho_\tau(y_t - x_t'\theta).$$

Substituting (9) into (8), the minimization problem (8) can be written as

(10) $$\hat\theta(\tau) = \arg\min_{\theta\in\mathbb{R}^{p+1}} \mathrm{SRAR}(\tau,\theta).$$

Consistency and asymptotic normality of the estimator in (8) have been established by Koenker and Xiao (2006). A modified simplex algorithm proposed by Barrodale and Roberts (1973) can be used to solve the minimization, and in practice the parameters for each τ-th quantile can be obtained, for instance, with the rq() function from the quantreg package in R, or in EViews.
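
The minimization in (8)-(10) can be illustrated for a QCAR(1) with intercept. The sketch below is a plain-Python illustration and not the Barrodale and Roberts simplex routine: it relies on the standard linear-programming fact that some minimizer of the check-loss objective interpolates p + 1 sample points, and simply enumerates all lines through two observations. The function names (`check_loss`, `srar`, `fit_qar1`) and the simulated Gaussian AR(1) are assumptions made for the example.

```python
import itertools
import random


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def srar(y, x, theta, tau):
    """SRAR(tau, theta) for the fitted line theta[0] + theta[1] * x."""
    return sum(check_loss(yi - theta[0] - theta[1] * xi, tau)
               for yi, xi in zip(y, x))


def fit_qar1(y, x, tau):
    """Minimise SRAR over all lines through two sample points (vertex search)."""
    best, best_val = None, float("inf")
    for i, j in itertools.combinations(range(len(y)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        theta = (y[i] - slope * x[i], slope)
        val = srar(y, x, theta, tau)
        if val < best_val:
            best, best_val = theta, val
    return best, best_val


# Illustrative causal AR(1): y_t = 1 + 0.5 y_{t-1} + eps_t with N(0, 1) errors.
rng = random.Random(1)
series = [0.0]
for _ in range(80):
    series.append(1.0 + 0.5 * series[-1] + rng.gauss(0.0, 1.0))
resp, lag = series[1:], series[:-1]  # y_t and y_{t-1}
theta_med, srar_med = fit_qar1(resp, lag, 0.5)
```

The enumeration costs on the order of T³ operations and is only meant for small samples; dedicated routines such as rq() should be preferred for real data.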

3.1 QNCAR

A QNCAR(p) specification is introduced here as the noncausal counterpart of the QCAR(p) model by reversing time, explicitly as follows:

(11) $$Q_{y_t}(\tau \mid y_{t+1},\dots,y_{t+p}) = \phi_0(\tau) + \phi_1(\tau)\,y_{t+1} + \dots + \phi_p(\tau)\,y_{t+p}.$$

Analogously to the QCAR(p), the estimation of the QNCAR(p) goes through solving

$$\hat\theta(\tau) = \arg\min_{\theta\in\mathbb{R}^{p+1}} \mathrm{SRAR}(\tau,\theta)$$

with

$$x_t' = [1, y_{t+1}, \dots, y_{t+p}],$$

where, for simplicity of notation, we use $\hat\theta(\tau)$ to denote the estimate in the quantile noncausal autoregression. Drawing on the asymptotics derived by Koenker and Xiao (2006), we present the following theorem for QNCAR(p) based on three assumptions, which are made to ensure covariance stationarity of the time series (by (A1) and (A2)) and the existence of quantile estimates (by (A3)).
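
Because the QNCAR(p) only replaces lags by leads in the regressor vector, it can be estimated with any QCAR(p) routine applied to the time-reversed series. The short sketch below, with hypothetical helper names `lag_design` and `lead_design`, checks that the lead design of a series and the lag design of its reversal contain the same (response, regressor) rows, so the SRAR objective is identical.

```python
def lag_design(y, p):
    """Rows (y_t, [1, y_{t-1}, ..., y_{t-p}]) used by QCAR(p)."""
    return [(y[t], [1.0] + [y[t - k] for k in range(1, p + 1)])
            for t in range(p, len(y))]


def lead_design(y, p):
    """Rows (y_t, [1, y_{t+1}, ..., y_{t+p}]) used by QNCAR(p)."""
    return [(y[t], [1.0] + [y[t + k] for k in range(1, p + 1)])
            for t in range(len(y) - p)]


y = [0.3, -1.2, 0.8, 2.5, 1.1, -0.4]
reversed_y = y[::-1]
# The two designs hold the same rows, only in opposite order.
same_rows = sorted(lead_design(y, 2)) == sorted(lag_design(reversed_y, 2))
```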

Remark.

There is an issue with the estimation consistency of QCAR(p), as reported by Fan and Fan (2010). It arises from a violation of the monotonicity requirement on the right-hand side of (6) in $u_t$, which is not the same as monotonicity of each $\theta_i(u_t)$ in $u_t$. Hence, even for an AR(p) process whose coefficients $\theta_i(u_t)$ (i = 0,…,p) are monotonic in $u_t$, quantile autoregression is not guaranteed to recover them unless the monotonicity requirement is met beforehand. This issue is also illustrated in Section 4.1.

Theorem 1.

A QNCAR(p) model can be written in the following vectorized companion form:

(12) $$\tilde{x}_t = A_t\,\tilde{x}_{t+1} + \nu_t,$$

where $\tilde{x}_t' := [y_t, y_{t+1}, \dots, y_{t+p-1}]$, $x_t' := [1, \tilde{x}_t']$, $A_t := \begin{bmatrix} \phi_{1,t} \;\; \phi_{2,t} \;\; \cdots & \phi_{p,t} \\ I_{p-1} & 0_{(p-1)\times 1} \end{bmatrix}$ and $\nu_t := \begin{bmatrix} \varepsilon_t \\ 0_{(p-1)\times 1} \end{bmatrix}$, satisfying the following assumptions:

(A1): $\{\varepsilon_t\}_{t=1}^{n}$ are i.i.d. innovations with mean 0 and variance $\sigma^2 < \infty$. The distribution function of $\varepsilon_t$, denoted as F(⋅), has a continuous density f(⋅) with f(ε) > 0 on $U := \{\varepsilon : 0 < F(\varepsilon) < 1\}$.
(A2): The eigenvalues of $E[A_t \otimes A_t]$ have moduli less than one.
(A3): $F_{y_t \mid \tilde{x}_{t+1}}(\cdot) := P[y_t < \cdot \mid y_{t+1}, y_{t+2}, \dots, y_{t+p}]$ has derivative $f_{y_t \mid \tilde{x}_{t+1}}(\cdot)$ which is uniformly integrable on U and non-zero with probability one.

Then,

(13) $$\Sigma^{-\frac{1}{2}}\sqrt{T}\left(\hat\theta(\tau) - \phi(\tau)\right) \overset{d}{\sim} B_{p+1}(\tau),$$

where $\Sigma := \Sigma_1^{-1}\Sigma_0\Sigma_1^{-1}$, $\Sigma_0 := E[x_t x_t']$, $\Sigma_1 := \lim T^{-1}\sum_{t=1}^{T} f_{y_t \mid \tilde{x}_{t+1}}\left(F^{-1}_{y_t \mid \tilde{x}_{t+1}}(\tau)\right) x_t x_t'$, $\phi(\tau)' := [F^{-1}(\tau), \phi_1(\tau), \dots, \phi_p(\tau)]$, and $B_{p+1}(\tau) := N(0, \tau(1-\tau) I_{p+1})$, with sample size T.

The above result can be further simplified into Corollary 2 by adding the following assumption:

(A4): The coefficient matrix $A_t$ in (12) is constant over time. (We denote $A := \begin{bmatrix} \phi_1 \;\; \phi_2 \;\; \cdots & \phi_p \\ I_{p-1} & 0_{(p-1)\times 1} \end{bmatrix}$ for $A_t$ under this assumption.)

Corollary 2.

Under assumptions (A1), (A2), (A3) and (A4),

(14) $$\sqrt{T}\, f(F^{-1}(\tau))\, \Sigma_0^{\frac{1}{2}}\left(\hat\theta(\tau) - \phi_\tau\right) \overset{d}{\sim} B_{p+1}(\tau),$$

where $\phi_\tau := [F^{-1}(\tau), \phi_1, \dots, \phi_p]$.

As can be seen, QCAR(p) and QNCAR(p) generalize the classical purely causal and purely noncausal models respectively by allowing random coefficients on lag or lead regressors over time. Corollary 2 provides additional results when the same coefficients, except for the intercept, are used to generate each quantile. However, the moment requirement in (A1) is very strict for heavy-tailed time series. In order to study noncausality by QAR under heavy-tailed distributions, we have to show its applicability when assumption (A1) is weakened. This goal is achieved by Theorem 3, which presents the asymptotic behaviour of the QAR estimator for a classical purely noncausal model. The asymptotics for a classical purely causal model follow similarly after reversing time.

Theorem 3.

(Asymptotics in regularly varying distributed innovations).

Suppose that, under Assumption (A4), a purely noncausal AR(p) of the following form

$$\phi(L^{-1})\,y_t = \varepsilon_t,$$

where $\phi(L^{-1}) = 1 - \phi_1 L^{-1} - \dots - \phi_p L^{-p}$, also satisfies the following assumptions:

(A5): $\{\varepsilon_t\}_{t=1}^{n}$ are i.i.d. innovation variables with regularly varying tails defined as
(15) $$P(|\varepsilon_t| > x) = x^{-\alpha} L(x),$$
where L(x) is slowly varying at ∞ and 0 < α < 2. There is a sequence $\{a_T\}$ satisfying
(16) $$T \cdot P\{|\varepsilon_t| > a_T x\} \to x^{-\alpha} \quad \text{for all } x > 0,$$
with $b_T = E[\varepsilon_t I[|\varepsilon_t| \le a_T]] = 0$.^[4] The distribution function of $\varepsilon_t$, denoted as F(⋅), has a continuous density f(⋅) with f(ε) > 0 on $\{\varepsilon : 0 < F(\varepsilon) < 1\}$ with probability one;
(A6): The roots of the polynomial ϕ(z) have moduli greater than one, such that $y_t$ can be written as
(17) $$y_t = \sum_{j=0}^{\infty} c_j\,\varepsilon_{t+j},$$
where $\sum_{j=0}^{\infty} j\,|c_j|^{\delta} < \infty$ for some δ < α, δ ≤ 1.

Then

(18) $$\frac{f(F^{-1}(\tau))\cdot a_T\sqrt{T}}{\sqrt{\tau(1-\tau)}}\left(\hat\theta(\tau) - \phi_\tau\right) \overset{d}{\sim} \begin{bmatrix} 1 & 0 \\ 0 & \left(\int_0^1 S_\alpha^2(s)\,ds\;\Omega\right)^{-1} \end{bmatrix} \left[ W(1),\; \sum_{j=0}^{\infty} c_j \int_0^1 S_\alpha(s)\,dW(s),\; \dots,\; \sum_{j=0}^{\infty} c_j \int_0^1 S_\alpha(s)\,dW(s) \right]_{(p+1)\times 1},$$

where $\phi_\tau := \left[\frac{F^{-1}(\tau)}{a_T}, \phi_1, \dots, \phi_p\right]$, $\Omega := [\omega_{ik}]$ is a p×p matrix with entry $\omega_{ik} := \sum_{j=0}^{\infty} c_j c_{j+|k-i|}$ at the ith row and the kth column, and $\{S_\alpha(s)\}$ is a process of stable distributions with index α which is independent of the Brownian motion $\{W(s)\}$. In this theorem the intercept regressor in QNCAR(p) is changed to $a_T$ such that $x_t' := [a_T, y_t, y_{t+1}, \dots, y_{t+p-1}]$.

Proof. See the appendix.

Next, we restrict our focus to the classical models and explore heuristically the consequences of causality misspecification in quantile regressions.

3.2 Causal and noncausal models with Gaussian i.i.d. disturbances

Consider a causal AR(1) process $\{y_t\}_{t=1}^{T}$, $y_t = \alpha + \beta y_{t-1} + \varepsilon_t$, with for instance [α, β] = [1, 0.5], i.i.d. standard normal $\{\varepsilon_t\}$ and T = 200. Figure 2 displays a simulated sample following this data generating process (DGP).

Figure 2:

Simulation of a one-regime process with N(0, 1) innovations, T = 200.

Figure 3 displays the SRAR(τ) of each candidate model along quantiles, indicating their goodness of fit. The two SRAR curves almost overlap at every quantile, which implies no discrimination between QCAR and QNCAR under Gaussian innovations, in line with results in the OLS case. The Gaussian distribution is indeed time reversible, and weakly and strictly stationary. Its first two moments characterize the whole distribution and consequently every quantile. Note that we obtain similar results for a stationary noncausal AR(p) process with i.i.d. Gaussian $\{\varepsilon_t\}$. The results are not reported to save space.

Figure 3:

SRAR plot under an AR(1) with N(0, 1) innovations, T = 200.

3.3 Causal and noncausal models with Student’s t distributed innovations

Things become different if we depart from Gaussianity. Consider now a causal AR(1) process $y_t = \alpha + \beta y_{t-1} + \varepsilon_t$ with again [α, β] = [1, 0.5] but where $\{\varepsilon_t\}$ are i.i.d. Student's t-distributed with 2 degrees of freedom (hereafter using the shorthand notation t(2)). Figure 4 depicts a simulation of this AR(1) with T = 200. Applying QCAR and QNCAR respectively to this series results in the SRAR curves displayed in Figure 5. The distance between the two curves is obvious compared to the Gaussian case, favouring the causal specification at almost all quantiles. Figure 6 is the SRAR plot of a purely noncausal process with i.i.d. Cauchy innovations. The noncausal specification is preferred in the SRAR comparison.

Figure 4:

Simulation of an AR(1) with t(2) innovations, T = 200.

Figure 5:

SRAR plot under an AR(1) with t(2), T = 200.

Figure 6:

SRAR plot under a noncausal model with Cauchy innovations, T = 200.

Figure 7:

Simulation of a noncausal model with Cauchy innovations, T = 200.

It seems now that applying the SRAR comparison at one quantile, such as the median, is sufficient for model identification, but it is not true in general. In Section 4, we will spot an identification issue in the SRAR plots, the true model even having higher SRAR values at certain quantiles than the misspecified model.

So far we have applied QCAR and its counterpart QNCAR to the classical purely causal or noncausal models with symmetric i.i.d. innovations. Within this restricted scope, the conditional mean models of those data generating processes differ from their conditional quantile models only in the intercept term, and model selection by SRAR comparison gives uniform decisions along quantiles. However, such a model selection is not always that clear in practice. For example, in the empirical study later, we will encounter an identification issue which can be seen by checking the SRAR plots. In the next section, we present this identification issue with some possible reasons, and propose a robust model selection criterion called the aggregate SRAR to cope with it.

4 SRAR as a model selection criterion

It is natural to think of SRAR as a model selection criterion, since a lower SRAR means a better goodness of fit in quantile regressions. However, SRAR is a function of the quantile, which raises the question of which quantile to consider for model selection. Empirically, the SRAR plots commonly reveal an identification problem: different models are selected at different quantiles, which makes a selection based on a single quantile unreliable. In this section, we discuss this issue and propose a more robust model selection criterion based on aggregating SRARs.

4.1 Identification issue spotted from the SRAR plots

First let us see some possible model settings causing the identification problem in SRAR plots. The first case is linked to the existence of multi-regimes in coefficients.

Suppose a regime-shift model is specified as follows:

(19) $$y_t = \beta_t\,y_{t+1} + \varepsilon_t,$$

where $\{\varepsilon_t\}$ is an i.i.d. innovation process with cumulative distribution function F(⋅), and $\beta_t$ is defined as follows:

(20) $$\beta_t = \begin{cases} \beta_1, & \text{if } 0 < F(\varepsilon_t) \le \tau^*; \\ \beta_2, & \text{if } \tau^* < F(\varepsilon_t) \le 1, \end{cases}$$

with $\tau^* \in (0, 1)$ and $\beta_1 < \beta_2$. In essence, the regime shift of $\beta_t$ depends on the quantile occurrence of $\varepsilon_t$, which is indexed by $\tau_t := F(\varepsilon_t)$ with $\{\tau_t\}$ being i.i.d. standard uniform.

If $\{y_t\}$ can be negative, then there is a problem in using QNCAR to recover the coefficients in the underlying model (19), because the τ-th regime does not necessarily produce the τ-th conditional quantile of $y_t$. The comonotonicity condition of the linear quantile regression (19) is therefore not satisfied. However, by restricting to the non-negative region of the covariate $y_{t+1}$ (see also Fan and Fan 2010) we force the regression model to satisfy the comonotonicity requirement without losing its association with $\{\tau_t\}$. The resulting estimator is also consistent for the true coefficients in (19).^[5] QCAR (or QNCAR) with such a restriction, hereafter denoted RQCAR (or RQNCAR) as shorthand for restricted quantile causal autoregression (or restricted quantile noncausal autoregression), is formulated as follows:

(21) $$\hat\theta(\tau) = \arg\min_{\theta\in\mathbb{R}^{p+1}} \sum_{t=1}^{T} I_{\{t\in\mathcal{I}\}}\,\rho_\tau(y_t - x_t'\theta),$$

where $\mathcal{I}$ is the index set restricting the quantile regression to a particular information set. The restriction is usually imposed in order for quantile regressions to meet the comonotonicity condition. To study the regime-shift model (19), we restrict the QNCAR to non-negative covariates, i.e. $\mathcal{I} = \{t : x_t \ge 0\}$.

Figure 8 shows four SRAR curves estimated from QCAR, QNCAR, RQCAR and RQNCAR. We consider a time series $\{y_t\}_{t=1}^{600}$ simulated from the model (19) with $\tau^* = 0.7$, $\beta_1 = 0.2$, $\beta_2 = 0.8$ and an i.i.d. innovation process following t(3), i.e. $F^{-1}(\cdot) = F_{t(3)}^{-1}(\cdot)$.

Figure 8:

Identification problem spotted in the SRAR plot for restricted quantile autoregressions.

Figure 8 illustrates such an identification issue, in which the SRAR curve of the true model is not always lower than that of a misspecified model. Applying the restriction helps to enlarge the SRAR difference between the true model and a model with a misspecified time direction.

The second case we investigate is the presence of skewed distributed innovations.

Let us consider a time series $\{y_t\}$ following a purely noncausal AR(1): $y_t = 0.8\,y_{t+1} + \varepsilon_t$ with $\{\varepsilon_t\}$ i.i.d. from a demeaned skewed t distribution with skewing parameter γ = 2 and v = 3 degrees of freedom (hereafter t(v, γ) is the shorthand notation for a skewed t-distribution). The probability density function of t(v, γ) (see Francq and Zakoïan 2007) is defined as

(22) $$f(x) = \begin{cases} \dfrac{2}{\gamma + \frac{1}{\gamma}}\, f_t(\gamma x), & \text{for } x < 0, \\[2ex] \dfrac{2}{\gamma + \frac{1}{\gamma}}\, f_t\!\left(\dfrac{x}{\gamma}\right), & \text{for } x \ge 0, \end{cases}$$

where $f_t(\cdot)$ is the probability density function of the symmetric t(v) distribution. Figure 9 shows four SRAR curves obtained from the estimation of the QCAR, the QNCAR, the RQCAR and the RQNCAR respectively. The curves from the QNCAR and the RQNCAR almost overlap each other, which confirms our understanding that the monotonicity requirement is met in the true model. The estimates and the corresponding SRAR curves should be the same unless many observations are omitted by the restriction. On the other hand, the SRAR curve gets enlarged from the QCAR to the RQCAR, which is very reasonable as the feasible set in the QCAR is larger and the misspecified model is not guaranteed to satisfy the monotonicity requirement. Again we see the identification problem in the SRAR plot. Remarkably, the SRAR curve of the true model can be higher at certain quantiles than that of a misspecified model. Consequently, an SRAR comparison relying only on particular quantiles, such as the least absolute deviation (LAD) method for the median only, is not robust in general. Therefore, we propose a new model selection criterion in the next subsection that includes the information over all quantiles.
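
As a quick numerical check of the density in (22), the sketch below evaluates the skewed t(v, γ) density for v = 3 and γ = 2, integrates it with a simple trapezoidal rule, and computes its mean, which is the quantity subtracted when the innovations are demeaned. The function names and the integration range are assumptions made for the illustration.

```python
import math


def t_pdf(x, nu):
    """Density of the symmetric Student's t(nu) distribution."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2))
    c /= math.sqrt(nu * math.pi)
    return c * (1.0 + x * x / nu) ** (-(nu + 1) / 2)


def skew_t_pdf(x, nu, gamma):
    """Skewed t(nu, gamma) density as in (22)."""
    scale = 2.0 / (gamma + 1.0 / gamma)
    return scale * (t_pdf(gamma * x, nu) if x < 0 else t_pdf(x / gamma, nu))


def integrate(f, a, b, n=100000):
    """Composite trapezoidal rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))


# Total probability mass and mean of t(3, 2) over a wide truncated range.
mass = integrate(lambda x: skew_t_pdf(x, 3.0, 2.0), -400.0, 400.0)
mean = integrate(lambda x: x * skew_t_pdf(x, 3.0, 2.0), -400.0, 400.0)
```

With γ > 1 the right tail is stretched and the left tail compressed, so the mean is positive before demeaning.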

Figure 9:

Identification issue spotted from the SRAR plot for a skewed distribution.

4.2 The aggregate SRAR criterion

Given the same number of explanatory variables in QCAR and QNCAR and a fixed sample size in the quantile regressions, the best model is supposed to exhibit the highest goodness of fit among candidate models. Similarly to the R-squared criterion in OLS, we are led to use the SRAR criterion for model selection in quantile regressions. The aggregate SRAR is regarded as the overall performance of a candidate model over all quantiles:

$$\text{aggregate SRAR} := \int_0^1 \mathrm{SRAR}(\tau)\,d\tau.$$

There are many ways to calculate this integral. One way is to approximate the integral by the trapezoidal rule. Another way is to sum up SRARs over a fine enough quantile grid with equal weights. In other words, this aggregation is regarded as an average of the performances (SRAR(τ), τ∈(0, 1)) of a candidate model. In practice, there is almost no difference in model selection between the two aggregation methods.
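
The two aggregation schemes can be sketched as follows for an AR(1) with intercept. This is an illustration under stated assumptions and not the authors' code: the SRAR curve is obtained by enumerating all lines through two sample points (valid because a minimizer of the check loss interpolates p + 1 observations), the quantile grid 0.05, …, 0.95 and the simulated noncausal Cauchy AR(1) are choices made for the example, and the helper names `srar_curve` and `trapezoid` are hypothetical.

```python
import itertools
import math
import random


def srar_curve(y, x, taus):
    """Minimal SRAR(tau) over lines through two sample points, for each tau."""
    best = [float("inf")] * len(taus)
    for i, j in itertools.combinations(range(len(y)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        res = [yi - a - b * xi for yi, xi in zip(y, x)]
        pos = sum(r for r in res if r > 0)   # sum of positive residuals
        neg = -sum(r for r in res if r < 0)  # sum of |negative residuals|
        for k, tau in enumerate(taus):
            # sum_t rho_tau(r_t) = tau * pos + (1 - tau) * neg
            best[k] = min(best[k], tau * pos + (1.0 - tau) * neg)
    return best


def trapezoid(vals, taus):
    """Trapezoidal approximation of the integral of SRAR over the grid."""
    return sum(0.5 * (vals[k] + vals[k + 1]) * (taus[k + 1] - taus[k])
               for k in range(len(taus) - 1))


# Illustrative data: purely noncausal AR(1), y_t = 0.8 y_{t+1} + eps_t, Cauchy eps.
rng = random.Random(7)
n, extra = 100, 50
y = [0.0] * (n + extra)
for t in range(n + extra - 2, -1, -1):
    cauchy = math.tan(math.pi * (rng.random() - 0.5))
    y[t] = 0.8 * y[t + 1] + cauchy
y = y[:n]

taus = [k / 20 for k in range(1, 20)]
causal = srar_curve(y[1:], y[:-1], taus)      # regress y_t on y_{t-1}
noncausal = srar_curve(y[:-1], y[1:], taus)   # regress y_t on y_{t+1}
agg_trap = (trapezoid(causal, taus), trapezoid(noncausal, taus))
agg_mean = (sum(causal) / len(taus), sum(noncausal) / len(taus))
```

The candidate with the smaller aggregate value would be selected. The two aggregates differ in scale, since the trapezoidal rule integrates over the grid range while the equal-weight version averages, so they should only be compared across models within the same scheme.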

Since equal weights are used on all quantiles in the aggregate SRAR above, one may argue for a different weighting scheme. Indeed, the weights can be chosen differently: for instance, putting weight one on a single quantile and zero on all others amounts to selecting a conditional quantile model. We agree that the weighting scheme can be customized to suit the user's purpose. The equal-weight scheme proposed here is motivated by the area under the SRAR curve over quantiles that one sees when inspecting the SRAR plot. Intuitively, such areas are directly linked to model performance when we are concerned with the whole dynamics of the underlying process. We compare models by viewing the gap between their SRAR curves, which corresponds to the difference between the areas under those curves. This leads us to the aggregate SRAR measure.

The performance of the SRAR model selection criteria in Monte Carlo simulations is reported in Table 1. It shows the average frequencies with which we find the correct model based on the SRAR criterion per quantile and on the aggregate SRAR criterion. The sample size T is 200 and each reported number is based on 2000 Monte Carlo simulations. Each column of Table 1 refers to a particular data generating process previously illustrated in this paper. As observed, the aggregate SRAR criterion performs very well even in situations with the identification issue. As the Gaussian process is both weakly and strictly stationary, we obviously cannot discriminate between causal and noncausal specifications, which leads to an average frequency of around 50% of detecting the correct model.

Table 1:

Frequencies of selecting the correct model using the SRAR criteria.

Quantiles	Gaussian (Figure 2)	t(2) (Figure 4)	t(1) (Figure 7)	Two-regime (Figure 8)	t(v = 3, γ = 2) (Figure 9)
0	0.698	0.678	0.601	0.787	0.476
0.05	0.516	0.416	0.653	0.044	1.000
0.10	0.510	0.677	0.763	0.059	1.000
0.15	0.519	0.858	0.841	0.095	1.000
0.20	0.512	0.948	0.907	0.167	1.000
0.25	0.513	0.981	0.947	0.305	1.000
0.30	0.488	0.992	0.978	0.487	1.000
0.35	0.487	0.998	0.996	0.654	1.000
0.40	0.486	0.999	0.996	0.798	1.000
0.45	0.487	1.000	0.996	0.892	1.000
0.50	0.500	1.000	0.995	0.950	1.000
0.55	0.499	0.999	0.995	0.974	0.994
0.60	0.492	0.999	0.995	0.988	0.533
0.65	0.478	0.997	0.995	0.991	0.018
0.70	0.467	0.994	0.979	0.996	0.001
0.75	0.490	0.984	0.951	0.998	0.000
0.80	0.493	0.954	0.903	0.999	0.000
0.85	0.481	0.862	0.858	1.000	0.000
0.90	0.469	0.720	0.791	1.000	0.000
0.95	0.484	0.454	0.668	0.997	0.000
1	0.653	0.580	0.595	0.780	0.420
Aggregate SRAR	0.483	0.998	0.995	0.995	0.999

4.3 Shape of SRAR curves

By observing SRAR plots, we see that SRAR curves vary when the underlying distribution varies. It is interesting to investigate the reasons. In this subsection, we provide some insights on the slope and concavity of $\mathrm{SRAR}_{y_t}(\tau, \hat\theta(\tau))$ curves under assumptions (A1), (A2), (A3) and (A4). Since $\rho_\tau(y_t - x_t'\theta)$ is a continuous function of $\theta \in \mathbb{R}^{p+1}$, by the continuous mapping theorem and the consistency of $\hat\theta(\tau)$, we know that

$$\rho_\tau(y_t - x_t'\hat\theta) \xrightarrow{p} \rho_\tau(y_t - x_t'\phi_\tau).$$

We also know that

$$\rho_\tau(y_t - x_t'\phi_\tau) = \rho_\tau(\varepsilon_t - F^{-1}(\tau)).$$

Therefore, instead of directly deriving the shape of a $\mathrm{SRAR}_{y_t}(\tau, \hat\theta(\tau))$ curve, we look at the properties of its intrinsic curve $\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))$. We derive the first- and second-order derivatives of $\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))$ with respect to τ in order to determine the shape of $\mathrm{SRAR}_{y_t}(\tau, \hat\theta(\tau))$.

4.3.1 The slope property

One major difference between SRAR curves in a plot is their slopes. We can compute the first-order derivative of SRAR with respect to τ if the derivative exists. Under the following assumption:

(A7): The inverse distribution function $F^{-1}(\cdot)$ of the innovation $\varepsilon_t$ is continuous and differentiable on (0, 1) to the second order;

we can then take the first-order derivative of $\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))$ with respect to τ.

Suppose 0 < τ < τ + Δτ < 1, Δτ > 0, and denote $\Delta F^{-1}(\tau) := F^{-1}(\tau + \Delta\tau) - F^{-1}(\tau)$.

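(23) $$\begin{aligned}
&\mathrm{SRAR}_{\varepsilon_t}(\tau+\Delta\tau, F^{-1}(\tau+\Delta\tau)) - \mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau)) \\
&= \sum_{t=1}^{T}\Big(\rho_{\tau+\Delta\tau}\big(\varepsilon_t - F^{-1}(\tau+\Delta\tau)\big) - \rho_\tau\big(\varepsilon_t - F^{-1}(\tau)\big)\Big) \\
&= \sum_{t=1}^{T}\Big(\big(\varepsilon_t - F^{-1}(\tau+\Delta\tau)\big)\big(\tau+\Delta\tau - 1\{\varepsilon_t - F^{-1}(\tau+\Delta\tau) \le 0\}\big) - \big(\varepsilon_t - F^{-1}(\tau)\big)\big(\tau - 1\{\varepsilon_t - F^{-1}(\tau) \le 0\}\big)\Big) \\
&= \sum_{t=1}^{T}\Big(\varepsilon_t\big(\Delta\tau - 1\{F^{-1}(\tau) < \varepsilon_t \le F^{-1}(\tau+\Delta\tau)\}\big) + \tau\big(F^{-1}(\tau) - F^{-1}(\tau+\Delta\tau)\big) - \Delta\tau\,F^{-1}(\tau+\Delta\tau) \\
&\qquad\quad + F^{-1}(\tau+\Delta\tau)\,1\{\varepsilon_t \le F^{-1}(\tau+\Delta\tau)\} - F^{-1}(\tau)\,1\{\varepsilon_t \le F^{-1}(\tau)\}\Big) \\
&= \sum_{t=1}^{T}\Big(\Delta\tau\big(\varepsilon_t - F^{-1}(\tau+\Delta\tau)\big) + \big(F^{-1}(\tau+\Delta\tau) - F^{-1}(\tau)\big)\big(1\{\varepsilon_t \le F^{-1}(\tau+\Delta\tau)\} - \tau\big) \\
&\qquad\quad + 1\{F^{-1}(\tau) < \varepsilon_t \le F^{-1}(\tau+\Delta\tau)\}\big(F^{-1}(\tau) - \varepsilon_t\big)\Big).
\end{aligned}$$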

Divide the above difference by Δτ, and take the limit Δτ↓0. It gives us

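$$
\lim_{\Delta\tau\downarrow 0}\frac{\mathrm{SRAR}_{\varepsilon_t}\big(\tau+\Delta\tau, F^{-1}(\tau+\Delta\tau)\big) - \mathrm{SRAR}_{\varepsilon_t}\big(\tau, F^{-1}(\tau)\big)}{\Delta\tau} = \sum_{t=1}^{T}\Big(\varepsilon_t - F^{-1}(\tau) + \frac{dF^{-1}(\tau)}{d\tau}\big(\mathbf{1}\{\varepsilon_t \le F^{-1}(\tau)\} - \tau\big)\Big),
\tag{24}
$$

because

$$
\begin{aligned}
\lim_{\Delta\tau\downarrow 0}\frac{\Delta\tau\big(\varepsilon_t - F^{-1}(\tau+\Delta\tau)\big)}{\Delta\tau} &= \varepsilon_t - F^{-1}(\tau), \\
\lim_{\Delta\tau\downarrow 0}\frac{\big(F^{-1}(\tau+\Delta\tau) - F^{-1}(\tau)\big)\big(\mathbf{1}\{\varepsilon_t \le F^{-1}(\tau+\Delta\tau)\} - \tau\big)}{\Delta\tau} &= \frac{dF^{-1}(\tau)}{d\tau}\big(\mathbf{1}\{\varepsilon_t \le F^{-1}(\tau)\} - \tau\big), \\
\lim_{\Delta\tau\downarrow 0}\frac{\mathbf{1}\{F^{-1}(\tau) < \varepsilon_t \le F^{-1}(\tau+\Delta\tau)\}\big(F^{-1}(\tau) - \varepsilon_t\big)}{\Delta\tau} &= 0.
\end{aligned}
\tag{25}
$$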

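The last line follows from

$$
\begin{cases}
\mathbf{1}\{F^{-1}(\tau) < \varepsilon_t \le F^{-1}(\tau+\Delta\tau)\}\big(F^{-1}(\tau) - \varepsilon_t\big) = 0, & \text{when } \varepsilon_t \notin [F^{-1}(\tau), F^{-1}(\tau+\Delta\tau)]; \\
F^{-1}(\tau) - F^{-1}(\tau+\Delta\tau) \le F^{-1}(\tau) - \varepsilon_t < 0, & \text{when } \varepsilon_t \in [F^{-1}(\tau), F^{-1}(\tau+\Delta\tau)];
\end{cases}
\tag{26}
$$

and

$$
0 = \mathbf{1}\{F^{-1}(\tau) < \varepsilon_t \le F^{-1}(\tau)\}\frac{dF^{-1}(\tau)}{d\tau} \le \lim_{\Delta\tau\downarrow 0}\frac{\mathbf{1}\{F^{-1}(\tau) < \varepsilon_t \le F^{-1}(\tau+\Delta\tau)\}\big(F^{-1}(\tau) - \varepsilon_t\big)}{\Delta\tau} \le 0.
\tag{27}
$$

Analogously, the left-hand limit

$$\lim_{\Delta\tau\uparrow 0}\frac{\mathrm{SRAR}_{\varepsilon_t}\big(\tau+\Delta\tau, F^{-1}(\tau+\Delta\tau)\big) - \mathrm{SRAR}_{\varepsilon_t}\big(\tau, F^{-1}(\tau)\big)}{\Delta\tau}$$

gives the same result. Therefore, the first-order derivative is

$$
\frac{d\,\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))}{d\tau} = \sum_{t=1}^{T}\Big(\varepsilon_t - F^{-1}(\tau) + \frac{dF^{-1}(\tau)}{d\tau}\big(\mathbf{1}\{\varepsilon_t \le F^{-1}(\tau)\} - \tau\big)\Big).
\tag{28}
$$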

To interpret this result for a fixed T, we take expectations and obtain

$$
E\left[\frac{d\,\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))}{d\tau}\right] = T\big(E[\varepsilon_t] - F^{-1}(\tau)\big),
\tag{29}
$$

when $E[\varepsilon_t]$ exists; the second term in (28) drops out because $E[\mathbf{1}\{\varepsilon_t \le F^{-1}(\tau)\}] = \tau$. In practice, we do not insist on $E[\varepsilon_t] < \infty$, since the sample mean of an i.i.d. $\{\varepsilon_t\}_{t=1}^{T}$ can replace $E[\varepsilon_t]$ in (29) without affecting the other terms.

The expectation of $d\,\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))/d\tau$ can be regarded as the underlying guideline for the slope of a SRAR curve. Before interpreting this result, we derive the second-order derivative of $\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))$ with respect to τ and interpret the two together.

4.3.2 The concave property

One empirically observed property of SRAR curves is their concavity, which can be explained through the second-order derivative of $\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))$ with respect to τ under assumptions (A1), (A2), (A3), (A4) and (A7). Suppose 0 < τ − Δτ < τ < τ + Δτ < 1 with Δτ > 0.

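$$
\begin{aligned}
\Delta^2\,\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau)) &:= \mathrm{SRAR}_{\varepsilon_t}\big(\tau+\Delta\tau, F^{-1}(\tau+\Delta\tau)\big) - 2\,\mathrm{SRAR}_{\varepsilon_t}\big(\tau, F^{-1}(\tau)\big) + \mathrm{SRAR}_{\varepsilon_t}\big(\tau-\Delta\tau, F^{-1}(\tau-\Delta\tau)\big) \\
&= \sum_{t=1}^{T}\Big(\big(\varepsilon_t - F^{-1}(\tau)\big)\big(\mathbf{1}\{F^{-1}(\tau-\Delta\tau) < \varepsilon_t \le F^{-1}(\tau)\} - \mathbf{1}\{F^{-1}(\tau) < \varepsilon_t \le F^{-1}(\tau+\Delta\tau)\}\big) \\
&\qquad + \tau\big(2F^{-1}(\tau) - F^{-1}(\tau+\Delta\tau) - F^{-1}(\tau-\Delta\tau)\big) + \Delta\tau\big(F^{-1}(\tau-\Delta\tau) - F^{-1}(\tau+\Delta\tau)\big) \\
&\qquad + \big(F^{-1}(\tau+\Delta\tau) + F^{-1}(\tau-\Delta\tau) - 2F^{-1}(\tau)\big)\mathbf{1}\{\varepsilon_t \le F^{-1}(\tau-\Delta\tau)\} \\
&\qquad + \big(F^{-1}(\tau+\Delta\tau) - F^{-1}(\tau)\big)\mathbf{1}\{F^{-1}(\tau-\Delta\tau) < \varepsilon_t \le F^{-1}(\tau+\Delta\tau)\}\Big).
\end{aligned}
\tag{30}
$$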

Divide the above second order central difference by Δτ ², and take the limit Δτ↓0. It gives us

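$$
\frac{d^2\,\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))}{d\tau^2} = \lim_{\Delta\tau\downarrow 0}\frac{\Delta^2\,\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))}{\Delta\tau^2} = \sum_{t=1}^{T}\Big(\frac{d^2F^{-1}(\tau)}{d\tau^2}\big(\mathbf{1}\{\varepsilon_t \le F^{-1}(\tau)\} - \tau\big) - 2\frac{dF^{-1}(\tau)}{d\tau}\Big),
\tag{31}
$$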

the last line of which is obtained similarly to (25). To interpret this result, we take expectation and get the following:

$$
E\left[\frac{d^2\,\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))}{d\tau^2}\right] = -2\,T\,\frac{dF^{-1}(\tau)}{d\tau} < 0,
\tag{32}
$$

where the inequality holds with probability one since f(ε) > 0 with probability one under assumption (A1). The expectation of $d^2\,\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))/d\tau^2$ can be regarded as the underlying guideline for the concavity of a SRAR curve. Together with the slope information, it implies that SRAR curves are always arch-shaped, going upward and then downward, with a peak at the τ that solves $E[\varepsilon_t] = F^{-1}(\tau)$. The location of the peak also reveals the skewness of $\varepsilon_t$: $\varepsilon_t$ is left-skewed when the SRAR curve peaks in the region τ < 0.5, and right-skewed when it peaks in the region τ > 0.5. If $\varepsilon_t$ is symmetrically distributed, its SRAR curve is symmetric, and vice versa.
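As a numerical illustration of the slope and peak results, the following minimal sketch evaluates the intrinsic curve $\mathrm{SRAR}_{\varepsilon_t}(\tau, F^{-1}(\tau))$ on simulated innovations. The unit exponential distribution is an assumption made here for convenience (it is right-skewed with a closed-form quantile function and $E[\varepsilon_t] = 1$); it is not a distribution used in the paper, and all function names are hypothetical.

```python
import math
import random

def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u <= 0})."""
    return u * (tau - (1.0 if u <= 0 else 0.0))

def exp_quantile(tau):
    """Quantile function F^{-1}(tau) of the unit exponential (illustrative choice)."""
    return -math.log(1.0 - tau)

def intrinsic_srar(eps, tau, quantile):
    """Sum of check losses of eps_t - F^{-1}(tau), evaluated at the true quantile."""
    q = quantile(tau)
    return sum(check_loss(e - q, tau) for e in eps)

random.seed(0)
T = 20000
eps = [random.expovariate(1.0) for _ in range(T)]

taus = [k / 100 for k in range(5, 96)]                  # 0.05, 0.06, ..., 0.95
curve = [intrinsic_srar(eps, tau, exp_quantile) for tau in taus]

# The peak is predicted where F^{-1}(tau) = E[eps] = 1, i.e. tau = 1 - exp(-1).
peak_tau = taus[curve.index(max(curve))]
predicted_peak = 1.0 - math.exp(-1.0)

# Slope at tau = 0.30: central difference versus T * (E[eps] - F^{-1}(tau)) from (29).
h = 0.05
fd_slope = (intrinsic_srar(eps, 0.30 + h, exp_quantile)
            - intrinsic_srar(eps, 0.30 - h, exp_quantile)) / (2 * h)
predicted_slope = T * (1.0 - exp_quantile(0.30))

print(f"peak at tau = {peak_tau:.2f}, predicted near {predicted_peak:.2f}")
print(f"slope at tau = 0.30: {fd_slope:.0f}, predicted near {predicted_slope:.0f}")
```

Because the exponential distribution is right-skewed, the peak falls in the region τ > 0.5, in line with the skewness reading above.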

5 Binding functions

Plotting SRAR is a way to present the goodness of fit of quantile regressions for each candidate model. Quantile regressions provide the residuals from which SRAR is calculated and, as we know, they provide consistent estimation for true models. To study estimation under misspecification we adopt the concept of a binding function (Dhaene, Gourieroux, and Scaillet 1998). A binding function is defined as a mapping from the coefficients of the true model to the pseudo-true coefficients of a misspecified model.

The quantile regression estimator of a pseudo-true coefficient in a misspecified QCAR(p) or QNCAR(p) converges to a limiting value that is characterized by the binding function. Binding functions are difficult to derive explicitly in the general case, so they are studied by means of simulations (see Gouriéroux and Jasiak 2017). Consider a noncausal AR(1), $y_t = \pi_1 y_{t+1} + \varepsilon_t$, with $\{\varepsilon_t\}$ i.i.d. t(ν) for ν = 1, 3, 5 and 10. The binding function in the misspecified QCAR(1) is observed to vary with two factors: (i) the distribution of $\varepsilon_t$ and (ii) the distance function used in the regression, which is the check function $\rho_\tau(\cdot)$ in quantile regression. Figure 10, Figure 11 and Figure 12 illustrate the effect of those factors. Each point is the average of the estimates over 1000 simulations of 600 observations each. Since t(ν) is symmetric, the estimation results for the negative true coefficient region and for the (1−τ)th-quantile regression follow the same pattern as in these three figures.

Sometimes the binding function is not injective, as is evidenced in Figure 10 and Figure 11 for true coefficients that are small in absolute value. The non-injectivity of the binding function for Cauchy distributed innovations is also illustrated in Gouriéroux and Jasiak (2017), a result that disables encompassing tests. On the other hand, Figure 12 suggests that the injectivity of the binding functions is recovered at τ = 10%. In the case of Cauchy distributed innovations, there are no binding functions from extreme quantile regressions such as the 0.1th- or 0.9th-quantile regression, because the estimate does not converge. Although a value for $\pi_1 \in (0, 1)$ is plotted in Figure 12, it is just the average of the binding function estimates for $\pi_1$, shown for illustration.
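The simulation design described above can be sketched as follows. This is only a scaled-down illustration under stated assumptions: the sample size and number of replications are far smaller than the 600 observations and 1000 simulations used for the figures, the quantile regression is solved by a brute-force search over lines through pairs of observations (a stand-in chosen here to avoid dependencies, not the estimation routine used in the paper), and all function names are hypothetical.

```python
import random

def student_t(nu, rng):
    """One Student-t(nu) draw: standard normal over sqrt(chi-square(nu) / nu)."""
    return rng.gauss(0.0, 1.0) / (rng.gammavariate(nu / 2.0, 2.0) / nu) ** 0.5

def simulate_noncausal_ar1(pi1, n, nu, rng, burn=200):
    """y_t = pi1 * y_{t+1} + eps_t, generated backwards from a zero terminal value."""
    y = [0.0] * (n + burn + 1)
    for t in range(n + burn - 1, -1, -1):
        y[t] = pi1 * y[t + 1] + student_t(nu, rng)
    return y[:n]  # drop the values closest to the artificial terminal condition

def fit_qcar1(y, tau):
    """tau-quantile regression of y_t on (1, y_{t-1}); returns (loss, intercept, slope).

    An optimum of the check-loss problem passes through two observations,
    so enumerating all such lines is exact, but only feasible for small samples.
    """
    x, z = y[:-1], y[1:]
    best = (float("inf"), 0.0, 0.0)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] == x[j]:
                continue
            slope = (z[i] - z[j]) / (x[i] - x[j])
            icpt = z[i] - slope * x[i]
            loss = 0.0
            for xk, zk in zip(x, z):
                u = zk - icpt - slope * xk
                loss += u * (tau - (1.0 if u <= 0 else 0.0))
            if loss < best[0]:
                best = (loss, icpt, slope)
    return best

def binding_point(pi1, tau, nu, n=60, reps=10, seed=0):
    """Average misspecified causal slope estimate for a true noncausal coefficient pi1."""
    rng = random.Random(seed)
    slopes = [fit_qcar1(simulate_noncausal_ar1(pi1, n, nu, rng), tau)[2]
              for _ in range(reps)]
    return sum(slopes) / reps

for pi1 in (0.2, 0.5, 0.8):
    print(pi1, round(binding_point(pi1, tau=0.3, nu=3), 3))
```

Plotting `binding_point` against a finer grid of `pi1` values, for several `nu` and `tau`, would give a rough counterpart of the binding function figures, subject to the much larger simulation noise of this small design.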

Figure 10:

Quarterly inflation rate series plot for 4 Latin American countries.

Figure 11:

Binding function for a misspecified QCAR(1) in 0.3rd-quantile regression.

Figure 12:

Binding function for a misspecified QCAR(1) in 0.1st-quantile regression.

6 Modelling hyperinflation in Latin America

6.1 The model specification

The motivation of our empirical analysis comes from the rational expectation (RE) hyperinflation model originally proposed by Cagan (1956) and investigated by several authors (see e.g. Adam and Szafarz 1992; Broze and Szafarz 1985). We follow the notation of Broze and Szafarz (1985), with

$$
m_t^d = \alpha p_t + \beta E(p_{t+1} \mid I_t) + x_t.
\tag{33}
$$

In (33), $m_t^d$ and $p_t$ respectively denote the logarithms of money demand and price, and $x_t$ is the disturbance term summarizing the impact of exogenous factors. $E(p_{t+1} \mid I_t)$ is the rational expectation, when it is equal to the conditional expectation, of $p_{t+1}$ at time t based on the information set $I_t$. Assuming that the money supply $m_t^s = z_t$ is exogenous, the equilibrium $m_t^d = m_t^s$ provides the following equation for prices:

$$
p_t = -\frac{\beta}{\alpha}\,E(p_{t+1} \mid I_t) + \frac{z_t - x_t}{\alpha} = \phi\,E(p_{t+1} \mid I_t) + u_t.
$$

Broze and Szafarz (1985) show that a forward-looking recursive solution of this model exists when $x_t$ is stationary and |ϕ| < 1. The deviation from that solution is called the bubble $B_t$, with $p_t = \sum_{i=0}^{\infty}\phi^i E(u_{t+i} \mid I_t) + B_t$. Finding conditions under which this process has rational expectation equilibria (forward and/or backward looking) is beyond the scope of our paper. We only use this framework to illustrate the interest of economists in models with lead components. Under a perfect foresight scheme, $E(p_{t+1} \mid I_t) = p_{t+1}$, we obtain the purely noncausal model

$$
p_t = \phi\,p_{t+1} + \tilde\varepsilon_t,
\tag{34}
$$

with $\tilde\varepsilon_t = u_t$. In the more general setting, for instance when $E(p_{t+1} \mid I_t) = p_{t+1} + v_t$ with $v_t$ a martingale difference, the new disturbance term is $\tilde\varepsilon_t = v_t + u_t$. Empirically, a specification with only one lead might be too restrictive to capture the underlying dynamics of the observed variables. We consequently depart from the theoretical model proposed above and consider empirical specifications with more leads or lags. Lanne and Luoto (2013, 2017) and Hecq, Lieb, and Telg (2017a, 2017b), in the context of the new hybrid Keynesian Phillips curve, assume for instance that $\tilde\varepsilon_t$ is a MAR(r−1, s−1) process such that

$$
\rho(L)\,\pi(L^{-1})\,\tilde\varepsilon_t = c + \varepsilon_t,
\tag{35}
$$

where $\varepsilon_t$ is i.i.d. and c is an intercept term. Inserting (35) in (34), we observe that if $\tilde\varepsilon_t$ is purely noncausal (i.e. a MAR(0, s−1) with ρ(L) = 1) we obtain a noncausal MAR(0, s) motion for prices:

$$
\begin{aligned}
(1 - \phi L^{-1})\,p_t &= \pi(L^{-1})^{-1}(c + \varepsilon_t), \\
(1 - \phi L^{-1})(1 - \pi_1 L^{-1} - \dots - \pi_{s-1}L^{-(s-1)})\,p_t &= c + \varepsilon_t.
\end{aligned}
$$

We would obtain a mixed causal and noncausal model if ρ(L) ≠ 1. Our guess is that the same specification might in some circumstances empirically (although not mathematically, as the lag polynomial does not annihilate the lead polynomial) give rise to a purely causal model in small samples when the autoregressive part dominates the lead component.

The above illustration provides a context of purely causal and noncausal models in which we can apply our approach empirically. It would be interesting to extend our modelling to theoretical models with both forward and backward behaviour, such as a backward- and forward-looking Taylor rule. Doing so, however, requires introducing additional regressors and extending the approach of Hecq, Issler, and Telg (2020) to quantile regressions, which is left for future research and is beyond the scope of this paper.

6.2 The data and unit root testing

We consider seasonally unadjusted quarterly Consumer Price Index (CPI) series for four Latin American countries: Brazil, Mexico, Costa Rica and Chile. Monthly raw price series were downloaded from the OECD database for the largest span available (in September 2018). Although quarterly data are directly available from the OECD, we do not use those series because they are computed as the unweighted average over the three months of the corresponding quarter. Such data are therefore constructed using a linear filter, leading to undesirable properties for the detection of mixed causal and noncausal models (see Hecq, Telg, and Lieb 2017a, b on this specific issue). As a consequence, we use quarterly data computed by point-in-time sampling from the monthly variables. The first observation is 1969Q1 for Mexico, 1970Q1 for Chile, 1976Q1 for Costa Rica and 1979Q4 for Brazil. Our last observation is 2018Q2 for every series. We do not use monthly data in this paper because monthly inflation series required a very large number of lags to capture their dynamics. Moreover, the detection of seasonal unit roots in the level of monthly price series was quite difficult.

Applying seasonal unit root tests (here HEGY tests, see Hylleberg et al. 1990) with a constant, a linear trend and deterministic seasonal dummies, we reject the null of seasonal unit roots in each series, whereas we do not reject the null of a unit root at the zero frequency (see Table 2, in which a * denotes a rejection of the null unit root hypothesis at a specific frequency at the 5% significance level). The unit root tests here are applied to conditional mean models of the raw data, to ensure that we transform the data into weakly stationary time series before using them in quantile regressions. Unit root testing can also be done per quantile (Koenker and Xiao 2004) to relate short-term explosiveness of time series to unit-root quantile models, which is an interesting perspective for treating explosive time series and an alternative to causal and noncausal modelling. We do not go deeper in the unit-root direction in this paper but leave it for future research.

Table 2:

Seasonal HEGY unit root tests in the log levels of prices.

Country	H₀: π₁ = 0	H₀: π₂ = 0	H₀: π₃ = π₄ = 0	Sample
$\ln P_t^{Bra}$	−1.39	−5.75*	48.28*	1979Q4−2018Q2
$\ln P_t^{Chi}$	−2.98	−6.32*	20.13*	1970Q1−2018Q2
$\ln P_t^{Costa}$	−1.80	−4.23*	7.81*	1976Q1−2018Q2
$\ln P_t^{Mex}$	−0.88	−11.92*	60.10*	1969Q1−2018Q2

The number of lags of the dependent variable used to whiten the residuals of autocorrelation is chosen by AIC. From these results we compute quarterly inflation rates for the four countries at an annualized rate, i.e. $\Delta \ln P_t^i \times 400$. Next we regress $\Delta \ln P_t^i \times 400$ on seasonal dummies to capture the potential presence of deterministic seasonality. The null of no deterministic seasonality is not rejected for any of the four series. Figure 13 displays the quarterly inflation rates and illustrates the huge inflation episodes that these countries faced. Among the four inflation rates, those of Brazil and Mexico show the pattern closest to the intuitive notion of a speculative bubble, namely a rapid increase of the series until a maximum value is reached before the bubble bursts.
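The data construction just described, point-in-time sampling of a monthly price index followed by annualized log-differences, can be sketched as below. The price series is synthetic and the choice of the last month of each quarter as the sampling point is an assumption made for illustration; the function names are hypothetical.

```python
import math

def point_in_time_quarterly(monthly, month_in_quarter=3):
    """Keep one monthly observation per quarter (assumed here: the last month).

    Unlike a three-month average, this applies no linear filter to the data.
    """
    return monthly[month_in_quarter - 1::3]

def annualized_inflation(prices):
    """Quarterly log-differences at an annualized rate: 400 * (ln P_t - ln P_{t-1})."""
    logs = [math.log(p) for p in prices]
    return [400.0 * (b - a) for a, b in zip(logs, logs[1:])]

# Synthetic monthly price index growing by 1% per month (not real CPI data).
monthly_cpi = [100.0 * 1.01 ** m for m in range(24)]

quarterly_cpi = point_in_time_quarterly(monthly_cpi)
inflation = annualized_inflation(quarterly_cpi)

print(len(quarterly_cpi), "quarters,", len(inflation), "inflation observations")
print([round(v, 2) for v in inflation])
```

With constant 1% monthly growth every quarterly rate equals 1200 × ln(1.01), which gives a quick sanity check of the scaling.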

Figure 13:

Quarterly inflation rate series plot for 4 Latin American countries.

6.3 Empirical findings and identification of noncausal models

Table 3 reports, for each quarterly inflation rate, the order of the autoregressive model selected by the Hannan–Quinn information criterion. Given our results on the binding function (see also Gouriéroux and Jasiak 2017), it is safer to determine the pseudo-true autoregressive lag length using such an OLS approach than using quantile regressions or maximum likelihood. Indeed, there is a risk that a regression in direct time from a noncausal DGP underestimates the lag order for some distributions (e.g. the Cauchy) and some values of the parameters.

Table 3:

Descriptive statistics for quarterly inflation rates.

Country	HQ	BJ	skew.	kurt.	LM[1−2]	ARCH[1−2]
$\Delta\ln P_t^{Bra}$	1	<0.001	−2.54	56.96	0.19	<0.001
$\Delta\ln P_t^{Chi}$	7	<0.001	2.84	22.45	0.09	0.09
$\Delta\ln P_t^{Costa}$	4	<0.001	1.01	8.73	0.47	0.30
$\Delta\ln P_t^{Mex}$	3	<0.001	−0.40	13.81	0.20	<0.001
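The skew., kurt. and BJ columns of Table 3 are standard residual diagnostics. A minimal sketch of how such statistics are typically computed is given below, assuming the textbook Jarque–Bera statistic with its asymptotic chi-square(2) p-value and the non-excess definition of kurtosis; whether the table was produced with exactly these conventions is not stated in the text, and the residuals used here are made up.

```python
import math

def jarque_bera(resid):
    """Return (skewness, kurtosis, JB statistic, asymptotic p-value)."""
    n = len(resid)
    mean = sum(resid) / n
    m2 = sum((r - mean) ** 2 for r in resid) / n
    m3 = sum((r - mean) ** 3 for r in resid) / n
    m4 = sum((r - mean) ** 4 for r in resid) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                       # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)             # chi-square(2) survival function
    return skew, kurt, jb, p_value

# Made-up residuals with one large positive outlier (illustration only).
residuals = [-0.8, -0.5, -0.3, -0.1, 0.0, 0.1, 0.2, 0.4, 0.6, 6.0]
skew, kurt, jb, p_value = jarque_bera(residuals)
print(f"skew = {skew:.2f}, kurt = {kurt:.2f}, JB = {jb:.2f}, p = {p_value:.4f}")
```

A positive skewness together with a kurtosis well above 3, as for this toy sample, is the kind of evidence of non-normality on which the identification of causal versus noncausal models relies.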

Estimating univariate autoregressive models gives lag lengths ranging from p = 1 for Brazil to p = 7 for the Chilean inflation rate. The p-values of the Breusch-Pagan LM test (see the column labelled LM[1−2]) for the null of no autocorrelation, after including those lags, show that we do not reject the null in any of the four cases. On the other hand, we reject the null of normality (Jarque–Bera test) in the disturbances of each series. We should consequently be able to distinguish causal from noncausal models. From the columns skew. and kurt. it emerges that the residuals are skewed to the left for Brazil and Mexico and skewed to the right for Chile and Costa Rica. Heavy tails are present in each series. At the 5% significance level we reject the null of no ARCH (see column ARCH[1−2]) for Brazil and Mexico.

Gouriéroux and Zakoïan (2017) have derived the closed-form conditional moments of a misspecified causal model obtained from a purely noncausal process with alpha-stable disturbances. They show that, in the Cauchy case, the conditional mean (in direct time) is a random walk with a time-varying conditional variance. This result might favour a purely noncausal specification for Brazil and Mexico, as the null of no ARCH is rejected. But this assertion must be carefully evaluated and tested, for instance using our comparison of quantile autoregressions in direct and reverse time.

The results obtained with the Q(N)CAR are reported in Table 4; the RQ(N)CAR produces the same results. Each cell of Table 4 reports the specification, MAR(0, p) or MAR(p, 0), selected by the SRAR at quantiles 0.1, 0.3, 0.5, 0.7 and 0.9, as well as by the aggregated SRAR. Figure 14 displays the SRAR curves from the 0.05th quantile to the 0.95th quantile obtained with the Q(N)CAR for each of the four economies; the curves from the RQ(N)CAR, with its restriction to non-negative regressors, are similar. As observed, the identification problem arises in the SRAR plots: especially for Brazil, it is hard to trust a model on the evidence of single quantiles. The aggregate SRAR criterion helps in this situation by providing an overall perspective. According to the aggregate SRAR criterion, we conclude that Brazil, Mexico and Costa Rica are better characterized as purely noncausal, while Chile is better characterized as purely causal.
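The comparison of quantile autoregressions in direct and reverse time can be sketched on simulated data as follows. This is a minimal illustration under several assumptions: one lag or lead only, the aggregate criterion approximated by a plain sum of SRAR values over a grid of quantiles, a brute-force solver that enumerates lines through pairs of observations in place of a proper quantile regression routine, and hypothetical function names throughout.

```python
import random

def student_t(nu, rng):
    """One Student-t(nu) draw: standard normal over sqrt(chi-square(nu) / nu)."""
    return rng.gauss(0.0, 1.0) / (rng.gammavariate(nu / 2.0, 2.0) / nu) ** 0.5

def simulate_noncausal_ar1(phi, n, nu, rng, burn=200):
    """y_t = phi * y_{t+1} + eps_t, generated backwards from a zero terminal value."""
    y = [0.0] * (n + burn + 1)
    for t in range(n + burn - 1, -1, -1):
        y[t] = phi * y[t + 1] + student_t(nu, rng)
    return y[:n]  # discard the values nearest the artificial terminal condition

def srar_by_quantile(x, z, taus):
    """Minimal sum of check losses at each tau for the regression of z on (1, x)."""
    best = [float("inf")] * len(taus)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] == x[j]:
                continue
            slope = (z[i] - z[j]) / (x[i] - x[j])
            icpt = z[i] - slope * x[i]
            total = neg = 0.0
            for xk, zk in zip(x, z):
                u = zk - icpt - slope * xk
                total += u
                if u <= 0:
                    neg += u
            # sum_t rho_tau(u_t) = tau * sum(u_t) - sum of the non-positive u_t
            for k, tau in enumerate(taus):
                loss = tau * total - neg
                if loss < best[k]:
                    best[k] = loss
    return best

def select_direction(y, taus):
    """Compare SRAR in direct time (y_t on y_{t-1}) and reverse time (y_t on y_{t+1})."""
    causal = srar_by_quantile(y[:-1], y[1:], taus)
    noncausal = srar_by_quantile(y[1:], y[:-1], taus)
    label = "MAR(0, 1)" if sum(noncausal) < sum(causal) else "MAR(1, 0)"
    return label, causal, noncausal

rng = random.Random(42)
y = simulate_noncausal_ar1(0.8, 120, 2.0, rng)
taus = [k / 20 for k in range(1, 20)]                  # 0.05, 0.10, ..., 0.95
label, causal, noncausal = select_direction(y, taus)
print("aggregate SRAR, causal:", round(sum(causal), 2),
      "noncausal:", round(sum(noncausal), 2), "selected:", label)
```

Comparing `causal[k]` with `noncausal[k]` quantile by quantile reproduces the single-quantile columns of a table such as Table 4, while the sums correspond to its last column. For lag orders above one, a linear-programming solver would be needed in place of the pairwise enumeration.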

Table 4:

SRAR identification results.

Country	SRAR (τ = 0.1)	SRAR (τ = 0.3)	SRAR (τ = 0.5)	SRAR (τ = 0.7)	SRAR (τ = 0.9)	SRAR (total)
$\Delta\ln P_t^{Bra}$	MAR(0, 1)	MAR(0, 1)	MAR(0, 1)	MAR(1, 0)	MAR(1, 0)	MAR(0, 1)
$\Delta\ln P_t^{Chi}$	MAR(7, 0)	MAR(7, 0)	MAR(7, 0)	MAR(0, 7)	MAR(0, 7)	MAR(7, 0)
$\Delta\ln P_t^{Costa}$	MAR(0, 4)	MAR(0, 4)	MAR(0, 4)	MAR(4, 0)	MAR(4, 0)	MAR(0, 4)
$\Delta\ln P_t^{Mex}$	MAR(0, 3)	MAR(0, 3)	MAR(0, 3)	MAR(3, 0)	MAR(3, 0)	MAR(0, 3)

Figure 14:

SRAR plots of the inflation rates of four Latin American countries respectively.

7 Conclusions

This paper introduces a new way to select between causal and noncausal models by comparing the residuals from the quantile autoregressions developed by Koenker and Xiao (2006) with those from the time-reversed specifications. To accommodate heavy-tailed distributions, we generalize the quantile autoregression theory to regularly varying distributions. This also confirms the validity of quantile autoregressions for analysing heavy-tailed time series, such as those with explosive or bubble-type dynamics. It is natural to consider SRAR as a model selection criterion in the quantile regression framework. However, because of the identification problem spotted in the SRAR plots presented in this paper, we propose to use the aggregate SRAR criterion for model selection. Its performance has proved robust in all the results of this paper. It is worth mentioning that when the coefficients of the underlying model are constant and the i.i.d. error term is symmetrically distributed, the aggregate SRAR criterion is equivalent to selecting between forward and backward conditional mean models (as termed by Gouriéroux and Zakoïan (2017)). However, the aggregate SRAR is a measure based on the whole dynamics of the underlying process, and it is no longer dominated by the conditional mean information. This characteristic makes the aggregate SRAR criterion robust in model selection even in more general situations, such as asymmetrically distributed innovations. In the empirical study of the inflation rates of four Latin American countries, we found that the purely noncausal specification is favoured in three cases.

Finally, possible extensions of our approach include the identification of mixed models in addition to purely causal and noncausal specifications, the enhancement of QCAR and QNCAR with explanatory variables in order to investigate the Taylor (1993) rule, and unit-root testing per quantile for QCAR as well as QNCAR. A formal test on SRAR differences would also require a bootstrap approach, which is beyond the scope of this paper and is left for future research.

Corresponding author: Li Sun, Ph.D., Maastricht University, School of Business and Economics, Department of Quantitative Economics, P.O.Box 616, 6200 MD Maastricht, The Netherlands, Phone: +31619341416, E-mail: l.sun@maastrichtuniversity.nl

Acknowledgements

We would like to thank Eric Beutner, Sean Telg, Sebastien Fries as well as the participants in the following conferences and seminar places in which we have presented our paper: The 26th Annual Symposium of the Society for Nonlinear Dynamics and Econometrics, Tokyo, March 2018, The Eighth International Conference on Mathematical and Statistical Methods for Actuarial Sciences and Finance, Madrid, April 2018, The 12th NESG 2018 Conference, Amsterdam, May 2018, The 2nd International Conference on Econometrics and Statistics, Hong Kong, June 2018, The 4th Dongbei Econometrics Workshop, Dalian, June 2018, The 2018 IAAE Annual Conference International Association for Applied Econometrics, Montreal, June 2018, The 12th International Conference on Computational and Financial Econometrics, Pisa, December 2018, The University of Milano-Bicocca, The BI Norwegian Business School and The University of Nottingham. All errors are ours.

Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Appendix

Alternative way to simulate MAR models

Suppose that the DGP is a MAR(r, s) as in (1). First, we rewrite (1) into a matrix representation as follows:

$$
M\mathbf{y} = \boldsymbol{\varepsilon}, \qquad
M := \begin{bmatrix}
\pi(L)\phi(L^{-1}) & 0 & \dots & 0 \\
0 & \pi(L)\phi(L^{-1}) & \dots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \dots & \pi(L)\phi(L^{-1})
\end{bmatrix}, \qquad
\mathbf{y} := [y_1\; y_2\; \dots\; y_T]', \qquad
\boldsymbol{\varepsilon} := [\varepsilon_1\; \varepsilon_2\; \dots\; \varepsilon_T]',
\tag{36}
$$

where M is a T × T matrix and T is the sample size. The equivalence to (1) holds by assuming that $y_{1-r}, y_{2-r}, \dots, y_0$ and $y_{T+1}, y_{T+2}, \dots, y_{T+s}$ are all zeros. The effect of this assumption can be neglected by deleting enough observations from the beginning and the end of a simulated sample, for instance by keeping $\{y_t\}_{t=201}^{T-200}$ for analysis from a first simulated $\{y_t\}_{t=1}^{T}$. Next, M can be decomposed into a product of two triangular matrices, denoted L and U, which represent π(L) and $\phi(L^{-1})$ respectively, as follows.

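$$
L = \begin{bmatrix}
1 & 0 & 0 & 0 & \dots & 0 \\
-\pi_1 & 1 & 0 & 0 & \dots & 0 \\
-\pi_2 & -\pi_1 & 1 & 0 & \dots & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & \dots & -\pi_r & \dots & -\pi_1 & 1
\end{bmatrix}, \qquad
U = \begin{bmatrix}
1 & -\psi_1 & \dots & -\psi_s & 0 & \dots & 0 \\
0 & 1 & -\psi_1 & \dots & -\psi_s & \dots & 0 \\
\vdots & & \ddots & \ddots & & & \vdots \\
0 & \dots & & & 0 & \dots & 1
\end{bmatrix}.
\tag{37}
$$

Substituting (37) into (36), we get

$$LU\mathbf{y} = \boldsymbol{\varepsilon},$$

so that

$$
\mathbf{y} = U^{-1}L^{-1}\boldsymbol{\varepsilon}.
\tag{38}
$$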

Given $\boldsymbol{\varepsilon}$, $\mathbf{y}$ can be obtained directly since L and U are triangular matrices with unit diagonals and are therefore invertible. This MAR(r, s) simulation method can easily be generalized, for instance to an MAR(r, s) involving exogenous independent variables as presented by Hecq, Issler, and Telg (2020). In practice this vector-wise simulation method is slower than the element-wise method because of the matrix creation and storage involved.
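The two-step solve in (38) can be sketched as follows for a MAR(1, 1), using dense nested lists and explicit triangular substitutions so that no linear algebra library is needed. The coefficient values, the Gaussian innovations (used only to keep the check simple) and the function names are assumptions made for illustration.

```python
import random

def band_lower(T, pi):
    """T x T lower-triangular L: unit diagonal, -pi_i on the i-th subdiagonal."""
    return [[1.0 if i == j else (-pi[i - j - 1] if 0 < i - j <= len(pi) else 0.0)
             for j in range(T)] for i in range(T)]

def band_upper(T, psi):
    """T x T upper-triangular U: unit diagonal, -psi_j on the j-th superdiagonal."""
    return [[1.0 if i == j else (-psi[j - i - 1] if 0 < j - i <= len(psi) else 0.0)
             for j in range(T)] for i in range(T)]

def forward_solve(L, b):
    """Solve L w = b for lower-triangular L by forward substitution."""
    w = []
    for i, row in enumerate(L):
        w.append((b[i] - sum(row[k] * w[k] for k in range(i))) / row[i])
    return w

def backward_solve(U, b):
    """Solve U y = b for upper-triangular U by backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        y[i] = (b[i] - sum(U[i][k] * y[k] for k in range(i + 1, n))) / U[i][i]
    return y

def simulate_mar(pi, psi, eps, trim=200):
    """y = U^{-1} L^{-1} eps, then drop `trim` values at each end of the sample."""
    T = len(eps)
    w = forward_solve(band_lower(T, pi), eps)     # w = L^{-1} eps
    y_full = backward_solve(band_upper(T, psi), w)  # y = U^{-1} w
    return y_full, y_full[trim:T - trim]

random.seed(3)
T = 600
eps = [random.gauss(0.0, 1.0) for _ in range(T)]
y_full, y = simulate_mar(pi=[0.5], psi=[0.7], eps=eps)
print(len(y), "observations kept after trimming")
```

The forward substitution is the usual causal recursion and the backward substitution is the noncausal recursion run from the end of the sample, which is why the vector-wise and element-wise methods agree away from the boundaries.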

Proof of Theorem 3

Proof.

First, we rewrite $\mathrm{SRAR}(\tau, \hat\theta(\tau))$ as follows:

$$
\mathrm{SRAR}(\tau, \hat\theta(\tau)) = \sum_{t=1}^{T}\rho_\tau\big(y_t - x_t'\hat\theta(\tau)\big) = \sum_{t=1}^{T}\rho_\tau\big(y_t - x_t'\phi_\tau + x_t'\phi_\tau - x_t'\hat\theta(\tau)\big) = \sum_{t=1}^{T}\rho_\tau\Big(u_{t\tau} - \frac{1}{a_T\sqrt{T}}\,\nu' x_t\Big),
\tag{39}
$$

where $x_t' := [a_T, y_{t+1}, \dots, y_{t+p}]$, $u_{t\tau} := y_t - x_t'\phi_\tau = \varepsilon_t - F^{-1}(\tau)$ and $\nu := a_T\sqrt{T}\,(\hat\theta(\tau) - \phi_\tau)$. We know from Davis and Resnick (1985) and Knight (1989, 1991) that

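$$
\begin{aligned}
\frac{1}{a_T}\sum_{t=1}^{\lfloor T\cdot s\rfloor}(\varepsilon_t - b_T) &\overset{d}{\sim} S_\alpha(s), \\
\frac{1}{a_T\sqrt{T}}\sum_{t=1}^{T}\Big(y_t - \lfloor T\cdot s\rfloor\sum_{j=0}^{\infty}c_j b_T\Big) &\overset{d}{\sim} \sum_{j=0}^{\infty}c_j\int_0^1 S_\alpha(s)\,ds, \\
\frac{1}{a_T^2\,T}\sum_{t=1}^{T}\Big(y_t\, y_{t+h} - \lfloor T\cdot s\rfloor\sum_{j=0}^{\infty}c_j c_{j+h}\, b_T^2\Big) &\overset{d}{\sim} \sum_{j=0}^{\infty}c_j c_{j+h}\int_0^1 S_\alpha^2(s)\,ds,
\end{aligned}
\tag{40}
$$

where $t = \lfloor T\cdot s\rfloor$, and $\{S_\alpha(s)\}$ is a process of stable distributions with index α. Without loss of generality, we assume $b_T = 0$ for the proof below. Using the limiting behaviour presented in (40), we get that

$$
\frac{1}{a_T^2\,T}\sum_{t=1}^{T}x_t x_t' \overset{d}{\sim}
\begin{bmatrix}
1 & 0 \\
0 & \int_0^1 S_\alpha^2(s)\,ds\;\Omega
\end{bmatrix}_{(p+1)\times(p+1)}
\tag{41}
$$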

where

$$\Omega := [\omega_{ik}]_{p\times p}, \qquad \omega_{ik} := \sum_{j=0}^{\infty}c_j c_{j+|k-i|},$$

with $\omega_{ik}$ being the entry in the ith row and kth column of Ω. Ω is positive definite and symmetric. Note that $\hat\theta(\tau) = \arg\min_{\theta\in\mathbb{R}^{p+1}}\mathrm{SRAR}(\tau, \theta)$, so that the corresponding ν also minimizes

$$
Z_T(\nu) := \sum_{t=1}^{T}\Big[\rho_\tau\Big(u_{t\tau} - \frac{1}{a_T\sqrt{T}}\,\nu' x_t\Big) - \rho_\tau(u_{t\tau})\Big].
\tag{42}
$$

$Z_T(\nu)$ is a convex random function. Knight (1989) showed that if $Z_T(\nu)$ converges in distribution to $Z(\nu)$ and $Z(\nu)$ has a unique minimum, then the convexity of $Z_T(\nu)$ ensures that $\hat\nu = \arg\min_{\nu\in\mathbb{R}^{p+1}} Z_T(\nu)$ converges in distribution to $\arg\min_{\nu\in\mathbb{R}^{p+1}} Z(\nu)$.

By using the following check function identity:

$$
\begin{aligned}
\rho_\tau(v_1 - v_2) - \rho_\tau(v_1) &= -v_2\,\xi_\tau(v_1) + (v_1 - v_2)\big(I(0 > v_1 > v_2) - I(0 < v_1 < v_2)\big) \\
&= -v_2\,\xi_\tau(v_1) + \int_0^{v_2}\big(I(v_1 \le s) - I(v_1 < 0)\big)\,ds,
\end{aligned}
\tag{43}
$$

where $\xi_\tau(v) := \tau - I(v < 0)$, we can rewrite $Z_T(\nu)$ as

$$
Z_T(\nu) = -\sum_{t=1}^{T}\frac{1}{a_T\sqrt{T}}\,\nu' x_t\,\xi_\tau(u_{t\tau}) + \sum_{t=1}^{T}\int_0^{\frac{1}{a_T\sqrt{T}}\nu' x_t}\big(I(u_{t\tau} \le s) - I(u_{t\tau} < 0)\big)\,ds = Z_T^{(1)}(\nu) + Z_T^{(2)}(\nu),
\tag{44}
$$

where $Z_T^{(2)}(\nu) := \sum_{t=1}^{T}\int_0^{\frac{1}{a_T\sqrt{T}}\nu' x_t}\big(I(u_{t\tau} \le s) - I(u_{t\tau} < 0)\big)\,ds$ and $Z_T^{(1)}(\nu) := -\sum_{t=1}^{T}\frac{1}{a_T\sqrt{T}}\,\nu' x_t\,\xi_\tau(u_{t\tau})$. Further denote $\eta_t(\nu) := \int_0^{\frac{1}{a_T\sqrt{T}}\nu' x_t}\big(I(u_{t\tau} \le s) - I(u_{t\tau} < 0)\big)\,ds$, $\bar\eta_t(\nu) := E[\eta_t(\nu) \mid x_t]$ and $\bar Z_T^{(2)}(\nu) := \sum_{t=1}^{T}\bar\eta_t(\nu)$. By Assumption (A5) and for small enough $\frac{1}{a_T\sqrt{T}}\nu' x_t$, we further rewrite $\bar Z_T^{(2)}(\nu)$ as follows:

$$
\begin{aligned}
\bar Z_T^{(2)}(\nu) &= \sum_{t=1}^{T}E\Big[\int_0^{\frac{1}{a_T\sqrt{T}}\nu' x_t}\big(I(u_{t\tau} \le s) - I(u_{t\tau} < 0)\big)\,ds \,\Big|\, x_t\Big] \\
&= \sum_{t=1}^{T}\int_0^{\frac{1}{a_T\sqrt{T}}\nu' x_t}\Big[\int_{F^{-1}(\tau)}^{s+F^{-1}(\tau)} f(r)\,dr\Big]\,ds \\
&= \sum_{t=1}^{T}\int_0^{\frac{1}{a_T\sqrt{T}}\nu' x_t}\frac{F\big(s + F^{-1}(\tau)\big) - F\big(F^{-1}(\tau)\big)}{s}\,s\,ds \\
&= \sum_{t=1}^{T}\int_0^{\frac{1}{a_T\sqrt{T}}\nu' x_t} f\big(F^{-1}(\tau)\big)\,s\,ds \\
&= \frac{1}{2a_T^2\,T}\sum_{t=1}^{T} f\big(F^{-1}(\tau)\big)\,\nu' x_t x_t'\nu + o_p(1) \\
&= \frac{1}{2a_T^2\,T}\, f\big(F^{-1}(\tau)\big)\,\nu'\Big(\sum_{t=1}^{T}x_t x_t'\Big)\nu + o_p(1).
\end{aligned}
\tag{45}
$$

Using the limiting behaviour presented in (40), we get the limiting distribution of $\bar Z_T^{(2)}(\nu)$, and hence of $Z_T^{(2)}(\nu)$, as follows:

$$
Z_T^{(2)}(\nu) \overset{d}{\sim} \frac{1}{2}\, f\big(F^{-1}(\tau)\big)\,\nu'
\begin{bmatrix}
1 & 0 \\
0 & \int_0^1 S_\alpha^2(s)\,ds\;\Omega
\end{bmatrix}\nu,
\tag{46}
$$

by the fact that $Z_T^{(2)}(\nu) - \bar Z_T^{(2)}(\nu) \overset{p}{\sim} 0$, which can be proved by following the arguments of Knight (1989).

The limiting distribution of $Z_T^{(1)}(\nu)$ can also be deduced using (40), as follows.

$$
-\sum_{t=1}^{T}\frac{1}{a_T\sqrt{T}}\,\nu' x_t\,\xi_\tau(u_{t\tau}) \overset{d}{\sim} \nu'\Big[\sigma_\xi W(1),\; \sum_{j=0}^{\infty}c_j\,\sigma_\xi\int_0^1 S_\alpha(s)\,dW(s),\; \dots,\; \sum_{j=0}^{\infty}c_j\,\sigma_\xi\int_0^1 S_\alpha(s)\,dW(s)\Big]'_{(p+1)\times 1},
\tag{47}
$$

where $[\dots]'_{(p+1)\times 1}$ is a column vector of (p + 1) elements, $\int dW(s)$ is a stochastic integral with respect to a Brownian motion {W(s)} independent of $\{S_\alpha(s)\}$ (see Knight (1991)), and $\sigma_\xi$ is the standard deviation of $\xi_\tau(u_{t\tau})$, which equals $\sqrt{\tau(1-\tau)}$. Therefore, by Davis and Resnick (1985) and Knight (1989, 1991),

$$
Z_T^{(1)}(\nu) \overset{d}{\sim} \nu'\sqrt{\tau(1-\tau)}\,\Big[W(1),\; \sum_{j=0}^{\infty}c_j\int_0^1 S_\alpha(s)\,dW(s),\; \dots,\; \sum_{j=0}^{\infty}c_j\int_0^1 S_\alpha(s)\,dW(s)\Big]'_{(p+1)\times 1}.
\tag{48}
$$

Thus,

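$$
Z_T(\nu) \overset{d}{\sim} Z(\nu) := \nu'\sqrt{\tau(1-\tau)}\,\Big[W(1),\; \sum_{j=0}^{\infty}c_j\int_0^1 S_\alpha(s)\,dW(s),\; \dots,\; \sum_{j=0}^{\infty}c_j\int_0^1 S_\alpha(s)\,dW(s)\Big]'_{(p+1)\times 1} + \frac{1}{2}\, f\big(F^{-1}(\tau)\big)\,\nu'
\begin{bmatrix}
1 & 0 \\
0 & \int_0^1 S_\alpha^2(s)\,ds\;\Omega
\end{bmatrix}\nu,
\tag{49}
$$

and so

$$
\frac{f\big(F^{-1}(\tau)\big)\cdot a_T\sqrt{T}}{\sqrt{\tau(1-\tau)}}\,\big(\hat\theta(\tau) - \phi_\tau\big) \overset{d}{\sim}
\begin{bmatrix}
1 & 0 \\
0 & \big(\int_0^1 S_\alpha^2(s)\,ds\;\Omega\big)^{-1}
\end{bmatrix}
\Big[W(1),\; \sum_{j=0}^{\infty}c_j\int_0^1 S_\alpha(s)\,dW(s),\; \dots,\; \sum_{j=0}^{\infty}c_j\int_0^1 S_\alpha(s)\,dW(s)\Big]'_{(p+1)\times 1}
$$

follows by setting the derivative of $Z(\nu)$ to 0 and solving for ν. □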
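For readers who want the last step spelled out, a short sketch is given below. The shorthand $A$ and $b$ is introduced here only for this illustration and is not notation from the paper; the sign change in the final line relies on the symmetry of the Brownian motion and on its independence from $\{S_\alpha(s)\}$.

```latex
% Shorthand (assumed here): write the limit in (49) as a quadratic form in \nu.
\begin{aligned}
Z(\nu) &= \nu' b + \tfrac{1}{2}\, f\big(F^{-1}(\tau)\big)\, \nu' A\, \nu, \\
b &:= \sqrt{\tau(1-\tau)}\,\Big[W(1),\ \sum_{j=0}^{\infty} c_j \int_0^1 S_\alpha(s)\,dW(s),\ \dots\Big]', \qquad
A := \begin{bmatrix} 1 & 0 \\ 0 & \int_0^1 S_\alpha^2(s)\,ds\;\Omega \end{bmatrix}. \\[4pt]
% First-order condition of the convex quadratic:
\frac{\partial Z(\nu)}{\partial \nu} &= b + f\big(F^{-1}(\tau)\big)\, A\, \nu = 0
\quad\Longrightarrow\quad
\hat\nu = -\frac{1}{f\big(F^{-1}(\tau)\big)}\, A^{-1} b. \\[4pt]
% W and -W have the same law and W is independent of S_\alpha, so -b and b
% have the same joint distribution with A:
\frac{f\big(F^{-1}(\tau)\big)}{\sqrt{\tau(1-\tau)}}\,\hat\nu
&\overset{d}{=} A^{-1}\,\Big[W(1),\ \sum_{j=0}^{\infty} c_j \int_0^1 S_\alpha(s)\,dW(s),\ \dots\Big]'.
\end{aligned}
```

Substituting $\nu = a_T\sqrt{T}\,(\hat\theta(\tau) - \phi_\tau)$ and noting that $A^{-1}$ is block diagonal with blocks $1$ and $\big(\int_0^1 S_\alpha^2(s)\,ds\;\Omega\big)^{-1}$ gives the displayed limiting distribution.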

References

Adam, M. C., and A. Szafarz. 1992. “Speculative Bubbles and Financial Markets.” Oxford Economic Papers 44: 626–40, https://doi.org/10.1093/oxfordjournals.oep.a042068.

Barrodale, I., and F. D. Roberts. 1973. “An Improved Algorithm for Discrete l1 Linear Approximation.” SIAM Journal on Numerical Analysis 10: 839–48, https://doi.org/10.1137/0710069.

Breidt, F. J., R. A. Davis, K.-S. Lii, and M. Rosenblatt. 1991. “Maximum Likelihood Estimation for Noncausal Autoregressive Processes.” Journal of Multivariate Analysis 36: 175–98, https://doi.org/10.1016/0047-259x(91)90056-8.

Brockwell, P. J., and R. A. Davis. 2016. Introduction to Time Series and Forecasting. New York: Springer, https://doi.org/10.1007/978-3-319-29854-2.

Brockwell, P. J., R. A. Davis, and S. E. Fienberg. 1991. Time Series: Theory and Methods. New York: Springer Science & Business Media, https://doi.org/10.1007/978-1-4419-0320-4.

Broze, L., and A. Szafarz. 1985. “Solutions des modèles linéaires à anticipations rationnelles.” Annales de l’INSEE: 99–118, https://doi.org/10.2307/20076541.

Cagan, P. 1956. “The Monetary Dynamics of Hyperinflation.” In Studies in the Quantity Theory of Money. Chicago: The University of Chicago Press.

Davis, R., and S. Resnick. 1985. “Limit Theory for Moving Averages of Random Variables with Regularly Varying Tail Probabilities.” Annals of Probability: 179–195, https://doi.org/10.1214/aop/1176993074.

Dhaene, G., C. Gourieroux, and O. Scaillet. 1998. “Instrumental Models and Indirect Encompassing.” Econometrica: 673–688, https://doi.org/10.2307/2998579.

Fan, J., and Y. Fan. 2010. Issues on Quantile Autoregression. Also available at https://orfe.princeton.edu/~jqfan/papers/06/quantilereg.pdf.

Francq, C., and J.-M. Zakoïan. 2007. “HAC Estimation and Strong Linearity Testing in Weak ARMA Models.” Journal of Multivariate Analysis 98: 114–144, https://doi.org/10.1016/j.jmva.2006.02.003.

Fries, S., and J.-M. Zakoian. 2017. Mixed Causal-Noncausal AR Processes and the Modelling of Explosive Bubbles. Munich Personal RePEc Archive. Also available at https://mpra.ub.uni-muenchen.de/86926/.

Gourieroux, C., and J. Jasiak. 2016. “Filtering, Prediction and Simulation Methods for Noncausal Processes.” Journal of Time Series Analysis 37: 405–430, https://doi.org/10.1111/jtsa.12165.

Gourieroux, C., and J. Jasiak. 2015. Semi-Parametric Estimation of Noncausal Vector Autoregression. CREST. Also available at https://www.researchgate.net/profile/Joann_Jasiak/publication/308047972_REVAR/links/57d812ce08ae6399a399137e.pdf.

Gouriéroux, C., and J.-M. Zakoïan. 2017. “Local Explosion Modelling by Non-causal Process.” Journal of the Royal Statistical Society: Series B Statistical Methodology 79: 737–56, https://doi.org/10.1111/rssb.12193.

Hecq, A., J. V. Issler, and S. Telg. 2020. “Mixed Causal–Noncausal Autoregressions with Exogenous Regressors.” Journal of Applied Econometrics 35: 328–343, https://doi.org/10.1002/jae.2751.

Hecq, A., L. Lieb, and S. Telg. 2016. “Identification of Mixed Causal-Noncausal Models in Finite Samples.” Annals of Economics and Statistics/Annales d’Économie et de Statistique: 307–331, https://doi.org/10.15609/annaeconstat2009.123-124.0307.

Hecq, A., L. Lieb, and S. Telg. 2017a. Simulation, Estimation and Selection of Mixed Causal-Noncausal Autoregressive Models: The MARX Package. 3015797, SSRN. Also available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3015797, https://doi.org/10.2139/ssrn.3015797.

Hecq, A., S. Telg, and L. Lieb. 2017b. “Do Seasonal Adjustments Induce Noncausal Dynamics in Inflation Rates?” Econometrics 5: 48, https://doi.org/10.3390/econometrics5040048.

Hylleberg, S., R. F. Engle, C. W. Granger, and B. S. Yoo. 1990. “Seasonal Integration and Cointegration.” Journal of Econometrics 44: 215–38, https://doi.org/10.1016/0304-4076(90)90080-d.

Knight, K. 1989. “Limit Theory for Autoregressive-Parameter Estimates in an Infinite-Variance Random Walk.” The Canadian Journal of Statistics/La Revue Canadienne de Statistique: 261–78, https://doi.org/10.2307/3315522.

Knight, K. 1991. “Limit Theory for M-Estimates in an Integrated Infinite Variance Process.” Econometric Theory: 200–12, https://doi.org/10.1017/s0266466600004400.

Koenker, R., and Z. Xiao. 2004. “Unit Root Quantile Autoregression Inference.” Journal of the American Statistical Association 99: 775–787, https://doi.org/10.1198/016214504000001114.

Koenker, R., and Z. Xiao. 2006. “Quantile Autoregression.” Journal of the American Statistical Association 101: 980–90, https://doi.org/10.1198/016214506000000672.

Lanne, M., and J. Luoto. 2013. “Autoregression-based Estimation of the New Keynesian Phillips Curve.” Journal of Economic Dynamics and Control 37: 561–70, https://doi.org/10.1016/j.jedc.2012.09.008.

Lanne, M., and J. Luoto. 2017. “A New Time-Varying Parameter Autoregressive Model for US Inflation Expectations.” Journal of Money, Credit, and Banking 49: 969–95, https://doi.org/10.1111/jmcb.12402.

Lanne, M., and P. Saikkonen. 2011a. “GMM Estimation with Non-causal Instruments.” Oxford Bulletin of Economics & Statistics 73: 581–592, https://doi.org/10.1111/j.1468-0084.2010.00631.x.

Lanne, M., and P. Saikkonen. 2011b. “Noncausal Autoregressions for Economic Time Series.” Journal of Time Series Econometrics 3, https://doi.org/10.2202/1941-1928.1080.

Lanne, M., and P. Saikkonen. 2013. “Noncausal Vector Autoregression.” Econometric Theory: 447–481, https://doi.org/10.1017/s0266466612000448.

Liu, X. 2019. “Quantile-based Asymmetric Dynamics of Real GDP Growth.” Macroeconomic Dynamics: 1–29, https://doi.org/10.1017/S1365100519000063.

Taylor, J. B. 1993. “Discretion versus Policy Rules in Practice.” In Carnegie-Rochester Conference Series on Public Policy, 195–214. Amsterdam, The Netherlands: Elsevier, https://doi.org/10.1016/0167-2231(93)90009-L.

Tokdar, S. T., and J. B. Kadane. 2012. “Simultaneous Linear Quantile Regression: A Semiparametric Bayesian Approach.” Bayesian Analysis 7: 51–72, https://doi.org/10.1214/12-ba702.

Received: 2019-04-25

Accepted: 2020-08-04

Published Online: 2020-09-19

This work is licensed under the Creative Commons Attribution 4.0 International License.
