A mixture autoregressive model based on Gaussian and Student’s t-distributions

Savi Virolainen
Published/Copyright: July 15, 2021

Abstract

We introduce a new mixture autoregressive model that combines Gaussian and Student's t mixture components. The model has attractive properties analogous to those of the Gaussian and Student's t mixture autoregressive models, but it is more flexible, as it enables modeling of series that consist of both conditionally homoscedastic Gaussian regimes and conditionally heteroscedastic Student's t regimes. The usefulness of our model is demonstrated in an empirical application to the monthly U.S. interest rate spread between the 3-month Treasury bill rate and the effective federal funds rate.


Corresponding author: Savi Virolainen, Faculty of Social Sciences, University of Helsinki, P. O. Box 17, Helsinki FI–00014, Finland, E-mail:

Acknowledgements

The author thanks Markku Lanne, Mika Meitz, and Pentti Saikkonen, who commented on the work and helped to improve the paper substantially. The author also thanks an anonymous referee for useful comments and the Academy of Finland for financing the project.

  1. Author contribution: The author has accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: This work was supported by the Academy of Finland under Grant 308628.

  3. Conflict of interest statement: The author has no conflict of interest to declare.

Appendix A: Modified genetic algorithm

As discussed in Section 3.1, the accompanying R package uGMAR (Virolainen 2021) employs a two-phase procedure for estimating the parameters of the G-StMAR model (and also of the GMAR (Kalliovirta, Meitz, and Saikkonen 2015) and StMAR (Meitz, Preve, and Saikkonen 2021) models). In the first phase, a genetic algorithm is used to find starting values for a gradient-based variable metric algorithm (Nash 1990, algorithm 21), which then, in the second phase, accurately converges to a nearby local maximum or saddle point. In this appendix, we first briefly describe how our version of the genetic algorithm functions in general and then discuss the specific modifications made to enhance estimation of the G-StMAR model (for a more detailed description of the genetic algorithm, see, e.g., Dorsey and Mayer 1995).
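For reference, the following minimal R sketch shows how this two-phase estimation is invoked in uGMAR; the function name fitGSMAR and its arguments follow the package documentation at the time of writing and may differ in other versions, and the data series Y and the model orders are placeholders.

```r
# install.packages("uGMAR")  # from CRAN, if not installed
library(uGMAR)

# Fit a G-StMAR model with autoregressive order p = 4, one GMAR-type and
# one StMAR-type regime (M = c(M1, M2)). Each estimation round runs the
# genetic algorithm to find starting values and then the variable metric
# algorithm to converge to a nearby local maximum or saddle point.
fit <- fitGSMAR(data = Y, p = 4, M = c(1, 1), model = "G-StMAR")
```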

In a genetic algorithm, an initial population consisting of different parameter vectors (often drawn at random) is first constructed. The algorithm then operates iteratively so that in each iteration, referred to as a generation, the current population of candidate solutions goes through the phases of selection, crossover, and mutation. In the selection phase, parameter vectors are sampled with replacement from the current population into the reproduction pool according to probabilities based on their fitness, that is, on the related log-likelihoods. In the crossover phase, some of the parameter vectors in the reproduction pool are crossed over with each other, with the probabilities of experiencing crossover given by the crossover rate. Finally, some of the parameter vectors are mutated in the mutation phase, with the mutation probabilities given by the mutation rate. In our version of the genetic algorithm, mutation means that the mutating parameter vector is fully replaced with another parameter vector drawn at random (in Dorsey and Mayer 1995, mutations are drawn for each scalar component of the parameter vectors individually). The reproduction pool that has experienced crossovers and mutations becomes the new population, and the algorithm proceeds to the next generation, evolving towards the global maximum one generation after another. A schematic sketch of one such generation is given below.
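To make the general description concrete, the following R sketch implements one generation of a basic genetic algorithm of the kind described above. It is a schematic illustration rather than the uGMAR implementation; loglik (returning the log-likelihood of a parameter vector), random_draw (generating a new random parameter vector), and the default rates are hypothetical placeholders.

```r
# One generation of a basic genetic algorithm (schematic sketch).
ga_generation <- function(population, loglik, random_draw,
                          crossover_rate = 0.7, mutation_rate = 0.1) {
  N <- length(population)  # population: a list of parameter vectors
  fitness <- vapply(population, loglik, numeric(1))
  # Selection: sample with replacement into the reproduction pool with
  # probabilities based on fitness (shifted to be positive, as
  # log-likelihoods are typically negative).
  probs <- fitness - min(fitness) + 1e-8
  pool <- population[sample.int(N, N, replace = TRUE, prob = probs)]
  # Crossover: cross consecutive pairs with probability crossover_rate by
  # swapping the tails of the two parameter vectors at a random cut point.
  for (i in seq(1, N - 1, by = 2)) {
    if (runif(1) < crossover_rate) {
      d <- length(pool[[i]])
      cut <- sample.int(d - 1, 1)
      tail_i <- pool[[i]][(cut + 1):d]
      pool[[i]][(cut + 1):d] <- pool[[i + 1]][(cut + 1):d]
      pool[[i + 1]][(cut + 1):d] <- tail_i
    }
  }
  # Mutation: replace the whole vector with a fresh random draw, as in our
  # version of the algorithm.
  for (i in seq_len(N)) {
    if (runif(1) < mutation_rate) pool[[i]] <- random_draw()
  }
  pool  # the new population
}
```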

Because the G-StMAR model can be challenging to estimate even with a robust estimation algorithm such as the genetic algorithm, we have made modifications to improve its performance. In particular, a slightly modified version[10] of the individually adaptive crossover and mutation rates introduced by Patnaik and Srinivas (1994) is employed in order to force disruption of the subaverage solutions while protecting the better ones. The fitness inheritance proposed by Smith, Dike, and Stegmann (1995) is deployed to shorten the estimation time by cutting down the number of computationally costly evaluations of the log-likelihood function. In order to enhance thorough exploration of the parameter space, the algorithm proposed by Monahan (1984) is used in some random mutations to generate parameter vectors near the boundary of the stationarity region. In the case of premature convergence, most of the population is mutated so that exploration of the parameter space continues. Moreover, after a large number of generations have been run, the random mutations are targeted to a neighbourhood of the best-so-far parameter vector for faster convergence; we call these smart mutations.
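To illustrate the adaptive rates, the following R sketch implements the unmodified baseline rule of Patnaik and Srinivas (1994) for log-likelihood fitness. The modified version used in uGMAR differs in details not reproduced here, and the sketch also simplifies by computing both rates from a single fitness value, whereas the crossover rate is originally computed from the better of the two crossing parents.

```r
# Baseline individually adaptive rates (sketch). Solutions with fitness at
# or above the population mean get rates proportional to their distance
# from the best solution (protection of good solutions), while subaverage
# solutions get the fixed maximal rates (disruption). The defaults follow
# the constants k1 = k3 = 1.0 and k2 = k4 = 0.5 recommended by Patnaik
# and Srinivas (1994).
adaptive_rates <- function(fit, fit_mean, fit_max,
                           k1 = 1.0, k2 = 0.5, k3 = 1.0, k4 = 0.5) {
  if (fit >= fit_mean) {
    denom <- max(fit_max - fit_mean, .Machine$double.eps)  # avoid 0/0
    c(crossover = k1 * (fit_max - fit) / denom,
      mutation  = k2 * (fit_max - fit) / denom)
  } else {
    c(crossover = k3, mutation = k4)  # disrupt subaverage solutions
  }
}
```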

In addition to the modifications described above, we have made further adjustments to accommodate the special structure of the log-likelihood function. Specifically, the definition of the mixing weights (2.15) implies that if a regime has parameter values that fit poorly relative to the other regimes, its mixing weights drop to near zero. The surface of the log-likelihood function thus flattens in the related directions, meaning that the algorithm is unable to converge properly if the proposed parameter vectors do not pose a reasonable fit for all regimes. This problem of unidentified (or redundant) regimes often occurs when the number of mixture components is chosen too large, but it can be present even when the number of mixture components is chosen correctly. In uGMAR, we try to resolve this problem by giving parameter vectors that contain redundant regimes smaller probabilities of being chosen to the reproduction pool, as sketched below. Moreover, smart mutations are targeted only to the neighbourhood of parameter values that identify all regimes. If such parameter vectors have not been found (after a large number of generations have been run), combining regimes from different parameter vectors is attempted along with random search.
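The following R sketch shows one way such a penalty could enter the selection fitness. The redundancy criterion, the threshold, and the penalty size are illustrative assumptions rather than the exact rule used in uGMAR.

```r
# Penalize parameter vectors whose log-likelihood is achieved with one or
# more redundant regimes (illustrative sketch). alpha_mt is a (T x M)
# matrix of mixing weights implied by the parameter vector; a regime whose
# mixing weights stay below the threshold at every t is deemed redundant.
penalized_fitness <- function(loglik_value, alpha_mt,
                              threshold = 0.01, penalty = 100) {
  redundant <- apply(alpha_mt, 2, function(a) all(a < threshold))
  loglik_value - penalty * sum(redundant)
}
```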

Appendix B: Properties of the multivariate Gaussian and Student's t-distributions

Denote a d-dimensional real-valued vector by $\boldsymbol{y}$. It is well known that the density function of the d-dimensional multivariate Gaussian distribution with mean $\mu$ and covariance matrix $\Gamma$ is

(B.1) $n_d(\boldsymbol{y}; \mu, \Gamma) = (2\pi)^{-d/2}\det(\Gamma)^{-1/2}\exp\left(-\dfrac{1}{2}(\boldsymbol{y} - \mu)'\Gamma^{-1}(\boldsymbol{y} - \mu)\right).$

Similarly to Meitz, Preve, and Saikkonen (2021) but differing from the standard form, we parametrize the Student’s t-distribution using its covariance matrix as a parameter together with the mean and degrees of freedom. The density function of such a d-dimensional t-distribution with mean μ , covariance matrix Γ, and ν > 2 degrees of freedom is

(B.2) $t_d(\boldsymbol{y}; \mu, \Gamma, \nu) = C_d(\nu)\det(\Gamma)^{-1/2}\left(1 + \dfrac{(\boldsymbol{y} - \mu)'\Gamma^{-1}(\boldsymbol{y} - \mu)}{\nu - 2}\right)^{-(d+\nu)/2},$

where

(B.3) $C_d(\nu) = \dfrac{\Gamma\left(\frac{d+\nu}{2}\right)}{\sqrt{\pi^d(\nu - 2)^d}\,\Gamma\left(\frac{\nu}{2}\right)},$

and $\Gamma(\cdot)$ is the gamma function. We assume that the covariance matrix $\Gamma$ is positive definite for both distributions.
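As a numerical sanity check of this parametrization, the density (B.2)–(B.3) can be evaluated directly and compared with a standard multivariate t density whose scale matrix is $((\nu-2)/\nu)\Gamma$, so that its covariance matrix equals $\Gamma$. A minimal R sketch, assuming the mvtnorm package is available (the example values of mu, Gamma, nu, and y are arbitrary):

```r
library(mvtnorm)

# Density (B.2)-(B.3): multivariate t parametrized by its covariance matrix.
t_d <- function(y, mu, Gamma, nu) {
  d  <- length(y)
  Cd <- gamma((d + nu) / 2) / (sqrt(pi^d * (nu - 2)^d) * gamma(nu / 2))
  q  <- c(t(y - mu) %*% solve(Gamma) %*% (y - mu))  # quadratic form
  Cd * det(Gamma)^(-1/2) * (1 + q / (nu - 2))^(-(d + nu) / 2)
}

mu <- c(1, 2); Gamma <- matrix(c(2, 0.5, 0.5, 1), 2, 2); nu <- 5
y  <- c(0.3, 1.5)

# A standard t with scale ((nu - 2)/nu) * Gamma has covariance Gamma, so
# the two evaluations below should agree.
t_d(y, mu, Gamma, nu)
dmvt(y, delta = mu, sigma = (nu - 2) / nu * Gamma, df = nu, log = FALSE)
```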

Consider a partition $\boldsymbol{X} = (\boldsymbol{X}_1, \boldsymbol{X}_2)$ of either a normally or t-distributed (with $\nu$ degrees of freedom) random vector $\boldsymbol{X}$ such that $\boldsymbol{X}_1$ has dimension $(d_1 \times 1)$ and $\boldsymbol{X}_2$ has dimension $(d_2 \times 1)$. Consider also a corresponding partition of the mean vector $\mu = (\mu_1, \mu_2)$ and the covariance matrix

(B.4) $\Gamma = \begin{bmatrix} \Gamma_{11} & \Gamma_{12} \\ \Gamma_{12}' & \Gamma_{22} \end{bmatrix},$

where, for example, the dimension of $\Gamma_{11}$ is $(d_1 \times d_1)$. Then in the case of normally distributed $\boldsymbol{X}$, $\boldsymbol{X}_1$ has the marginal distribution $n_{d_1}(\mu_1, \Gamma_{11})$ and $\boldsymbol{X}_2$ has the marginal distribution $n_{d_2}(\mu_2, \Gamma_{22})$. In the t-distributed case, the marginal distributions are $t_{d_1}(\mu_1, \Gamma_{11}, \nu)$ and $t_{d_2}(\mu_2, \Gamma_{22}, \nu)$, respectively (see, e.g., Ding 2016, also in what follows).

In the normally distributed case, the conditional distribution of the random vector $\boldsymbol{X}_1$ given $\boldsymbol{X}_2 = x_2$ is

(B.5) $\boldsymbol{X}_1 \mid (\boldsymbol{X}_2 = x_2) \sim n_{d_1}\big(\mu_{1\mid 2}(x_2), \Gamma_{1\mid 2}(x_2)\big),$

where

(B.6) $\mu_{1\mid 2}(x_2) = \mu_1 + \Gamma_{12}\Gamma_{22}^{-1}(x_2 - \mu_2)$ and
(B.7) $\Gamma_{1\mid 2}(x_2) = \Gamma_{11} - \Gamma_{12}\Gamma_{22}^{-1}\Gamma_{12}'.$

In the t-distributed case, the analogous conditional distribution is

(B.8) $\boldsymbol{X}_1 \mid (\boldsymbol{X}_2 = x_2) \sim t_{d_1}\big(\mu_{1\mid 2}(x_2), \Gamma_{1\mid 2}(x_2), \nu + d_2\big),$

where

$\mu_{1\mid 2}(x_2) = \mu_1 + \Gamma_{12}\Gamma_{22}^{-1}(x_2 - \mu_2) \quad\text{and}\quad \Gamma_{1\mid 2}(x_2) = \dfrac{\nu - 2 + (x_2 - \mu_2)'\Gamma_{22}^{-1}(x_2 - \mu_2)}{\nu - 2 + d_2}\big(\Gamma_{11} - \Gamma_{12}\Gamma_{22}^{-1}\Gamma_{12}'\big).$

In particular, we have

(B.9) $n_d(x; \mu, \Gamma) = n_{d_1}\big(x_1; \mu_{1\mid 2}(x_2), \Gamma_{1\mid 2}(x_2)\big)\, n_{d_2}(x_2; \mu_2, \Gamma_{22})$ and
(B.10) $t_d(x; \mu, \Gamma, \nu) = t_{d_1}\big(x_1; \mu_{1\mid 2}(x_2), \Gamma_{1\mid 2}(x_2), \nu + d_2\big)\, t_{d_2}(x_2; \mu_2, \Gamma_{22}, \nu).$
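The factorizations (B.9) and (B.10) are easy to verify numerically. A minimal R sketch for the Gaussian case (B.9) with $d_1 = d_2 = 1$, again assuming the mvtnorm package and arbitrary example values; the t case (B.10) can be checked analogously with the covariance-parametrized t density above and the conditional moments under (B.8).

```r
library(mvtnorm)

mu <- c(1, 2); Gamma <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
x  <- c(0.3, 1.5)  # partition x = (x1, x2) with d1 = d2 = 1

# Conditional moments (B.6)-(B.7):
mu_12  <- mu[1] + Gamma[1, 2] / Gamma[2, 2] * (x[2] - mu[2])
Gam_12 <- Gamma[1, 1] - Gamma[1, 2]^2 / Gamma[2, 2]

# Left- and right-hand sides of (B.9); the two numbers should agree.
dmvnorm(x, mean = mu, sigma = Gamma)
dnorm(x[1], mean = mu_12, sd = sqrt(Gam_12)) *
  dnorm(x[2], mean = mu[2], sd = sqrt(Gamma[2, 2]))
```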

Appendix C: Proofs

C.1 Proof of Theorem 1

Suppose $\{y_t\}_{t=1}^{\infty}$ is a G-StMAR process. Then the process $\boldsymbol{y}_t = (y_t, \ldots, y_{t-p+1})$ is clearly a Markov chain on $\mathbb{R}^p$. Let $\boldsymbol{y}_0 = (y_0, \ldots, y_{-p+1})$ be a random vector whose distribution is characterized by the density function $f(\boldsymbol{y}_0; \theta) = \sum_{m=1}^{M_1} \alpha_m\, n_p(\boldsymbol{y}_0; \mu_m 1_p, \Gamma_{m,p}) + \sum_{m=M_1+1}^{M} \alpha_m\, t_p(\boldsymbol{y}_0; \mu_m 1_p, \Gamma_{m,p}, \nu_m)$. According to Eqs. (2.3)–(2.5), (2.8)–(2.10), (2.13), and (2.15), the density of the conditional distribution of $y_1$ given $\boldsymbol{y}_0$ is

(C.1) $f(y_1 \mid \boldsymbol{y}_0; \theta) = \sum_{m=1}^{M_1} \alpha_m \dfrac{n_p(\boldsymbol{y}_0; \mu_m 1_p, \Gamma_{m,p})}{f(\boldsymbol{y}_0; \theta)}\, n_1(y_1; \mu_{m,1}, \sigma_m^2) + \sum_{m=M_1+1}^{M} \alpha_m \dfrac{t_p(\boldsymbol{y}_0; \mu_m 1_p, \Gamma_{m,p}, \nu_m)}{f(\boldsymbol{y}_0; \theta)}\, t_1(y_1; \mu_{m,1}, \sigma_{m,1}^2, \nu_m + p)$
(C.2) $= \sum_{m=1}^{M_1} \dfrac{\alpha_m}{f(\boldsymbol{y}_0; \theta)}\, n_{p+1}\big((y_1, \boldsymbol{y}_0); \mu_m 1_{p+1}, \Gamma_{m,p+1}\big) + \sum_{m=M_1+1}^{M} \dfrac{\alpha_m}{f(\boldsymbol{y}_0; \theta)}\, t_{p+1}\big((y_1, \boldsymbol{y}_0); \mu_m 1_{p+1}, \Gamma_{m,p+1}, \nu_m\big).$

The random vector ( y 1, y 0) therefore has the density function

(C.3) $f\big((y_1, \boldsymbol{y}_0); \theta\big) = \sum_{m=1}^{M_1} \alpha_m\, n_{p+1}\big((y_1, \boldsymbol{y}_0); \mu_m 1_{p+1}, \Gamma_{m,p+1}\big) + \sum_{m=M_1+1}^{M} \alpha_m\, t_{p+1}\big((y_1, \boldsymbol{y}_0); \mu_m 1_{p+1}, \Gamma_{m,p+1}, \nu_m\big).$

Using the properties of marginal densities of multivariate normal and t-distributions, by integrating $y_{-p+1}$ out we obtain the density of $\boldsymbol{y}_1$ as $f(\boldsymbol{y}_1; \theta) = \sum_{m=1}^{M_1} \alpha_m\, n_p(\boldsymbol{y}_1; \mu_m 1_p, \Gamma_{m,p}) + \sum_{m=M_1+1}^{M} \alpha_m\, t_p(\boldsymbol{y}_1; \mu_m 1_p, \Gamma_{m,p}, \nu_m)$.[11] Thus, the random vectors $\boldsymbol{y}_0$ and $\boldsymbol{y}_1$ are identically distributed. As the process $\{\boldsymbol{y}_t\}_{t=1}^{\infty}$ is a (time homogeneous) Markov chain, it follows that $\{\boldsymbol{y}_t\}_{t=1}^{\infty}$ has a stationary distribution $\pi_{\boldsymbol{y}}(\cdot)$ characterized by the density $f(\cdot\,; \theta) = \sum_{m=1}^{M_1} \alpha_m\, n_p(\cdot\,; \mu_m 1_p, \Gamma_{m,p}) + \sum_{m=M_1+1}^{M} \alpha_m\, t_p(\cdot\,; \mu_m 1_p, \Gamma_{m,p}, \nu_m)$ (Meyn and Tweedie 2009, pp. 230–231).
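Spelling out the marginalization step for a Gaussian component (the Student's t components are handled analogously via (B.10)): by stationarity, the mean vector and the upper-left $p \times p$ block of the covariance matrix of the $(p+1)$-dimensional component distribution are $\mu_m 1_p$ and $\Gamma_{m,p}$, so the marginal property of Appendix B gives

```latex
\int_{\mathbb{R}} n_{p+1}\big((\boldsymbol{y}_1, y_{-p+1});\, \mu_m 1_{p+1}, \Gamma_{m,p+1}\big)\,\mathrm{d}y_{-p+1}
  = n_p\big(\boldsymbol{y}_1;\, \mu_m 1_p, \Gamma_{m,p}\big).
```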

For ergodicity, let $P_{\boldsymbol{y}}^{p}(\boldsymbol{y}, \cdot) = P(\boldsymbol{y}_p \in \cdot \mid \boldsymbol{y}_0 = \boldsymbol{y})$ signify the p-step transition probability measure of the process $\boldsymbol{y}_t$. Using the pth order Markov property of $y_t$, it is easy to check that $P_{\boldsymbol{y}}^{p}(\boldsymbol{y}, \cdot)$ has the density

(C.4) $f(\boldsymbol{y}_p \mid \boldsymbol{y}_0; \theta) = \prod_{t=1}^{p}\left[\sum_{m=1}^{M_1} \alpha_{m,t}\, n_1(y_t; \mu_{m,t}, \sigma_m^2) + \sum_{m=M_1+1}^{M} \alpha_{m,t}\, t_1(y_t; \mu_{m,t}, \sigma_{m,t}^2, \nu_m + p)\right].$

Clearly $f(\boldsymbol{y}_p \mid \boldsymbol{y}_0; \theta) > 0$ for all $\boldsymbol{y}_p \in \mathbb{R}^p$ and all $\boldsymbol{y}_0 \in \mathbb{R}^p$, so we can conclude that $\boldsymbol{y}_t$ is ergodic in the sense of Meyn and Tweedie (2009, Ch. 13) by using arguments identical to those used in the proof of Theorem 1 in Kalliovirta, Meitz, and Saikkonen (2015). □

C.2 Proof of Theorem 2

First note that $L_T^{(c)}(\theta)$ is continuous, which together with Assumption 1 of the main paper implies the existence of a measurable maximizer $\hat{\theta}_T$. In order to conclude strong consistency of $\hat{\theta}_T$, it needs to be shown that (see, e.g., Newey and McFadden 1994, Theorem 2.1 and the discussion on page 2122)

  1. the uniform strong law of large numbers holds for the log-likelihood function, that is, $\sup_{\theta \in \Theta}\big|L_T^{(c)}(\theta) - \mathrm{E}\big[L_T^{(c)}(\theta)\big]\big| \to 0$ almost surely as T → ∞,

  2. and that the limit of $L_T^{(c)}(\theta)$ is uniquely maximized at $\theta = \theta_0$.

Proof of (i). Because the initial values are assumed to come from the stationary distribution, the process $\boldsymbol{y}_t = (y_t, \ldots, y_{t-p+1})$, and hence also $y_t$, is stationary and ergodic, and $\mathrm{E}\big[L_T^{(c)}(\theta)\big] = \mathrm{E}[l_t(\theta)]$. To conclude (i), it thus suffices to show that $\mathrm{E}\big[\sup_{\theta \in \Theta}|l_t(\theta)|\big] < \infty$ (see Rao 1962). This is done by using compactness of the parameter space to derive finite lower and upper bounds for $l_t(\theta)$, which is given by

(C.5) $l_t(\theta) = \log\left[\sum_{m=1}^{M_1} \alpha_{m,t}\, n_1(y_t; \mu_{m,t}, \sigma_m^2) + \sum_{m=M_1+1}^{M} \alpha_{m,t}\, t_1(y_t; \mu_{m,t}, \sigma_{m,t}^2, \nu_m + p)\right].$

We know from the structure of the parameter space that $c_1 \le \sigma_m^2 \le c_2$ and $c_1 \le \alpha_m \le 1 - c_1$ for all $m = 1, \ldots, M$, and $c_3 \le \nu_m \le c_2$ for all $m = M_1 + 1, \ldots, M$, for some $0 < c_1 < 1$, $c_2 < \infty$, and $c_3 > 2$. Because the exponential function is bounded from above by one on the non-positive real axis and $c_1 \le \sigma_m^2$, there exists a constant $U_1 < \infty$ (for example, $U_1 = (2\pi c_1)^{-1/2}$) such that

(C.6) $n_1(y_t; \mu_{m,t}, \sigma_m^2) = \big(2\pi\sigma_m^2\big)^{-1/2}\exp\left(-\dfrac{(y_t - \mu_{m,t})^2}{2\sigma_m^2}\right) \le U_1$

for all m = 1, …, M1.

We also have $c_3 \le \nu_m + p \le c_2 + p$ for all $m = M_1 + 1, \ldots, M$. Combined with the fact that the gamma function is continuous on the positive real axis, this implies that there exist constants $c_4 > 0$ and $c_5 < \infty$ such that

(C.7) $c_4 \le C_1(\nu_m + p) = \dfrac{\Gamma\left(\frac{1 + \nu_m + p}{2}\right)}{\sqrt{\pi(\nu_m + p - 2)}\,\Gamma\left(\frac{\nu_m + p}{2}\right)} \le c_5$

for all $m = M_1 + 1, \ldots, M$. Because $\Gamma_m$, and hence $\Gamma_m^{-1}$, is positive definite, $\sigma_m^2 \ge c_1$, and $c_3 \le \nu_m \le c_2$, we can find some $c_6 > 0$ such that

(C.8) $\sigma_{m,t}^2 = \dfrac{\nu_m - 2 + (\boldsymbol{y}_{t-1} - \mu_m 1_p)'\Gamma_m^{-1}(\boldsymbol{y}_{t-1} - \mu_m 1_p)}{\nu_m - 2 + p}\,\sigma_m^2 \ge c_6$

for all $m = M_1 + 1, \ldots, M$. Combined with (C.7) and (C.8), the inequality $-(1 + \nu_m + p)/2 < 0$ implies that there exists a constant $U_2 < \infty$ for which

(C.9) $t_1(y_t; \mu_{m,t}, \sigma_{m,t}^2, \nu_m + p) = C_1(\nu_m + p)\,\sigma_{m,t}^{-1}\left(1 + \dfrac{(y_t - \mu_{m,t})^2}{(\nu_m + p - 2)\sigma_{m,t}^2}\right)^{-(1 + \nu_m + p)/2} \le U_2$

for all $m = M_1 + 1, \ldots, M$. According to (C.6), (C.9), and the restriction $0 \le \alpha_{m,t} \le 1$, there exists a constant $U_3 < \infty$ such that

(C.10) $l_t(\theta) = \log\left[\sum_{m=1}^{M_1} \alpha_{m,t}\, n_1(y_t; \mu_{m,t}, \sigma_m^2) + \sum_{m=M_1+1}^{M} \alpha_{m,t}\, t_1(y_t; \mu_{m,t}, \sigma_{m,t}^2, \nu_m + p)\right] \le U_3.$

We know from compactness of the parameter space that

(C.11) $\dfrac{(y_t - \mu_{m,t})^2}{2\sigma_m^2} \le c_7\big(1 + y_t^2 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big),$

implying

(C.12) $\exp\left(-\dfrac{(y_t - \mu_{m,t})^2}{2\sigma_m^2}\right) \ge \exp\left(-c_7\big(1 + y_t^2 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big)\right),$

for all $m = 1, \ldots, M_1$, and for some finite constant $c_7$. By $\sigma_m^2 \le c_2$ it also holds that $\big(2\pi\sigma_m^2\big)^{-1/2} \ge (2\pi c_2)^{-1/2}$, so

(C.13) $n_1(y_t; \mu_{m,t}, \sigma_m^2) \ge (2\pi c_2)^{-1/2}\exp\left(-c_7\big(1 + y_t^2 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big)\right)$

for all m = 1, …, M1.

Accordingly, since $\sigma_{m,t}^2 \ge c_6$ and $\nu_m \ge c_3$, it holds for some $c_8 < \infty$ that

(C.14) $1 + \dfrac{(y_t - \mu_{m,t})^2}{(\nu_m + p - 2)\sigma_{m,t}^2} \le c_8\big(1 + y_t^2 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big), \quad m = M_1 + 1, \ldots, M.$

Thus, because $\nu_m \le c_2$ and the inner functions below take values larger than one, we have

(C.15) $\left(1 + \dfrac{(y_t - \mu_{m,t})^2}{(\nu_m + p - 2)\sigma_{m,t}^2}\right)^{-(1 + \nu_m + p)/2} \ge \Big(c_8\big(1 + y_t^2 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big)\Big)^{-(1 + c_2 + p)/2}.$

As Meitz, Preve, and Saikkonen (2021) state in the proof of their Theorem 3, the quadratic form on the right-hand side of (C.8) satisfies

(C.16) $(\boldsymbol{y}_{t-1} - \mu_m 1_p)'\Gamma_m^{-1}(\boldsymbol{y}_{t-1} - \mu_m 1_p) \le c_9\big(1 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big)$

for all $m = M_1 + 1, \ldots, M$, and for some $c_9 < \infty$. Since also $0 < \nu_m - 2 \le c_2$ and $\sigma_m^2 \le c_2$, we have $\sigma_{m,t}^2 \le c_{10}\big(1 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big)$ for some finite constant $c_{10}$. Combining the former inequality with (C.7) and (C.15) yields the lower bound

(C.17) $t_1(y_t; \mu_{m,t}, \sigma_{m,t}^2, \nu_m + p) \ge c_4\Big(c_{10}\big(1 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big)\Big)^{-1/2}\Big(c_8\big(1 + y_t^2 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big)\Big)^{-(1 + c_2 + p)/2}.$

Finally, the restriction $\sum_{m=1}^{M}\alpha_{m,t} = 1$ together with (C.13) and (C.17) implies

(C.18) $l_t(\theta) \ge \min\Big\{-\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\log(c_2) - c_7\big(1 + y_t^2 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big),\ \log(c_4) - \tfrac{1}{2}\log\Big(c_{10}\big(1 + y_t^2 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big)\Big) - \tfrac{1 + c_2 + p}{2}\log\Big(c_8\big(1 + y_t^2 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big)\Big)\Big\}.$

As $\mathrm{E}\big[y_t^2 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big] < \infty$ (because $y_t$ is stationary and has finite second moments), it follows from Jensen's inequality that

(C.19) $\mathrm{E}\Big[\log\Big(c_8\big(1 + y_t^2 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big)\Big)\Big] < \infty \quad\text{and}\quad \mathrm{E}\Big[\log\Big(c_{10}\big(1 + \boldsymbol{y}_{t-1}'\boldsymbol{y}_{t-1}\big)\Big)\Big] < \infty.$

The upper bound (C.10) together with (C.18) and the finiteness of the aforementioned expectations shows that $\mathrm{E}\big[\sup_{\theta \in \Theta}|l_t(\theta)|\big] < \infty$. □

Proof of (ii). Given that condition (3.3) of the main paper sets a unique order for the mixture components, proving that this identification condition is satisfied amounts to showing that $\mathrm{E}[l_t(\theta)] \le \mathrm{E}[l_t(\theta_0)]$, and that the equality $\mathrm{E}[l_t(\theta)] = \mathrm{E}[l_t(\theta_0)]$ implies

(C.20) $\vartheta_m = \vartheta_{\tau_1(m),0}$ and $\alpha_m = \alpha_{\tau_1(m),0}$ when $m = 1, \ldots, M_1$, and $(\vartheta_m, \nu_m) = (\vartheta_{\tau_2(m),0}, \nu_{\tau_2(m),0})$ and $\alpha_m = \alpha_{\tau_2(m),0}$ when $m = M_1 + 1, \ldots, M$,

for some permutations $\{\tau_1(1), \ldots, \tau_1(M_1)\}$ and $\{\tau_2(M_1+1), \ldots, \tau_2(M)\}$. For notational clarity, we omit the time subscripts from $y_t$ and $\boldsymbol{y}_{t-1}$, and write $\mu_{m,t} \equiv \mu(\boldsymbol{y}; \vartheta_m)$, $\sigma_m^2 = \sigma_m^2(\vartheta_m)$, and $\sigma_{m,t}^2 = \sigma_{m,t}^2(\boldsymbol{y}; \vartheta_m, \nu_m)$ for the expressions in (C.5), making explicit their dependence on the parameter value. We leave the dependence of $\alpha_{m,t}$ on $\theta$ and $\boldsymbol{y}$ unmarked and denote by $\alpha_{m,0,t}$ the mixing weights based on the true parameter value.

Making use of the fact that the density function of $(y_t, \boldsymbol{y}_{t-1})$ has the form $f\big((y_t, \boldsymbol{y}_{t-1}); \theta\big) = \sum_{m=1}^{M_1} \alpha_m\, n_{p+1}\big((y_t, \boldsymbol{y}_{t-1}); \mu_m 1_{p+1}, \Gamma_{m,p+1}\big) + \sum_{m=M_1+1}^{M} \alpha_m\, t_{p+1}\big((y_t, \boldsymbol{y}_{t-1}); \mu_m 1_{p+1}, \Gamma_{m,p+1}, \nu_m\big)$ (see the proof of Theorem 1) and reasoning based on the Kullback–Leibler divergence, one can use arguments analogous to those in Kalliovirta, Meitz, and Saikkonen (2015, p. 265) to conclude that $\mathrm{E}[l_t(\theta)] - \mathrm{E}[l_t(\theta_0)] \le 0$ with equality if and only if, for almost all $(y, \boldsymbol{y}) \in \mathbb{R}^{p+1}$,

(C.21) $\sum_{m=1}^{M_1} \alpha_{m,t}\, n_1\big(y; \mu(\boldsymbol{y}; \vartheta_m), \sigma_m^2(\vartheta_m)\big) + \sum_{m=M_1+1}^{M} \alpha_{m,t}\, t_1\big(y; \mu(\boldsymbol{y}; \vartheta_m), \sigma_{m,t}^2(\boldsymbol{y}; \vartheta_m, \nu_m), \nu_m + p\big) = \sum_{m=1}^{M_1} \alpha_{m,0,t}\, n_1\big(y; \mu(\boldsymbol{y}; \vartheta_{m,0}), \sigma_m^2(\vartheta_{m,0})\big) + \sum_{m=M_1+1}^{M} \alpha_{m,0,t}\, t_1\big(y; \mu(\boldsymbol{y}; \vartheta_{m,0}), \sigma_{m,t}^2(\boldsymbol{y}; \vartheta_{m,0}, \nu_{m,0}), \nu_{m,0} + p\big).$

For each fixed $\boldsymbol{y}$ at a time, the mixing weights, conditional means, and conditional variances in (C.21) are constants, so we may apply the result on identification of finite mixtures of normal and t-distributions in Holzmann, Munk, and Gneiting (2006, Example 1) (their parametrization of the t-distribution differs slightly from ours, but identification with their parametrization implies identification with ours). For each fixed $\boldsymbol{y}$, there thus exists a permutation $\{\tau_1(1), \ldots, \tau_1(M_1)\}$ (that may depend on $\boldsymbol{y}$) of the index set $\{1, \ldots, M_1\}$ such that

(C.22) $\alpha_{m,t} = \alpha_{\tau_1(m),0,t}, \quad \mu(\boldsymbol{y}; \vartheta_m) = \mu(\boldsymbol{y}; \vartheta_{\tau_1(m),0}), \quad\text{and}\quad \sigma_m^2(\vartheta_m) = \sigma_m^2(\vartheta_{\tau_1(m),0})$

for almost all $y \in \mathbb{R}$ ($m = 1, \ldots, M_1$). Analogously, for each fixed $\boldsymbol{y}$ there exists a permutation $\{\tau_2(M_1+1), \ldots, \tau_2(M)\}$ (that may depend on $\boldsymbol{y}$) of the index set $\{M_1+1, \ldots, M\}$ such that

(C.23) $\nu_m = \nu_{\tau_2(m),0}, \quad \alpha_{m,t} = \alpha_{\tau_2(m),0,t}, \quad \mu(\boldsymbol{y}; \vartheta_m) = \mu(\boldsymbol{y}; \vartheta_{\tau_2(m),0}), \quad\text{and}\quad \sigma_{m,t}^2(\boldsymbol{y}; \vartheta_m, \nu_m) = \sigma_{m,t}^2(\boldsymbol{y}; \vartheta_{\tau_2(m),0}, \nu_{\tau_2(m),0}),$

for almost all $y \in \mathbb{R}$ ($m = M_1 + 1, \ldots, M$).

As argued by Kalliovirta, Meitz, and Saikkonen (2015, pp. 265–266) for the GMAR-type components, it follows from (C.22) that $\vartheta_m = \vartheta_{\tau_1(m),0}$ and $\alpha_m = \alpha_{\tau_1(m),0}$ for $m = 1, \ldots, M_1$. Accordingly, Meitz, Preve, and Saikkonen (2021) showed that (C.23) implies $\vartheta_m = \vartheta_{\tau_2(m),0}$, $\nu_m = \nu_{\tau_2(m),0}$, and $\alpha_m = \alpha_{\tau_2(m),0}$ for $m = M_1 + 1, \ldots, M$, completing the proof of strong consistency.

Given consistency and the assumptions of the theorem, asymptotic normality of the ML estimator can now be concluded using standard arguments. The required steps can be found, for example, in Kalliovirta, Meitz, and Saikkonen (2016, proof of Theorem 3); we omit the details for brevity. □

References

Baklanova, V., A. Copeland, and R. McCaughrin. 2015. Reference Guide to U.S. Repo and Securities Lending Markets. Staff Report No. 740. New York: Federal Reserve Bank of New York. https://doi.org/10.2139/ssrn.2664207.

Ding, P. 2016. "On the Conditional Distribution of the Multivariate t Distribution." The American Statistician 70 (3): 293–5. https://doi.org/10.1080/00031305.2016.1164756.

Dorsey, R., and W. Mayer. 1995. "Genetic Algorithms for Estimation Problems with Multiple Optima, Nondifferentiability, and Other Irregular Features." Journal of Business and Economic Statistics 13 (1): 53–66. https://doi.org/10.1080/07350015.1995.10524579.

Glasbey, C. 2001. "Non-linear Autoregressive Time Series with Multivariate Gaussian Mixtures as Marginal Distributions." Journal of the Royal Statistical Society: Series C 50 (2): 143–54. https://doi.org/10.1111/1467-9876.00225.

Goffe, W., G. Ferrier, and J. Rogers. 1994. "Global Optimization of Statistical Functions with Simulated Annealing." Journal of Econometrics 60 (1–2): 65–99. https://doi.org/10.1016/0304-4076(94)90038-8.

Heracleous, M., and A. Spanos. 2006. "Student's t Dynamic Linear Regression: Re-examining Volatility Modeling." In Econometric Analysis of Financial Time Series (Advances in Econometrics), Vol. 20, Chapter 1, edited by D. Terrell and T. B. Fomby, 289–319. Bingley: Emerald Group Publishing Limited. https://doi.org/10.1016/S0731-9053(05)20011-7.

Holzmann, H., A. Munk, and T. Gneiting. 2006. "Identifiability of Finite Mixtures of Elliptical Distributions." Scandinavian Journal of Statistics 33 (4): 753–63. https://doi.org/10.1111/j.1467-9469.2006.00505.x.

Kalliovirta, L. 2012. "Misspecification Tests Based on Quantile Residuals." The Econometrics Journal 15 (2): 358–93. https://doi.org/10.1111/j.1368-423x.2011.00364.x.

Kalliovirta, L., M. Meitz, and P. Saikkonen. 2015. "A Gaussian Mixture Autoregressive Model for Univariate Time Series." Journal of Time Series Analysis 36 (2): 247–66. https://doi.org/10.1111/jtsa.12108.

Kalliovirta, L., M. Meitz, and P. Saikkonen. 2016. "Gaussian Mixture Vector Autoregression." Journal of Econometrics 192 (2): 465–98. https://doi.org/10.1016/j.jeconom.2016.02.012.

Kishor, N., and H. Marfatia. 2013. "Does Federal Funds Futures Rate Contain Information about the Treasury Bill Rate?" Applied Financial Economics 23 (16): 1311–24. https://doi.org/10.1080/09603107.2013.808397.

Lanne, M., and P. Saikkonen. 2003. "Modeling the U.S. Short-Term Interest Rate by Mixture Autoregressive Processes." Journal of Financial Econometrics 1 (1): 96–125. https://doi.org/10.1093/jjfinec/nbg004.

Le, N., R. Martin, and A. Raftery. 1996. "Modeling Flat Stretches, Bursts, and Outliers in Time Series Using Mixture Transition Distribution Models." Journal of the American Statistical Association 91 (436): 1504–15. https://doi.org/10.1080/01621459.1996.10476718.

Lütkepohl, H. 2005. New Introduction to Multiple Time Series Analysis, 1st ed. Berlin: Springer. https://doi.org/10.1007/978-3-540-27752-1.

Meitz, M., and P. Saikkonen. 2021. "Testing for Observation-Dependent Regime Switching in Mixture Autoregressive Models." Journal of Econometrics 222 (1): 601–24. https://doi.org/10.1016/j.jeconom.2020.04.048.

Meitz, M., D. Preve, and P. Saikkonen. 2018. StMAR Toolbox: A MATLAB Toolbox for Student's t Mixture Autoregressive Models. https://doi.org/10.2139/ssrn.3237368.

Meitz, M., D. Preve, and P. Saikkonen. 2021. "A Mixture Autoregressive Model Based on Student's t-Distribution." Communications in Statistics – Theory and Methods. https://doi.org/10.1080/03610926.2021.1916531.

Meyn, S., and R. Tweedie. 2009. Markov Chains and Stochastic Stability, 2nd ed. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511626630.

Monahan, J. 1984. "A Note on Enforcing Stationarity in Autoregressive-Moving Average Models." Biometrika 71 (2): 403–4. https://doi.org/10.1093/biomet/71.2.403.

Nash, J. 1990. Compact Numerical Methods for Computers. Linear Algebra and Function Minimization, 2nd ed. Bristol: Adam Hilger.

Newey, W., and D. McFadden. 1994. "Large Sample Estimation and Hypothesis Testing." In Handbook of Econometrics, Vol. 4, Chapter 36, edited by R. Engle and D. McFadden. Amsterdam: Elsevier Science B.V. https://doi.org/10.1016/S1573-4412(05)80005-4.

Patnaik, L., and M. Srinivas. 1994. "Adaptive Probabilities of Crossover and Mutation in Genetic Algorithms." IEEE Transactions on Systems, Man, and Cybernetics 24 (4): 656–67. https://doi.org/10.1109/21.286385.

R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

Rao, R. 1962. "Relations between Weak and Uniform Convergence of Measures with Applications." The Annals of Mathematical Statistics 33 (2): 659–80. https://doi.org/10.1214/aoms/1177704588.

Redner, R., and H. Walker. 1984. "Mixture Densities, Maximum Likelihood and the EM Algorithm." SIAM Review 26 (2): 195–239. https://doi.org/10.1137/1026034.

Sarno, L., and D. Thornton. 2003. "The Dynamic Relationship between the Federal Funds Rate and the Treasury Bill Rate: An Empirical Investigation." Journal of Banking & Finance 27 (6): 1079–110. https://doi.org/10.1016/s0378-4266(02)00246-7.

Simon, D. 1990. "Expectations and the Treasury Bill-Federal Funds Rate Spread over Recent Monetary Policy Regimes." Journal of Finance 45 (2): 567–77. https://doi.org/10.1111/j.1540-6261.1990.tb03703.x.

Smith, R., B. Dike, and S. Stegmann. 1995. "Fitness Inheritance in Genetic Algorithms." In Proceedings of the 1995 ACM Symposium on Applied Computing, 345–50. https://doi.org/10.1145/315891.316014.

Spanos, A. 1994. "On Modeling Heteroskedasticity: The Student's t and Elliptical Linear Regression Models." Econometric Theory 10 (2): 286–315. https://doi.org/10.1017/s0266466600008422.

Virolainen, S. 2021. uGMAR: Estimate Univariate Gaussian or Student's t Mixture Autoregressive Model. R package version 3.3.1. https://CRAN.R-project.org/package=uGMAR.

Wong, C., W. Chan, and P. Kam. 2009. "A Student's t-Mixture Autoregressive Model with Applications to Heavy-Tailed Financial Data." Biometrika 96 (3): 751–60. https://doi.org/10.1093/biomet/asp031.

Wong, C., and W. Li. 2000. "On a Mixture Autoregressive Model." Journal of the Royal Statistical Society: Series B 62 (1): 95–115. https://doi.org/10.1111/1467-9868.00222.

Wong, C., and W. Li. 2001a. "On a Mixture Autoregressive Conditional Heteroskedastic Model." Journal of the American Statistical Association 96 (455): 982–95. https://doi.org/10.1198/016214501753208645.

Wong, C., and W. Li. 2001b. "On a Logistic Mixture Autoregressive Model." Biometrika 88 (3): 833–46. https://doi.org/10.1093/biomet/88.3.833.


Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/snde-2020-0060).


Received: 2020-05-22
Accepted: 2021-07-02
Published Online: 2021-07-15

© 2021 Walter de Gruyter GmbH, Berlin/Boston
