Abstract
This paper provides a detailed analysis of the first-order diagonal bilinear time series model, first proposed in Granger and Andersen (1978. An Introduction to Bilinear Time Series Models. Göttingen: Vandenhoeck & Ruprecht). The model allows for sequences of “outliers” in the data. We show that the model can generate a variety of features that we observe in practice, while we also document that the bilinear features show up in just a limited number of observations. When the parameters are close to the moment restrictions, parameter estimation becomes difficult; when the parameters are further away from the moment restrictions, estimation is easy. Yet, in those latter cases, approximative linear models appear to generate equally accurate fit and forecasts. In sum, in the cases where proper inference on a bilinear model is feasible, the model is barely relevant for forecasting.
I think bilinear models are not going to have much future. I do not see much evidence of them helping forecasting, for example.
C.W.J. Granger (1997), in the ET Interview, Econometric Theory, 13, 253–303.
It is well known that estimating bilinear models is quite challenging. Many different ideas have been proposed to solve this. However, there is not a simple way to do inference even for its simple cases.
Ling, Peng, and Zhu (2015, abstract)
1 Introduction
This paper deals with a detailed analysis of the first-order diagonal bilinear time series model, first proposed in Granger and Andersen (1978). This model allows for sequences of “outliers” in the data, and this can be useful for such series as unemployment and inflation. A sudden large-valued observation creates a large forecast error, and this forecast error amplifies the large-valued observation to create a new large-valued one. We will see that the model has a variety of features that we can observe in practice.
The main theme of this paper is to see whether the above claim by its creator can be further substantiated. The bilinear features show up in just a small number of observations, and this makes parameter estimation difficult, and sometimes even impossible. Next, we will see that only when the bilinear features are such that the moment restrictions are close to being violated do linear models have larger forecast errors. When the bilinear parameters are further away from the moment restrictions, the parameters can simply be estimated, but then linear models perform about just as well. In short, in cases of proper inference on a bilinear model, the model is barely relevant for fit and forecasting.
The outline of the paper is as follows. In Section 2 we introduce the first-order diagonal bilinear time series model and compare it against a linear time series model to highlight its features. Section 3 addresses the properties of the model, which are quite impressive. It can generate data with properties that are often seen in practice. Section 4 discusses the potential problems with this model, which concern parameter estimation and forecasting. Section 5 presents a conclusion, which in short comes close to the opening quote by Granger.
2 The First-Order Diagonal Bilinear Time Series Model
In time series econometrics one seeks to design models that capture the salient features of time series and, as such, to use the models to create accurate fit and out-of-sample forecasts. A basic class of models covers the familiar linear autoregressive moving average (ARMA) model. An example of such a model for a time series y_t, t = 1, 2, …, T is the ARMA(1,1) model given by

y_t = αy_{t−1} + θɛ_{t−1} + ɛ_t,

where ɛ_t is a zero-mean uncorrelated process with common variance σ². Figure 1 displays an illustrative graph of how data can look if they are generated according to an ARMA(1,1) scheme.

Figure 1: An ARMA(1,1) process, with α = 0.8 and θ = −0.3.
Autocorrelations can help to identify this model. The theoretical autocorrelations are

ρ_1 = (1 + αθ)(α + θ)/(1 + θ² + 2αθ)

and

ρ_k = αρ_{k−1}, for k ≥ 2.
If we see such a pattern for actual data, we may decide to fit this model, and estimate its parameters.
A forecast from an ARMA(1,1) process for T + 1 created at time T is based on

ŷ_{T+1|T} = αy_T + θɛ_T. (4)

This is because the expected value of ɛ_{T+1}, given the information available at time T, is equal to zero. In practice, we replace the true parameters by their estimates.
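To make this concrete, the ARMA(1,1) recursion and the forecast rule in (4) can be sketched in a few lines of Python. This is an illustration under the parameter values of Figure 1; the function and variable names are ours.

```python
import random

def simulate_arma11(alpha, theta, n, sigma=1.0, seed=1):
    """Generate n observations from y_t = alpha*y_{t-1} + theta*eps_{t-1} + eps_t."""
    rng = random.Random(seed)
    y, y_prev, eps_prev = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        y_t = alpha * y_prev + theta * eps_prev + eps
        y.append(y_t)
        y_prev, eps_prev = y_t, eps
    return y, eps_prev  # the series and the last innovation eps_T

y, eps_T = simulate_arma11(0.8, -0.3, 200)
# one-step-ahead forecast for T+1, made at time T, as in (4):
y_hat = 0.8 * y[-1] + (-0.3) * eps_T
```

Replacing the true parameters by estimates, as done in practice, only changes the two coefficients in the last line.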
The data in Figure 1 do not display obvious salient features, but many time series in practice do. Consider for example the changes in quarterly unemployment rates in Figure 2, which obviously look different from the data in Figure 1. We see tranquil periods (expansions), and periods with a sequence of larger, positive values (recessions). It is of course the latter set of observations that is of interest to describe and even to predict. The first-order diagonal bilinear time series model can do that, at least in principle.

Figure 2: First differences of the unemployment rate (USA, quarterly, 1950Q1–2012Q4, seasonally adjusted).
Consider, for t = 1, 2, …, T, the so-called first-order diagonal bilinear model for a time series y_t, that is,

y_t = αy_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t,

with ɛ_t a zero-mean uncorrelated process with common variance σ², where the random coefficient on y_{t−1} is α + βɛ_{t−1}.
The one-step-ahead forecast from the bilinear model is given by

ŷ_{T+1|T} = αy_T + βɛ_T y_T. (7)
As compared with the ARMA(1,1) forecast in (4), the forecast in (7) replaces θ by βy T . This shows that, at least in theory, there is an opportunity to predict an outlier observation at T + 1 if there is an outlier at origin T. Turkman and Turkman (1997) derive the properties of the extreme observations corresponding to bilinear time series models.
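The same exercise for the bilinear recursion shows how the forecast in (7) exploits both the last observation and the last forecast error. This is a sketch with illustrative parameter values; the function names are ours.

```python
import random

def simulate_bilinear(alpha, beta, n, sigma=1.0, seed=2):
    """Generate n observations from y_t = alpha*y_{t-1} + beta*eps_{t-1}*y_{t-1} + eps_t."""
    rng = random.Random(seed)
    y, y_prev, eps_prev = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        y_t = alpha * y_prev + beta * eps_prev * y_prev + eps
        y.append(y_t)
        y_prev, eps_prev = y_t, eps
    return y, eps_prev

y, eps_T = simulate_bilinear(0.6, 0.6, 200)
# forecast (7): the MA term of the ARMA(1,1) forecast is scaled by y_T,
# so a large outlier at origin T scales up the forecast for T+1
y_hat = 0.6 * y[-1] + 0.6 * eps_T * y[-1]
```

When y_T and ɛ_T are both large and of the same sign, the second term amplifies the forecast, which is exactly the mechanism behind sequences of “outliers”.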
All this is visualized in Figure 3, where we depict 200 observations from y_t = 0.6y_{t−1} + 0.6ɛ_{t−1}y_{t−1} + ɛ_t.

Figure 3: Example data, 200 observations from y_t = 0.6y_{t−1} + 0.6ɛ_{t−1}y_{t−1} + ɛ_t.

Figure 4: Estimated residuals.
3 Properties of the First-Order Diagonal Bilinear Time Series Model
In this section we analyze the properties of data generated by the first-order diagonal bilinear time series model. We will see that this model allows for remarkable properties of the data.
3.1 Moments
Consider again the first-order diagonal bilinear model for a time series y_t, that is,

y_t = αy_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t,

and assume that ɛ_t is an i.i.d. process with mean zero and variance σ². First, the unconditional mean μ_y is given by

μ_y = βσ²/(1 − α).

This shows that the model does not need an extra intercept, although in practice it will not harm to include one. This expression further shows that for the first moment to exist, the restriction |α| < 1 must hold.
The unconditional variance σ_y² can be derived along similar lines. The resulting expression shows that for the second moment to exist, the restriction

α² + β²σ² < 1

must hold. When β = 0, we have that

σ_y² = σ²/(1 − α²),

which is the familiar expression for an AR(1) model. Figure 3 displays that the variance of a bilinear process is (much) larger than that of an AR(1) process.
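The unconditional mean can be checked by simulation. Under i.i.d. errors with σ² = 1 and α = 0.6, β = 0.2, the mean implied by our derivation above, βσ²/(1 − α), equals 0.2/0.4 = 0.5; this is a sketch and the simulation size is arbitrary.

```python
import random

# Simulate the diagonal bilinear model and compare the sample mean with
# beta*sigma^2/(1 - alpha); here 0.2*1/(1 - 0.6) = 0.5.
rng = random.Random(11)
alpha, beta, n = 0.6, 0.2, 500_000
y_prev = eps_prev = 0.0
total = 0.0
for _ in range(n):
    eps = rng.gauss(0.0, 1.0)
    y_prev = alpha * y_prev + beta * eps_prev * y_prev + eps
    eps_prev = eps
    total += y_prev
mean_hat = total / n  # close to 0.5 for large n
```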
Given the link between large past forecast errors and large-valued observations, we may appreciate that the skewness of the observations is not zero and, depending on the sign of β, it is positive or negative. Additionally, the kurtosis is larger than 3, so the data display non-normality. This is further seen from the following. Writing the model as

y_t = αy_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t,

it holds for the lagged error term that

ɛ_{t−1} = y_{t−1} − αy_{t−2} − βɛ_{t−2}y_{t−2}.

Plugging this expression into the model gives

y_t = αy_{t−1} + βy_{t−1}² − αβy_{t−1}y_{t−2} − β²ɛ_{t−2}y_{t−2}y_{t−1} + ɛ_t. (15)

This shows that y_t is in part driven by y_{t−1}².
Sesay and Subba Rao (1988) and Kim, Billard, and Basawa (1990) established that, under Gaussian errors, for the third moment to exist, the restriction

α³ + 3αβ²σ² < 1

must hold, and for the fourth moment to exist, the restriction

α⁴ + 6α²β²σ² + 3β⁴σ⁴ < 1

must hold. The consequences of these properties are also reflected in Figure 4, where we present the estimated residuals.

Figure 5: A scatter plot of y_t against the estimated residuals.

Figure 6: A scatter plot of changes in unemployment against the estimated residuals.
Figures 7–9 depict the moment restrictions, and we see that the parameter values for β cannot be large, as the restrictions are quite stringent. We will return to the content of these graphs in the next section.

Figure 7: Moment restrictions for y_t = 0.6y_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t.

Figure 8: Moment restrictions for y_t = 0.8y_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t.

Figure 9: Moment restrictions for y_t = 0.6y_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t.
3.2 Autocorrelations
The unconditional autocovariances can be derived from the model as well. The resulting autocorrelation function satisfies

ρ_k = αρ_{k−1}, for k ≥ 2,

and it is the same as the autocorrelation function of an ARMA(1,1) model. In fact, Granger and Andersen (1978, page 56) show that this implied ARMA(1,1) model has the same α as the diagonal bilinear model. Basrak, Davis, and Mikosch (1999) discuss the empirical autocorrelation function.
Furthermore, Granger and Andersen (1978, page 55) (see also Kim, Billard, and Basawa 1990; Sesay and Subba Rao 1988) show that the autocorrelations of y_t² can also be derived, and these can help to distinguish the bilinear model from its linear ARMA(1,1) counterpart.
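The ARMA(1,1)-type decay of the autocorrelations, ρ_k = αρ_{k−1} for k ≥ 2, can be verified on simulated data. This is a sketch; with a long series the ratios of successive sample autocorrelations should be close to α.

```python
import random

def simulate_bilinear(alpha, beta, n, sigma=1.0, seed=5):
    rng = random.Random(seed)
    y, y_prev, eps_prev = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        y_t = alpha * y_prev + beta * eps_prev * y_prev + eps
        y.append(y_t)
        y_prev, eps_prev = y_t, eps
    return y

def acf(x, k):
    """Sample autocorrelation at lag k."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    ck = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
    return ck / c0

y = simulate_bilinear(0.6, 0.2, 100_000)
r1, r2, r3 = acf(y, 1), acf(y, 2), acf(y, 3)
# r2/r1 and r3/r2 are both close to alpha = 0.6
```

This is exactly why the autocorrelation function alone cannot separate the bilinear model from a linear ARMA(1,1) model.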
4 Difficulties and Issues
In this section we highlight the two main problematic issues with the model reviewed in the previous section. The first is that bilinear features appear in just a few observations. This makes estimation of the parameters cumbersome. Also, at the same time, when the true parameters are getting closer to the moment restrictions, the estimation procedure becomes intractable. In other words, the parameters in the bilinear model can best be estimated when they are small. However, when the parameters are small, the fit and forecast gain of the bilinear model versus for example a linear AR(1) is negligible.
4.1 Estimation
There are many studies on the estimation of the parameters of the first-order diagonal bilinear model, and various other bilinear models. Charemza, Lifshits, and Makarova (2005) study the case where α = 1. Guegan and Pham (1989) discuss the estimation of the parameters using the least squares method. A method of moments estimator for the above diagonal model is put forward by Kim, Billard, and Basawa (1990). Pham and Tran (1981) discuss various properties of the first-order bilinear time series model. Sesay and Subba Rao (1988) consider estimation methods using higher order moments, and Subba Rao (1981) provides a general theory of bilinear models. Finally, Brunner and Hess (1995) discuss the potential problems with the likelihood function of the first-order bilinear model.
The main conclusion from the literature so far is that it is not easy to properly estimate the parameters of a first-order diagonal bilinear time series model, and these problems rapidly increase when the parameters are closer to the boundaries of the moment restrictions, see Brunner and Hess (1995).
A simple method to estimate the parameters in the first-order diagonal model is based on a two-step method. Recall that the autocorrelations of the first-order diagonal model mimic those of an ARMA(1,1) model in which the parameter α is the same. The idea is now to estimate α using Maximum Likelihood from the ARMA(1,1) model

y_t = αy_{t−1} + θɛ_{t−1} + ɛ_t,

and in a second step to estimate β using Ordinary Least Squares (OLS), regressing y_t − α̂y_{t−1} on ɛ̂_{t−1}y_{t−1}, with ɛ̂_t the first-step residuals.
When this method is applied to the changes in unemployment data, we obtain the parameter estimates, with standard errors in parentheses. Another method is to estimate α and β from a shortened version of (15), that is, the regression

y_t = αy_{t−1} + βy_{t−1}(y_{t−1} − αy_{t−2}) + ɛ_t,

which deletes the term involving β². Next, use Nonlinear Least Squares (NLS) and use the obtained estimates α̂ and β̂.
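A stripped-down version of the two-step idea can be sketched as follows. For brevity we replace the Maximum Likelihood step by the moment estimate α̂ = r₂/r₁ implied by the ARMA(1,1)-type autocorrelations, and we proxy the lagged errors by the residuals of the linear part; this is a simplification of the procedure described above, not its exact implementation.

```python
import random

def simulate_bilinear(alpha, beta, n, sigma=1.0, seed=7):
    rng = random.Random(seed)
    y, y_prev, eps_prev = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        y_t = alpha * y_prev + beta * eps_prev * y_prev + eps
        y.append(y_t)
        y_prev, eps_prev = y_t, eps
    return y

def acf(x, k):
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    return sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n / c0

def two_step(y):
    # step 1: rho_2 = alpha * rho_1 gives a simple moment estimate of alpha
    a = acf(y, 2) / acf(y, 1)
    # step 2: OLS (no intercept) of y_t - a*y_{t-1} on e_{t-1}*y_{t-1},
    # with e_t = y_t - a*y_{t-1} as a crude proxy for the true errors
    num = den = 0.0
    for t in range(2, len(y)):
        u = y[t] - a * y[t - 1]
        x = (y[t - 1] - a * y[t - 2]) * y[t - 1]
        num += x * u
        den += x * x
    return a, num / den

y = simulate_bilinear(0.6, 0.2, 100_000)
a_hat, b_hat = two_step(y)
# a_hat is close to 0.6; b_hat is positive but typically biased toward zero
```

The downward bias in b_hat is in line with the β estimates in Table 1, where the mean estimate sits below the true value of 0.2.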
To see if the first estimation method is dependable, we generate one thousand series of length 200 from a first-order diagonal model, for example with true parameters α = 0.6 and β = 0.2. The averages of the one thousand estimated parameters are reported in Table 1.
Table 1: Simulation results on two OLS-based estimation methods, based on one thousand replications. Sample size is 200.
α | β | Mean α̂ | Mean β̂ | Mean σ̂² | Median σ̂²
---|---|---|---|---|---
ARMA(1,1) based | |||||
0.8 | 0.2 | 0.77 | 0.13 | 1.11 | 1.08 |
0.6 | 0.2 | 0.57 | 0.16 | 1.03 | 1.02 |
0.4 | 0.2 | 0.37 | 0.18 | 1.03 | 1.02 |
Regression based using NLS | |||||
0.8 | 0.2 | 0.79 | 0.14 | 1.08 | 1.06 |
0.6 | 0.2 | 0.58 | 0.18 | 1.00 | 0.99 |
0.4 | 0.2 | 0.39 | 0.19 | 0.98 | 0.98 |
Potential problems with estimating the parameters of a first-order diagonal bilinear model can be caused by the fact that the nonlinear features of the data appear through only a few data points. This is visualized in Figure 10, where we present the influence statistics from the regression.

Figure 10: Influence statistics from the regression.

Figure 11: Influence statistics from the regression.
4.2 Fit and Forecasting
There are only a few studies where bilinear models are considered for forecasting. Examples are Poskitt and Tremayne (1986) and Weiss (1986), and there it is found in a few cases that bilinear models can slightly improve on linear models.
It seems that if the parameters are further away from the boundaries of the moment restrictions, ignoring the bilinear part does not matter much for forecasting. The simulation results in Table 2 support this notion. We generate one thousand time series of length 200 from a first-order diagonal bilinear model, with parameter values obeying the moment restrictions visualized in Figures 7 to 9. Each time, we choose parameter configurations that are far away from the boundaries. We estimate either an AR(1) or an ARMA(1,1) model, and we report the mean and median values of the in-sample one-step-ahead forecast error variance.
Table 2: In-sample static one-step-ahead forecast error variance when fitting an AR(1) or an ARMA(1,1) model to 200 observations generated from y_t = αy_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t.
α | β | AR(1) mean | AR(1) median | ARMA(1,1) mean | ARMA(1,1) median
---|---|---|---|---|---
0.8 | 0.2 | 1.22 | 1.19 | 1.18 | 1.19 |
0.6 | 0.2 | 1.11 | 1.09 | 1.10 | 1.08 |
0.4 | 0.2 | 1.08 | 1.08 | 1.08 | 1.07 |
0.6 | −0.2 | 1.11 | 1.10 | 1.10 | 1.09 |
From Table 2 we see that for various parameter configurations, whether we fit an AR(1) or an ARMA(1,1) model, the in-sample prediction error variances are around 1.10. Hence, even without estimating the parameters in the bilinear model, the potential forecast gain of a bilinear model is something like 10 % at most. This gain becomes smaller when the parameters are estimated, see Table 1.
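The order of magnitude in Table 2 is easy to reproduce: fit an AR(1) by OLS to data generated from the bilinear model and compare the residual variance with the innovation variance σ² = 1. This is a sketch; the exact number depends on the seed and the sample size.

```python
import random

def simulate_bilinear(alpha, beta, n, sigma=1.0, seed=9):
    rng = random.Random(seed)
    y, y_prev, eps_prev = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        y_t = alpha * y_prev + beta * eps_prev * y_prev + eps
        y.append(y_t)
        y_prev, eps_prev = y_t, eps
    return y

def ar1_residual_variance(y):
    """Demean, fit an AR(1) by OLS, and return the in-sample residual variance."""
    n = len(y)
    m = sum(y) / n
    z = [v - m for v in y]
    a = sum(z[t] * z[t - 1] for t in range(1, n)) / sum(z[t - 1] ** 2 for t in range(1, n))
    resid = [z[t] - a * z[t - 1] for t in range(1, n)]
    return sum(r * r for r in resid) / len(resid)

v = ar1_residual_variance(simulate_bilinear(0.6, 0.2, 100_000))
# v lies roughly 10 % above the innovation variance of 1, in line with Table 2
```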
Turning back to the illustration involving the changes in unemployment, we see that the in-sample Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE) for the bilinear model are 0.296 and 0.212, respectively. For an AR(1) model fitted to the same data we obtain an RMSE of 0.299 and an MAE of 0.216. Hence, the differences are negligible.
Finally, we zoom in on a few specific observations in Table 3, as recommended by van Dijk and Franses (2003), to see if the bilinear model can indeed better forecast “outliers”. Looking at various recession periods, we see that the differences between the in-sample forecasts of the bilinear model and the linear AR(1) model are again negligible.
Table 3: Forecasts for specific periods for changes in the unemployment rate.

Quarter | True | Bilinear forecast | Bilinear error | AR(1) forecast | AR(1) error
---|---|---|---|---|---
1953Q4 | 1.0 | 0.06 | 0.94 | 0.07 | 0.93 |
1954Q1 | 1.6 | 0.71 | 0.89 | 0.63 | 0.97 |
1954Q2 | 0.5 | 1.12 | −0.62 | 1.01 | −0.51 |
1957Q4 | 0.7 | 0.05 | 0.65 | 0.07 | 0.63 |
1957Q1 | 1.4 | 0.46 | 0.94 | 0.44 | 0.96 |
1958Q1 | 1.1 | 0.99 | 0.11 | 0.88 | 0.22 |
1974Q3 | 0.4 | 0.05 | 0.35 | 0.07 | 0.33 |
1974Q4 | 1.0 | 0.24 | 0.76 | 0.26 | 0.74 |
1975Q1 | 1.7 | 0.67 | 1.03 | 0.63 | 1.07 |
1980Q1 | 0.3 | 0.05 | 0.25 | 0.07 | 0.23 |
1980Q2 | 1.0 | 0.17 | 0.83 | 0.19 | 0.81 |
1980Q3 | 0.4 | 0.69 | −0.29 | 0.63 | −0.23 |
1981Q4 | 0.8 | 0.00 | 0.80 | 0.00 | 0.80 |
1982Q1 | 0.6 | 0.55 | 0.05 | 0.51 | 0.09 |
1982Q2 | 0.6 | 0.33 | 0.27 | 0.38 | 0.22 |
1982Q3 | 0.5 | 0.35 | 0.15 | 0.38 | 0.12 |
1982Q4 | 0.8 | 0.28 | 0.52 | 0.32 | 0.48 |
2008Q3 | 0.7 | 0.17 | 0.53 | 0.19 | 0.51 |
2008Q4 | 0.9 | 0.44 | 0.46 | 0.44 | 0.46 |
2009Q1 | 1.4 | 0.56 | 0.84 | 0.57 | 0.83 |
2009Q2 | 1.0 | 0.97 | 0.03 | 0.88 | 0.12 |
5 Conclusions
In this paper we provided a detailed analysis of the first-order diagonal bilinear time series model. This model has remarkable features, one of which is that it allows for a sequence of “outliers”, and hence may predict the value of the next “outlier”. Despite its beauty, we showed that this bilinear model is unlikely to be successful for in-sample fit and for out-of-sample prediction of economic time series. This is because the bilinear features show up in just a small number of observations, which in turn makes parameter estimation difficult, and sometimes even impossible. We also saw that when the moment restrictions are far from being violated, approximative linear models have equivalent forecast errors. In sum, when the parameters comfortably satisfy the moment restrictions, they can simply be estimated, but then linear approximative models perform about just as well.
In short, in cases where we can properly draw inference on a bilinear model, it is barely relevant for more accurate forecasting, neither in the full sample nor in specific “outlier-like” cases. It thus seems fair to conclude that the bilinear model is beautiful, but not especially useful for forecasting. So, its creator Clive Granger was right.
References
Basrak, B., R. A. Davis, and T. Mikosch. 1999. “The Sample ACF of a Simple Bilinear Process.” Stochastic Processes and their Applications 83 (1): 1–14. https://doi.org/10.1016/s0304-4149(99)00013-7.
Brunner, A. D., and G. D. Hess. 1995. “Potential Problems in Estimating Bilinear Time-Series Models.” Journal of Economic Dynamics and Control 19 (4): 663–81. https://doi.org/10.1016/0165-1889(94)00798-m.
Charemza, W. W., M. Lifshits, and S. Makarova. 2005. “Conditional Testing for Unit-Root Bilinearity in Financial Time Series: Some Theoretical and Empirical Results.” Journal of Economic Dynamics and Control 29 (1): 63–96. https://doi.org/10.1016/j.jedc.2003.07.001.
Granger, C. W. J., and A. P. Andersen. 1978. An Introduction to Bilinear Time Series Models. Göttingen: Vandenhoeck & Ruprecht.
Guegan, D., and D. T. Pham. 1989. “A Note on the Estimation of the Parameters of the Diagonal Bilinear Model by the Method of Least Squares.” Scandinavian Journal of Statistics 16 (2): 129–36.
Kim, W. K., L. Billard, and I. V. Basawa. 1990. “Estimation for the First Order Diagonal Bilinear Time Series Model.” Journal of Time Series Analysis 11 (3): 215–27. https://doi.org/10.1111/j.1467-9892.1990.tb00053.x.
Ling, S., L. Peng, and F. Zhu. 2015. “Inference for a Special Bilinear Time Series Model.” Journal of Time Series Analysis 36 (1): 61–6. https://doi.org/10.1111/jtsa.12092.
Pham, D. T., and L. T. Tran. 1981. “On the First Order Bilinear Time Series Model.” Journal of Applied Probability 18 (3): 617–27. https://doi.org/10.1017/s0021900200098417.
Poskitt, D. S., and A. R. Tremayne. 1986. “The Selection and Use of Linear and Bilinear Time Series Models.” International Journal of Forecasting 2 (1): 101–14. https://doi.org/10.1016/0169-2070(86)90033-6.
Sesay, S., and T. Subba Rao. 1988. “Yule-Walker Type Difference Equations for Higher Order Moments and Cumulants for Bilinear Time Series Models.” Journal of Time Series Analysis 9 (4): 385–401. https://doi.org/10.1111/j.1467-9892.1988.tb00478.x.
Subba Rao, T. 1981. “On the Theory of Bilinear Models.” Journal of the Royal Statistical Society B 43 (2): 244–55. https://doi.org/10.1111/j.2517-6161.1981.tb01177.x.
Turkman, K. F., and M. A. A. Turkman. 1997. “Extremes of Bilinear Time Series Models.” Journal of Time Series Analysis 18 (3): 305–19. https://doi.org/10.1111/1467-9892.00051.
Van Dijk, D., and P. H. Franses. 2003. “Selecting a Nonlinear Time Series Model Using Weighted Tests of Equal Forecast Accuracy.” Oxford Bulletin of Economics & Statistics 65 (S1): 727–44. https://doi.org/10.1046/j.0305-9049.2003.00091.x.
Weiss, A. A. 1986. “ARCH and Bilinear Time Series Models: Comparison and Combination.” Journal of Business & Economic Statistics 4 (1): 59–70. https://doi.org/10.2307/1391387.
© 2025 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.
Articles in the same Issue
- Frontmatter
- Research Articles
- The Story of a Model: The First-Order Diagonal Bilinear Autoregression
- Maximum Likelihood Estimation of Regression Effects in State Space Models
- Software
- QR.break: An R Package for Structural Breaks in Quantile Regression
- Practitioner's Corner
- Fast Algorithms for Quantile Regression with Selection