Abstract
This paper provides a detailed analysis of the first-order diagonal bilinear time series model, first proposed in Granger and Andersen (1978. An Introduction to Bilinear Time Series Models. Göttingen: Vandenhoeck & Ruprecht). The model allows for sequences of “outliers” in the data. We show that the model can generate a variety of features that we observe in practice, while we also document that the bilinear features show up in just a limited number of observations. When the parameters are close to the moment restrictions, parameter estimation becomes difficult; when the parameters are further away from the moment restrictions, estimation is easy. Yet, in those latter cases, approximative linear models appear to generate equally accurate fit and forecasts. In sum, in the cases where proper inference on a bilinear model is feasible, the model is barely relevant for forecasting.
I think bilinear models are not going to have much future. I do not see much evidence of them helping forecasting, for example.
C.W.J. Granger (1997), in the ET Interview, Econometric Theory, 13, 253–303.
It is well known that estimating bilinear models is quite challenging. Many different ideas have been proposed to solve this. However, there is not a simple way to do inference even for its simple cases.
Ling, Peng, and Zhu (2015, abstract)
1 Introduction
This paper deals with a detailed analysis of the first-order diagonal bilinear time series model, first proposed in Granger and Andersen (1978). This model allows for sequences of “outliers” in the data, and this can be useful for such series as unemployment and inflation. A sudden large-valued observation creates a large forecast error, and this forecast error amplifies the large-valued observation to create a new large-valued one. We will see that the model has a variety of features that we can observe in practice.
The main theme of this paper is to see whether the above claim by its creator can be further substantiated. The bilinear features show up in just a small number of observations, and this makes parameter estimation difficult, and sometimes even impossible. Next, we will see that only when the bilinear features are such that the moment restrictions are close to being violated do linear models have larger forecast errors. When the bilinear parameters are further away from the moment restrictions, the parameters can simply be estimated, but then linear models perform about just as well. In short, in cases of proper inference on a bilinear model, the model is barely relevant for fit and forecasting.
The outline of the paper is as follows. In Section 2 we introduce the first-order diagonal bilinear time series model and compare it against a linear time series model to highlight its features. Section 3 addresses the properties of the model, which are quite impressive. It can generate data with properties that are often seen in practice. Section 4 discusses the potential problems with this model, which concern parameter estimation and forecasting. Section 5 presents a conclusion, which in short comes close to the opening quote by Granger.
2 The First-Order Diagonal Bilinear Time Series Model
In time series econometrics one seeks to design models that capture the salient features of time series and, as such, to use the models to create accurate fit and out-of-sample forecasts. A basic class of models covers the familiar linear autoregressive moving average (ARMA) model. An example of such a model for a time series y_t, t = 1, 2, …, T is the ARMA(1,1) model given by

y_t = αy_{t−1} + θɛ_{t−1} + ɛ_t,

where ɛ_t is a zero-mean uncorrelated process with common variance σ². Figure 1 displays an illustrative graph of how data can look if they are generated according to an ARMA(1,1) scheme.

Figure 1: An ARMA(1,1) process, with α = 0.8 and θ = −0.3.
Autocorrelations can help to identify this model. The theoretical autocorrelations are

ρ_1 = (1 + αθ)(α + θ)/(1 + θ² + 2αθ)

and

ρ_k = αρ_{k−1}, for k ≥ 2.
If we see such a pattern for actual data, we may decide to fit this model, and estimate its parameters.
A forecast from an ARMA(1,1) process for T + 1 created at time T is based on

ŷ_{T+1|T} = αy_T + θɛ_T. (4)

This is because the expected value of ɛ_{T+1}, given the information available at time T, is equal to zero. In practice, we replace the true parameters by their estimates.
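To make this concrete, the ARMA(1,1) recursion and the forecast rule in (4) can be sketched in a few lines of Python. This is an illustration under the parameter values of Figure 1; the function and variable names are ours.

```python
import random

def simulate_arma11(alpha, theta, n, sigma=1.0, seed=1):
    """Generate n observations from y_t = alpha*y_{t-1} + theta*eps_{t-1} + eps_t."""
    rng = random.Random(seed)
    y, y_prev, eps_prev = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        y_t = alpha * y_prev + theta * eps_prev + eps
        y.append(y_t)
        y_prev, eps_prev = y_t, eps
    return y, eps_prev  # the series and the last innovation eps_T

y, eps_T = simulate_arma11(0.8, -0.3, 200)
# one-step-ahead forecast for T+1, made at time T, as in (4):
y_hat = 0.8 * y[-1] + (-0.3) * eps_T
```

Replacing the true parameters by estimates, as done in practice, only changes the two coefficients in the last line.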
The data in Figure 1 do not display obvious salient features, but many time series in practice do. Consider for example the changes in quarterly unemployment rates in Figure 2, which obviously look different from the data in Figure 1. We see tranquil periods (expansions), and periods with a sequence of larger, positive values (recessions). It is of course the latter set of observations that is of interest to describe and even to predict. The first-order diagonal bilinear time series model can do that, at least in principle.

Figure 2: First differences of the unemployment rate (USA, quarterly, 1950Q1–2012Q4, seasonally adjusted).
Consider, for t = 1, 2, …, T, the so-called first-order diagonal bilinear model for a time series y_t, that is,

y_t = αy_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t,

with ɛ_t a zero-mean uncorrelated process with common variance σ², where the random coefficient on y_{t−1} is α + βɛ_{t−1}.
The one-step-ahead forecast from the bilinear model is given by

ŷ_{T+1|T} = αy_T + βɛ_T y_T. (7)
As compared with the ARMA(1,1) forecast in (4), the forecast in (7) replaces θ by βy T . This shows that, at least in theory, there is an opportunity to predict an outlier observation at T + 1 if there is an outlier at origin T. Turkman and Turkman (1997) derive the properties of the extreme observations corresponding to bilinear time series models.
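The same exercise for the bilinear recursion shows how the forecast in (7) exploits both the last observation and the last forecast error. This is a sketch with illustrative parameter values; the function names are ours.

```python
import random

def simulate_bilinear(alpha, beta, n, sigma=1.0, seed=2):
    """Generate n observations from y_t = alpha*y_{t-1} + beta*eps_{t-1}*y_{t-1} + eps_t."""
    rng = random.Random(seed)
    y, y_prev, eps_prev = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        y_t = alpha * y_prev + beta * eps_prev * y_prev + eps
        y.append(y_t)
        y_prev, eps_prev = y_t, eps
    return y, eps_prev

y, eps_T = simulate_bilinear(0.6, 0.6, 200)
# forecast (7): the MA term of the ARMA(1,1) forecast is scaled by y_T,
# so a large outlier at origin T scales up the forecast for T+1
y_hat = 0.6 * y[-1] + 0.6 * eps_T * y[-1]
```

When y_T and ɛ_T are both large and of the same sign, the second term amplifies the forecast, which is exactly the mechanism behind sequences of “outliers”.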
All this is visualized in Figure 3, where we depict 200 observations from y_t = 0.6y_{t−1} + 0.6ɛ_{t−1}y_{t−1} + ɛ_t.

Figure 3: Example data, 200 observations from y_t = 0.6y_{t−1} + 0.6ɛ_{t−1}y_{t−1} + ɛ_t.

Figure 4: Estimated residuals.
3 Properties of the First-Order Diagonal Bilinear Time Series Model
In this section we analyze the properties of data generated by the first-order diagonal bilinear time series model. We will see that this model allows for remarkable properties of the data.
3.1 Moments
Consider again the first-order diagonal bilinear model for a time series y_t, that is,

y_t = αy_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t,

and assume that ɛ_t is an i.i.d. process with mean zero and variance σ². First, the unconditional mean μ_y is given by

μ_y = βσ²/(1 − α).

This shows that the model does not need an extra intercept, although in practice it will not harm to include one. This expression further shows that for the first moment to exist, the restriction |α| < 1 must hold.
The unconditional variance σ_y² can be derived along similar lines. The resulting expression shows that for the second moment to exist, the restriction

α² + β²σ² < 1

must hold. When β = 0, we have that

σ_y² = σ²/(1 − α²),

which is the familiar expression for an AR(1) model. Figure 3 displays that the variance of a bilinear process is (much) larger than that of an AR(1) process.
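The unconditional mean can be checked by simulation. Under i.i.d. errors with σ² = 1 and α = 0.6, β = 0.2, the mean implied by our derivation above, βσ²/(1 − α), equals 0.2/0.4 = 0.5; this is a sketch and the simulation size is arbitrary.

```python
import random

# Simulate the diagonal bilinear model and compare the sample mean with
# beta*sigma^2/(1 - alpha); here 0.2*1/(1 - 0.6) = 0.5.
rng = random.Random(11)
alpha, beta, n = 0.6, 0.2, 500_000
y_prev = eps_prev = 0.0
total = 0.0
for _ in range(n):
    eps = rng.gauss(0.0, 1.0)
    y_prev = alpha * y_prev + beta * eps_prev * y_prev + eps
    eps_prev = eps
    total += y_prev
mean_hat = total / n  # close to 0.5 for large n
```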
Given the link between large past forecast errors and large-valued observations, we may appreciate that the skewness of the observations is not zero and, depending on the sign of β, it is positive or negative. Additionally, the kurtosis is larger than 3, so the data display non-normality. This is further seen from the following. Writing the model as

y_t = αy_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t,

it holds for the lagged error term that

ɛ_{t−1} = y_{t−1} − αy_{t−2} − βɛ_{t−2}y_{t−2}.

Plugging this expression into the model gives

y_t = αy_{t−1} + βy_{t−1}² − αβy_{t−1}y_{t−2} − β²ɛ_{t−2}y_{t−2}y_{t−1} + ɛ_t. (15)

This shows that y_t is in part driven by y_{t−1}².
Sesay and Subba Rao (1988) and Kim, Billard, and Basawa (1990) established that, under Gaussian errors, for the third moment to exist, the restriction

α³ + 3αβ²σ² < 1

must hold, and for the fourth moment to exist, the restriction

α⁴ + 6α²β²σ² + 3β⁴σ⁴ < 1

must hold. The consequences of these properties are also reflected in Figure 4, where we present the estimated residuals.

Figure 5: A scatter plot of y_t against the estimated residuals.

Figure 6: A scatter plot of changes in unemployment against the estimated residuals.
Figures 7–9 depict the moment restrictions, and we see that the parameter values for β cannot be large, as the restrictions are quite stringent. We will return to the content of these graphs in the next section.

Figure 7: Moment restrictions for y_t = 0.6y_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t.

Figure 8: Moment restrictions for y_t = 0.8y_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t.

Figure 9: Moment restrictions for y_t = 0.6y_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t.
3.2 Autocorrelations
The unconditional autocovariances can be derived from the model as well. The resulting autocorrelation function satisfies

ρ_k = αρ_{k−1}, for k ≥ 2,

and it is the same as the autocorrelation function of an ARMA(1,1) model. In fact, Granger and Andersen (1978, page 56) show that this implied ARMA(1,1) model has the same α as the diagonal bilinear model. Basrak, Davis, and Mikosch (1999) discuss the empirical autocorrelation function.
Furthermore, Granger and Andersen (1978, page 55) (see also Kim, Billard, and Basawa 1990; Sesay and Subba Rao 1988) show that the autocorrelations of y_t² can also be derived, and these can help to distinguish the bilinear model from its linear ARMA(1,1) counterpart.
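The ARMA(1,1)-type decay of the autocorrelations, ρ_k = αρ_{k−1} for k ≥ 2, can be verified on simulated data. This is a sketch; with a long series the ratios of successive sample autocorrelations should be close to α.

```python
import random

def simulate_bilinear(alpha, beta, n, sigma=1.0, seed=5):
    rng = random.Random(seed)
    y, y_prev, eps_prev = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        y_t = alpha * y_prev + beta * eps_prev * y_prev + eps
        y.append(y_t)
        y_prev, eps_prev = y_t, eps
    return y

def acf(x, k):
    """Sample autocorrelation at lag k."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    ck = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n
    return ck / c0

y = simulate_bilinear(0.6, 0.2, 100_000)
r1, r2, r3 = acf(y, 1), acf(y, 2), acf(y, 3)
# r2/r1 and r3/r2 are both close to alpha = 0.6
```

This is exactly why the autocorrelation function alone cannot separate the bilinear model from a linear ARMA(1,1) model.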
4 Difficulties and Issues
In this section we highlight the two main problematic issues with the model reviewed in the previous section. The first is that bilinear features appear in just a few observations. This makes estimation of the parameters cumbersome. Also, at the same time, when the true parameters are getting closer to the moment restrictions, the estimation procedure becomes intractable. In other words, the parameters in the bilinear model can best be estimated when they are small. However, when the parameters are small, the fit and forecast gain of the bilinear model versus for example a linear AR(1) is negligible.
4.1 Estimation
There are many studies on the estimation of the parameters of the first-order diagonal bilinear model, and various other bilinear models. Charemza, Lifshits, and Makarova (2005) study the case where α = 1. Guegan and Pham (1989) discuss the estimation of the parameters using the least squares method. A method of moments estimator for the above diagonal model is put forward by Kim, Billard, and Basawa (1990). Pham and Tran (1981) discuss various properties of the first-order bilinear time series model. Sesay and Subba Rao (1988) consider estimation methods using higher order moments, and Subba Rao (1981) provides a general theory of bilinear models. Finally, Brunner and Hess (1995) discuss the potential problems with the likelihood function of the first-order bilinear model.
The main conclusion from the literature so far is that it is not easy to properly estimate the parameters of a first-order diagonal bilinear time series model, and these problems rapidly increase when the parameters are closer to the boundaries of the moment restrictions, see Brunner and Hess (1995).
A simple method to estimate the parameters in the first-order diagonal model is based on a two-step method. Recall that the autocorrelations of the first-order diagonal model mimic those of an ARMA(1,1) model in which the parameter α is the same. The idea is now to estimate α using Maximum Likelihood from the ARMA(1,1) model

y_t = αy_{t−1} + θɛ_{t−1} + ɛ_t,

and in a second step to estimate β using Ordinary Least Squares (OLS), regressing y_t − α̂y_{t−1} on ɛ̂_{t−1}y_{t−1}, with ɛ̂_t the first-step residuals.
When this method is applied to the changes in unemployment data, we obtain the parameter estimates, with standard errors in parentheses. Another method is to estimate α and β from a shortened version of (15), that is, the regression

y_t = αy_{t−1} + βy_{t−1}(y_{t−1} − αy_{t−2}) + ɛ_t,

which deletes the term involving β². Next, use Nonlinear Least Squares (NLS) and use the obtained estimates α̂ and β̂.
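A stripped-down version of the two-step idea can be sketched as follows. For brevity we replace the Maximum Likelihood step by the moment estimate α̂ = r₂/r₁ implied by the ARMA(1,1)-type autocorrelations, and we proxy the lagged errors by the residuals of the linear part; this is a simplification of the procedure described above, not its exact implementation.

```python
import random

def simulate_bilinear(alpha, beta, n, sigma=1.0, seed=7):
    rng = random.Random(seed)
    y, y_prev, eps_prev = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        y_t = alpha * y_prev + beta * eps_prev * y_prev + eps
        y.append(y_t)
        y_prev, eps_prev = y_t, eps
    return y

def acf(x, k):
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    return sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / n / c0

def two_step(y):
    # step 1: rho_2 = alpha * rho_1 gives a simple moment estimate of alpha
    a = acf(y, 2) / acf(y, 1)
    # step 2: OLS (no intercept) of y_t - a*y_{t-1} on e_{t-1}*y_{t-1},
    # with e_t = y_t - a*y_{t-1} as a crude proxy for the true errors
    num = den = 0.0
    for t in range(2, len(y)):
        u = y[t] - a * y[t - 1]
        x = (y[t - 1] - a * y[t - 2]) * y[t - 1]
        num += x * u
        den += x * x
    return a, num / den

y = simulate_bilinear(0.6, 0.2, 100_000)
a_hat, b_hat = two_step(y)
# a_hat is close to 0.6; b_hat is positive but typically biased toward zero
```

The downward bias in b_hat is in line with the β estimates in Table 1, where the mean estimate sits below the true value of 0.2.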
To see if the first estimation method is dependable, we generate one thousand series of length 200 from a first-order diagonal model, for example with true parameters α = 0.6 and β = 0.2. The averages of the one thousand estimated parameters are reported in Table 1.
Table 1: Simulation results on two OLS-based estimation methods, based on one thousand replications. Sample size is 200.
α | β | Mean α̂ | Mean β̂ | Mean σ̂² | Median σ̂²
---|---|---|---|---|---
ARMA(1,1) based | |||||
0.8 | 0.2 | 0.77 | 0.13 | 1.11 | 1.08 |
0.6 | 0.2 | 0.57 | 0.16 | 1.03 | 1.02 |
0.4 | 0.2 | 0.37 | 0.18 | 1.03 | 1.02 |
Regression based using NLS | |||||
0.8 | 0.2 | 0.79 | 0.14 | 1.08 | 1.06 |
0.6 | 0.2 | 0.58 | 0.18 | 1.00 | 0.99 |
0.4 | 0.2 | 0.39 | 0.19 | 0.98 | 0.98 |
Potential problems with estimating the parameters of a first-order diagonal bilinear model can be caused by the fact that the nonlinear features of the data appear through only a few data points. This is visualized in Figure 10, where we present the influence statistics from the regression.

Figure 10: Influence statistics from the regression.

Figure 11: Influence statistics from the regression.
4.2 Fit and Forecasting
There are only a few studies where bilinear models are considered for forecasting. Examples are Poskitt and Tremayne (1986) and Weiss (1986), and there it is found in a few cases that bilinear models can slightly improve on linear models.
It seems that if the parameters are further away from the boundaries of the moment restrictions, ignoring the bilinear part does not matter much for forecasting. The simulation results in Table 2 support this notion. We generate one thousand time series of length 200 from a first-order diagonal bilinear model, with parameter values obeying the moment restrictions visualized in Figures 7 to 9. Each time, we choose parameter configurations that are far away from the boundaries. We estimate either an AR(1) or an ARMA(1,1) model, and we report the mean and median values of the in-sample one-step-ahead forecast error variance.
Table 2: In-sample static one-step-ahead forecast error variance when fitting an AR(1) or an ARMA(1,1) model to 200 observations generated from y_t = αy_{t−1} + βɛ_{t−1}y_{t−1} + ɛ_t.
α | β | AR(1) mean | AR(1) median | ARMA(1,1) mean | ARMA(1,1) median
---|---|---|---|---|---
0.8 | 0.2 | 1.22 | 1.19 | 1.18 | 1.19 |
0.6 | 0.2 | 1.11 | 1.09 | 1.10 | 1.08 |
0.4 | 0.2 | 1.08 | 1.08 | 1.08 | 1.07 |
0.6 | −0.2 | 1.11 | 1.10 | 1.10 | 1.09 |
From Table 2 we see that for various parameter configurations, whether we fit an AR(1) or an ARMA(1,1) model, the in-sample prediction error variances are around 1.10. Hence, even without estimating the parameters in the bilinear model, the potential forecast gain of a bilinear model is something like 10 % at most. This gain becomes smaller when the parameters are estimated, see Table 1.
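The order of magnitude in Table 2 is easy to reproduce: fit an AR(1) by OLS to data generated from the bilinear model and compare the residual variance with the innovation variance σ² = 1. This is a sketch; the exact number depends on the seed and the sample size.

```python
import random

def simulate_bilinear(alpha, beta, n, sigma=1.0, seed=9):
    rng = random.Random(seed)
    y, y_prev, eps_prev = [], 0.0, 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        y_t = alpha * y_prev + beta * eps_prev * y_prev + eps
        y.append(y_t)
        y_prev, eps_prev = y_t, eps
    return y

def ar1_residual_variance(y):
    """Demean, fit an AR(1) by OLS, and return the in-sample residual variance."""
    n = len(y)
    m = sum(y) / n
    z = [v - m for v in y]
    a = sum(z[t] * z[t - 1] for t in range(1, n)) / sum(z[t - 1] ** 2 for t in range(1, n))
    resid = [z[t] - a * z[t - 1] for t in range(1, n)]
    return sum(r * r for r in resid) / len(resid)

v = ar1_residual_variance(simulate_bilinear(0.6, 0.2, 100_000))
# v lies roughly 10 % above the innovation variance of 1, in line with Table 2
```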
Turning back to the illustration involving the changes in unemployment, we see that the in-sample Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE) for the bilinear model are 0.296 and 0.212, respectively. For an AR(1) model fitted to the same data we obtain an RMSE of 0.299 and an MAE of 0.216. Hence, the differences are negligible.
Finally, we zoom in on a few specific observations in Table 3, as recommended by van Dijk and Franses (2003), to see if the bilinear model can indeed better forecast “outliers”. Looking at various recession periods, we see that the differences between the in-sample forecasts of the bilinear model and the linear AR(1) model are again negligible.
Table 3: Forecasts for specific periods for changes in the unemployment rate.

Quarter | True | Bilinear forecast | Bilinear error | AR(1) forecast | AR(1) error
---|---|---|---|---|---
1953Q4 | 1.0 | 0.06 | 0.94 | 0.07 | 0.93 |
1954Q1 | 1.6 | 0.71 | 0.89 | 0.63 | 0.97 |
1954Q2 | 0.5 | 1.12 | −0.62 | 1.01 | −0.51 |
1957Q4 | 0.7 | 0.05 | 0.65 | 0.07 | 0.63 |
1957Q1 | 1.4 | 0.46 | 0.94 | 0.44 | 0.96 |
1958Q1 | 1.1 | 0.99 | 0.11 | 0.88 | 0.22 |
1974Q3 | 0.4 | 0.05 | 0.35 | 0.07 | 0.33 |
1974Q4 | 1.0 | 0.24 | 0.76 | 0.26 | 0.74 |
1975Q1 | 1.7 | 0.67 | 1.03 | 0.63 | 1.07 |
1980Q1 | 0.3 | 0.05 | 0.25 | 0.07 | 0.23 |
1980Q2 | 1.0 | 0.17 | 0.83 | 0.19 | 0.81 |
1980Q3 | 0.4 | 0.69 | −0.29 | 0.63 | −0.23 |
1981Q4 | 0.8 | 0.00 | 0.80 | 0.00 | 0.80 |
1982Q1 | 0.6 | 0.55 | 0.05 | 0.51 | 0.09 |
1982Q2 | 0.6 | 0.33 | 0.27 | 0.38 | 0.22 |
1982Q3 | 0.5 | 0.35 | 0.15 | 0.38 | 0.12 |
1982Q4 | 0.8 | 0.28 | 0.52 | 0.32 | 0.48 |
2008Q3 | 0.7 | 0.17 | 0.53 | 0.19 | 0.51 |
2008Q4 | 0.9 | 0.44 | 0.46 | 0.44 | 0.46 |
2009Q1 | 1.4 | 0.56 | 0.84 | 0.57 | 0.83 |
2009Q2 | 1.0 | 0.97 | 0.03 | 0.88 | 0.12 |
5 Conclusions
In this paper we provided a detailed analysis of the first-order diagonal bilinear time series model. This model has remarkable features, one of which is that it allows for a sequence of “outliers”, and hence may predict the value of the next “outlier”. Despite its beauty, we showed that this bilinear model is unlikely to be successful for in-sample fit and for out-of-sample prediction of economic time series. This is because the bilinear features show up in just a small number of observations, which in turn makes parameter estimation difficult, and sometimes even impossible. We also saw that when the moment restrictions are far from being violated, approximative linear models have equivalent forecast errors. In sum, when the parameters comfortably satisfy the moment restrictions, they can simply be estimated, but then linear approximative models perform about just as well.
In short, in cases where we can properly draw inference on a bilinear model, it is barely relevant for more accurate forecasting, neither in the full sample nor in specific “outlier-like” cases. It thus seems fair to conclude that the bilinear model is beautiful, but not especially useful for forecasting. So, its creator Clive Granger was right.
References
Basrak, B., R. A. Davis, and T. Mikosch. 1999. “The Sample ACF of a Simple Bilinear Process.” Stochastic Processes and their Applications 83 (1): 1–14. https://doi.org/10.1016/s0304-4149(99)00013-7.
Brunner, A. D., and G. D. Hess. 1995. “Potential Problems in Estimating Bilinear Time-Series Models.” Journal of Economic Dynamics and Control 19 (4): 663–81. https://doi.org/10.1016/0165-1889(94)00798-m.
Charemza, W. W., M. Lifshits, and S. Makarova. 2005. “Conditional Testing for Unit-Root Bilinearity in Financial Time Series: Some Theoretical and Empirical Results.” Journal of Economic Dynamics and Control 29 (1): 63–96. https://doi.org/10.1016/j.jedc.2003.07.001.
Granger, C. W. J., and A. P. Andersen. 1978. An Introduction to Bilinear Time Series Models. Göttingen: Vandenhoeck & Ruprecht.
Guegan, D., and D. T. Pham. 1989. “A Note on the Estimation of the Parameters of the Diagonal Bilinear Model by the Method of Least Squares.” Scandinavian Journal of Statistics 16 (2): 129–36.
Kim, W. K., L. Billard, and I. V. Basawa. 1990. “Estimation for the First Order Diagonal Bilinear Time Series Model.” Journal of Time Series Analysis 11 (3): 215–27. https://doi.org/10.1111/j.1467-9892.1990.tb00053.x.
Ling, S., L. Peng, and F. Zhu. 2015. “Inference for a Special Bilinear Time Series Model.” Journal of Time Series Analysis 36 (1): 61–6. https://doi.org/10.1111/jtsa.12092.
Pham, D. T., and L. T. Tran. 1981. “On the First Order Bilinear Time Series Model.” Journal of Applied Probability 18 (3): 617–27. https://doi.org/10.1017/s0021900200098417.
Poskitt, D. S., and A. R. Tremayne. 1986. “The Selection and Use of Linear and Bilinear Time Series Models.” International Journal of Forecasting 2 (1): 101–14. https://doi.org/10.1016/0169-2070(86)90033-6.
Sesay, S., and T. Subba Rao. 1988. “Yule-Walker Type Difference Equations for Higher Order Moments and Cumulants for Bilinear Time Series Models.” Journal of Time Series Analysis 9 (4): 385–401. https://doi.org/10.1111/j.1467-9892.1988.tb00478.x.
Subba Rao, T. 1981. “On the Theory of Bilinear Models.” Journal of the Royal Statistical Society B 43 (2): 244–55. https://doi.org/10.1111/j.2517-6161.1981.tb01177.x.
Turkman, K. F., and M. A. A. Turkman. 1997. “Extremes of Bilinear Time Series Models.” Journal of Time Series Analysis 18 (3): 305–19. https://doi.org/10.1111/1467-9892.00051.
Van Dijk, D., and P. H. Franses. 2003. “Selecting a Nonlinear Time Series Model Using Weighted Tests of Equal Forecast Accuracy.” Oxford Bulletin of Economics & Statistics 65 (S1): 727–44. https://doi.org/10.1046/j.0305-9049.2003.00091.x.
Weiss, A. A. 1986. “ARCH and Bilinear Time Series Models: Comparison and Combination.” Journal of Business & Economic Statistics 4 (1): 59–70. https://doi.org/10.2307/1391387.
© 2025 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.
Articles in the same Issue
- Frontmatter
- Research Articles
- The Story of a Model: The First-Order Diagonal Bilinear Autoregression
- Maximum Likelihood Estimation of Regression Effects in State Space Models
- Software
- QR.break: An R Package for Structural Breaks in Quantile Regression
- Practitioner's Corner
- Fast Algorithms for Quantile Regression with Selection