Abstract
Explicit formulae for maximum likelihood estimates of the parameters of square root processes and Bessel processes and first and second order approximate sufficient statistics are supplied. Applications of the estimation formulae to simulated interest rate and index time series are supplied, demonstrating the accuracy of the approximations and the extreme speed-up in estimation time. This significantly improved run time for parameter estimation has many applications where ex-ante forecasts are required frequently and immediately, such as in hedging interest rate, index and volatility derivatives based on such models, as well as modelling credit risk, mortality rates, population size and voting behaviour.
1 Introduction
Fast and accurate statistical methods in finance and other areas of applied probability are becoming increasingly important due to the enormous amounts of data available that have to be analysed. We have extremely fast and accurate methods for modelling Gaussian dynamics, where in a continuous-time setting linearly transformed Brownian motions drive model dynamics, for example, as for the logarithm of an asset price under the Black-Scholes model (see Black and Scholes (1973)) or as for the short term interest rate under the Vasicek model. This seems to be not the case for next-generation models that are based on Bessel processes and square root processes like the CIR model in finance. In population size modelling (see e.g. Feller (1971)) the square root process or the squared Bessel process emerges naturally as a suitable model. Similarly, the voting behaviour seems to be well modelled, similar to population size, by a squared Bessel process; see Karlin and Taylor (1981). Also, in finance the minimal market model (see Platen (1997)) emerges naturally as a suitable model. Interest has evolved also for the 3/2 stochastic volatility model (see Platen (1997) and Heston (1997)) which appears to be an excellent volatility model; see Goard and Mazur (2013). Furthermore, square root processes have many relevant applications in credit risk modelling (see Bielecki, Jeanblanc, and Rutkowski (2011) for an application to intensity-based models) and in stochastic mortality modelling (see Biffis (2005) and Blackburn and Sherris (2013)).
Various methods have been proposed to estimate the parameters of the model which fit a given time series of observed short rates. For example, in Poulsen (1999) a lognormal approximation is employed. In Fergusson and Platen (2015) the parameters of the CIR short rate model were estimated using the maximum likelihood method using the Newton-Raphson iterative method applied to both the log-likelihood function and an approximate log-likelihood function based on the lognormal distribution. In Fergusson and Platen (2015) the parameters of the 3/2 short rate model were also estimated using a gradient ascent method. The asymptotic properties of the Heston model are studied in Barczy, Pap, and Szabo (2016), which involves modelling the volatility process with the square root model in (3). Their estimates, based on discrete-time observations, are derived from conditional least squares estimates of modified parameters, assuming that the diffusion parameter σ is known. On the other hand, in Dokuchaev (2017) estimators of σ are supplied that do not require knowledge of the drift parameters. In Overbeck and Rydén (1997), estimators of parameters of (3), employing observations at equispaced times and based on conditional least squares, are studied. Another approach involving quasi-maximum likelihood estimation via the Wagner-Platen approximation is introduced in Huang (2013). However, in the current paper we supply closed-form approximate formulae for the maximum likelihood estimates of the parameters of the square root process and, consequently, transformations of this, including squared Bessel processes and Bessel processes.
In Section 2 we describe the probabilistic framework for our stochastic processes. In Section 3 we state theorems for closed-form approximate formulae for the parameters
2 Probabilistic framework
A Bessel process is specified by the stochastic differential equation (SDE)
where
Letting
where
A common application of the square root process to finance, is modelling the short rate, which is the continuously compounded rate of interest paid on a cash deposit for an infinitesimally small unit of time. In reality, the short rate is represented by short-term rates such as the overnight cash rate, which is targeted by central banks, or money market rates having terms up to 12 months. The Cox-Ingersoll-Ross (CIR) model of the short rate, given in Cox Ingersoll, and Ross (1985), is specified by the SDE
where
A zero-valued short rate is avoided because we set
Related to the CIR short rate model is the 3/2 short rate model, derived in Platen (1997, 1999 and subsequently studied in Ahn and Gao (1999),
where
Both the CIR model and 3/2 model have the property that for the above conditions on the parameters the interest rates can never be negative.
The transition density function of the short rate process in (3) is that of the non-central chi-square distribution, which, for times s and t with
where
is the power series expansion of the modified Bessel function of the first kind.
Restated, for
given
Let
are all positive. The log-likelihood function which we seek to maximise with respect to the parameters
where the transition density function
3 Main theorem and corollary
Let us first state and prove the main theorem of this paper, which concerns parameter estimates of the square root process based on a second order approximation to the log-likelihood function.
Before doing this, we motivate our definition of an approximately sufficient statistic by recalling the following definition of a sufficient statistic.
Definition 1:
In respect of sampled data
Remark 1:
We see that a sufficient statistic T constitutes a sufficient reduction of the data for the purpose of calculating the maximum likelihood estimates. Indeed, this is a consequence of the Fisher-Neyman factorisation criterion which characterises a sufficient statistic for a probability distribution having density
In our formulation of the main theorem we require a definition of an approximate sufficient statistic, where we are interested in the degree to which a stochastic model should be modified to make the statistic sufficient; see McCullagh (1984) for a discussion of local sufficiency or Chapter 5 of Le Cam (1986) for a discussion of approximate sufficiency, for example.
Definition 2:
In respect of equispaced sampled data
Theorem 1:
Given the observations of the square root process in (10)
, the maximum likelihood estimates of the parameters
respectively, subject to the condition
where
and where we have the second order approximate sufficient statistics L,
Also, the Fisher information matrix is approximately given by
where
and
Finally, we have that as
Proof:
See Appendix A for a proof.
The following theorem supplies parameter estimates of the square root process using a first order approximation to the log-likelihood function.
Theorem 2:
Given the observations of the short rate in (10), the maximum likelihood estimates of the parameters
respectively, subject to the condition
where
and where we have the first order approximate sufficient statistics L,
Also, the Fisher information matrix is given approximately in (22). Finally, asymptotic normality, as
Proof:
See Appendix A for a proof.
The following corollary to Theorem 1 is straightforwardly established.
Corollary 1:
Given the observations of the short rate in (10) , the maximum likelihood estimates of the parameters p, q and σ of the 3/2 short rate model given in (5) are
where formulae for
4 Applications
In this section we supply various aspects of the estimation methods. Firstly, we check the ability of the estimated parameters from simulated data to recover the actual parameter values, for varying lengths of the series. Secondly, we fit the models to US interest rate data. Thirdly, we compare the run times with the Newton-Raphson method.
4.1 Estimation from simulated data
Here we simulate data sets from each of the processes, squared-Bessel,
4.1.1 CIR model
Commencing with
Parameter estimates of the CIR, 3/2 and Bessel process short rate models fitted to monthly US short rates via several methods.
Model | Method | Log-likelihood | Parameter estimates |
---|---|---|---|
Vasicek | Exact formula | 7109.33145040 |
|
|
|||
|
|||
CIR | Newton-Raphson | 7559.08057481 |
|
|
|||
|
|||
|
|||
CIR | 1st order approx | 7559.08042386 |
|
|
|||
|
|||
|
|||
CIR | 2nd order approx | 7559.08057479 |
|
|
|||
|
|||
|
|||
3/2 | Newton-Raphson | 7213.56087570 |
|
|
|||
|
|||
|
|||
3/2 | 1st order approx | 7213.56078067 |
|
|
|||
|
|||
|
|||
3/2 | 2nd order approx | 7213.56087569 |
|
|
|||
|
|||
|
|||
Bessel | Newton-Raphson | 7168.45470719 |
|
|
|||
|
|||
|
|||
Bessel | 1st order approx | 7162.69765360 |
|
|
|||
|
|||
|
|||
Bessel | 2nd order approx | 7164.59619582 |
|
|
|||
|
|||
|

In respect of the CIR model, a graph of the evolution of the estimate of

In respect of the CIR model, a graph of the evolution of the estimate of κ from 1/4 to 20 years of simulated daily data.

In respect of the CIR model, a graph of the evolution of the estimate of σ from 1/4 to 20 years of simulated daily data.

In respect of the CIR model, a graph of the evolution of the estimate of

In respect of the CIR model, graphs of the evolutions of the estimates of
In Figure 1 we show the effect of
In Figure 2 we show the effect of
In Figure 3 we show the effect of
In Figure 4 we show the effect of
In conclusion, σ and
4.1.2 3/2 model
Commencing with

In respect of the

In respect of the 3/2 model, a graph of the evolution of the estimate of q from 1/4 to 20 years of simulated daily data.

In respect of the 3/2 model, a graph of the evolution of the estimate of σ from 1/4 to 20 years of simulated daily data.

In respect of the 3/2 model, a graph of the evolution of the estimate of

In respect of the 3/2 model, graphs of the evolutions of the estimates of p, q, σ and
In Figure 6 we show the effect of
In Figure 7 we show the effect of
In Figure 8 we show the effect of
In Figure 9 we show the effect of
In conclusion, σ is the most accurately estimated parameter over all window lengths up to 150 years, followed by p and lastly q. With daily sampling of the process, good estimates of σ can be obtained in about half a year and with weekly sampling, good estimates are obtainable in about 2 years. For windows longer than 10 years, quarterly, monthly, weekly and daily sampling give rise to estimates of similar accuracy. With such accurate estimates of σ, we see from the relation
4.1.3 Bessel model
Commencing with

In respect of the Bessel model, a graph of the evolution of the estimate of α from 1/4 to 20 years of simulated daily data.

In respect of the Bessel model, a graph of the evolution of the estimate of β from 1/4 to 20 years of simulated daily data.

In respect of the Bessel model, a graph of the evolution of the estimate of γ from 1/4 to 20 years of simulated daily data.

In respect of the Bessel model, a graph of the evolution of the estimate of

In respect of the Bessel model, graphs of the evolutions of the estimates of α, β, γ and ν from 1/4 to 150 years of simulated daily data. The dashed lines surrounding each of the evolutions of estimates provide a 95% confidence interval for the parameter value. The dashed horizontal red line shows the true value of the parameter.
In Figure 11 we show the effect of
In Figure 12 we show the effect of
In Figure 13 we show the effect of
In Figure 14 we show the effect of
In conclusion, γ is the most accurately estimated parameter over all windows, particularly for
4.2 Estimation from historical interest rates
In this section we demonstrate the parameter estimation for each of the processes when used to model interest rates. For the set of monthly US short rates over the period 1871–2019, as given by the one-year rates in Shiller (2019) and supplemented by six-month LIBOR rates in Federal Reserve Bank of St. Louis (2019), we obtain the maximum likelihood estimates in Table 1. The Newton-Raphson approximation algorithm was used to compute the exact parameter estimates for each of the models, whereas the estimates derived using the first and second term approximations employ the formulae given in Theorem 1 and Corollary 1. The parameter estimates of the Vasicek model were used in determining an initial set of parameter estimates for the Newton-Raphson method applied to the CIR model, as described in Fergusson and Platen (2015).
We can draw several conclusions from Table 1, where the data series is monthly and nearly 150 years long. Firstly, simulations performed in Section 4.1.1 for the CIR model, having the same parameters used in Table 1, indicate that the minimum window length for reasonable estimates of
To illustrate the estimation methods for more frequent observations, for the set of daily US Effective Federal Funds rates over the period July 1954 to January 2020, as given in Federal Reserve Bank of St. Louis (2020), we obtain the maximum likelihood estimates in Table 2. We note that the second order approximation in respect of the Bessel model fails due to Condition A not being met.
Parameter estimates of the CIR, 3/2 and Bessel process short rate models fitted to daily Effective Federal Funds rates via several methods.
Model | Method | Log-likelihood | Parameter estimates |
---|---|---|---|
Vasicek | Exact formula | 102991.65 |
|
|
|||
|
|||
CIR | Newton-Raphson | 108524.04 |
|
|
|||
|
|||
|
|||
CIR | 1st order approx | 108524.03 |
|
|
|||
|
|||
|
|||
CIR | 2nd order approx | 108524.04 |
|
|
|||
|
|||
|
|||
3/2 | Newton-Raphson | 81391.08 |
|
|
|||
|
|||
|
|||
3/2 | 1st order approx | 81391.06 |
|
|
|||
|
|||
|
|||
3/2 | 2nd order approx | 81391.07 |
|
|
|||
|
|||
|
|||
Bessel | Newton-Raphson | 127474.83 |
|
|
|||
|
|||
|
|||
Bessel | 1st order approx | 127474.83 |
|
|
|||
|
|||
|
|||
Bessel | 2nd order approx | Undefined (Condition A is not met) |
|
|
|||
|
|||
|
We can draw several conclusions from Table 2, where the data series is daily and roughly 65 1/2 years long. Firstly, simulations performed in Section 4.1.1 for the CIR model, despite being based on different parameters, indicate that the respective minimum window lengths for reasonable estimates of
4.3 Comparison of run times
To give an idea of the substantial improvement in run times of the first order and second order approximation methods over the standard Newton-Raphson estimation method, we simulated in MATLAB 1000 data sets of CIR short rates, based on the CIR parameter values
Average run time of CIR estimation method per simulated data set using CIR parameter estimates and 1000 simulations.
Method | Average run time | Average parameter estimates |
---|---|---|
Newton-Raphson | 1.162925 |
|
|
||
|
||
|
||
1st order approx | 0.000157 |
|
|
||
|
||
|
||
2nd order approx | 0.000244 |
|
|
||
|
||
|
5 Conclusion
The closed-form formulae presented in this paper permit accurate parameter estimates of the CIR and 3/2 short rate model to be performed in spreadsheets and other high-level software packages, such as MATLAB and R, and permit exceedingly fast computations in computer algorithms. This significantly improved run time for parameter estimation has many applications where ex-ante forecasts are required frequently and immediately, such as in hedging interest rate derivatives and volatility based on such models.
A Proofs of theorems 1 and 2
In this technical appendix we supply proofs of Theorems 1 and 2. Our proof of Theorem 1 is as follows.
Proof:
For convenience, we make the change of variables given in (23) and we rewrite the log-likelihood function in (11) in terms of these new variables as
where L,
Our proof of the theorem involves solving for the new variables using an approximation to the first order derivatives of the log-likelihood function in (32).
The modified Bessel function of the first kind given in (7) can also be written as
from which we can derive the asymptotic expansion, given in
as
and
where the subscript ν in the big-O notation indicates dependence of the implied constant upon ν. Taking logarithms of both sides gives
In terms of our new variables given in (23) we rewrite the log-likelihood function in (32) as
where
and where L,
Our plan is firstly to show that the solution
is close to the solution
and secondly to solve (40).
Write
for some
Therefore,
We know that
and that
It is apparent from (44) and (45) that h is close to zero, that is,
To solve (40), we rewrite
where
and where, for ease of legibility in our algebra, the functions f, g and h are defined as
Equality in (48) is achieved when
We now focus on determining the optimum parameter values of the new log-likelihood function
The first order partial derivatives of
The maximum likelihood estimates are found by equating these first order partial derivatives to zero and solving for a and k.
Thus, from (52) we have the two equations
Rearranging (54) gives the quadratic equation in a
In the light of (54), we can reduce the degree of the polynomial in a in (53) as follows
This is a cubic equation in a that can be rearranged as
Euclid’s greatest common divisor algorithm applied to the polynomial equations (55) and (57) gives a monic polynomial in a. The first step of this algorithm is (57) minus
The second step of this algorithm is (58) minus
Letting
we have from (59) the solution for a
Note that replacing a in (55) by the RHS expression in (61) gives
which only involves the variable k. Defining the function
and making the assumption that k is close to zero, an approximate solution can be found by solving
where
Therefore, from (64) we have
where the choice of sign of the square root is the same as that of
The original parameters are estimated as
The second order partial derivatives are
and the Fisher information matrix with respect to the variables a, k and v is simply
From (23),
for small values of
Finally, asymptotic normality in (26) of the maximum likelihood estimates follows from (12) and asymptotic theory of MLEs; see for example Theorem 3.10 in Lehmann and Casella (1998).
Our proof of Theorem 2 is as follows.
Proof:
We continue in a similar vein as in the proof of Theorem 1 up to the equations formed by equating the first order partial derivatives in (52) to zero and solving for a and k, where instead the log-likelihood function in (32) is rewritten as
where
with equality achieved when
and where the first order partial derivatives of
Thus, setting (75) to zero, we have the two equations
Rearranging (77) gives the equation for a
Eliminating a from (76) gives
which only involves the variable k and where we have defined the function
where
We remark that the expressions for
Therefore, from (80) we have
where the choice of sign of the square root is the same as that of
The original parameters are estimated as
The second order partial derivatives are
and the Fisher information matrix with respect to the variables a, k and v is simply
The Fisher information matrix
B Calculations of parameter standard errors in Table 1
We supply here the full details of the calculations of the standard errors of the parameters for the CIR model, given in Table 1. The calculations of the standard errors of the parameters for the 3/2 and Bessel models are calculated analogously. Once the maximum likelihood estimates of the parameters of the CIR model have been obtained, namely
we compute numerically the Fisher Information Matrix, approximated by the observed Fisher Information Matrix,
With
and values of the log-likelihood function at points neighbouring the maximum likelihood estimate vector
and
we compute numerically the second order partial derivatives as
so that the Fisher Information Matrix becomes
The covariance matrix
and the standard errors emerge as square roots of the diagonal elements of this matrix
which are very close to those given for the CIR model in Table 1.
Funding source: Australian Government Research Training Program Scholarship
-
Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
-
Research funding: This research is supported by an Australian Government Research Training Program Scholarship.
-
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.
References
Ahn, D., and B. Gao. 1999. “A Parametric Nonlinear Model of Term Structure Dynamics.” Review of Financial Studies 12: 721–62, https://doi.org/10.1093/rfs/12.4.721.Suche in Google Scholar
Barczy, M., G. Pap, and T. T. Szabo. 2016. “Parameter Estimation for the Subcritical Heston Model Based on Discrete Time Observations.” Acta Scientiarum Mathematicarum (Szeged) 82: 313–38, https://doi.org/10.14232/actasm-015-016-0.Suche in Google Scholar
Bielecki, T., M. Jeanblanc, and M. Rutkowski. 2011. “Hedging of a Credit Default Swaption in the CIR Default Intensity Model.” Finance and Stochastics 15: 541–72, https://doi.org/10.1007/s00780-010-0143-7.Suche in Google Scholar
Biffis, E. 2005. “Affine Processes for Dynamic Mortality and Actuarial Valuations.” Insurance: Mathematics and Economics 37: 443–68, https://doi.org/10.1016/j.insmatheco.2005.05.003.Suche in Google Scholar
Black, F., and M. Scholes. 1973. “The Pricing of Options and Corporate Liabilities.” Journal of Political Economy 81: 637–54, https://doi.org/10.1086/260062.Suche in Google Scholar
Blackburn, F., and M. Sherris. 2013. “Consistent Dynamic Affine Mortality Models for Longevity Risk Applications.” Insurance: Mathematics and Economics 53: 64–73, https://doi.org/10.1016/j.insmatheco.2013.04.007.Suche in Google Scholar
Cox, J. C., J. E. Ingersoll, and S. A. Ross. 1985. “A Theory of the Term Structure of Interest Rates.” Econometrica 53: 385–407, https://doi.org/10.2307/1911242.Suche in Google Scholar
Dokuchaev, N. 2017. “A Pathwise Inference Method for the Parameters of Diffusion Terms.” Journal of Nonparametric Statistics 29: 731–43, https://doi.org/10.1080/10485252.2017.1367789.Suche in Google Scholar
Elliott, R., and E. Platen. 2001. Hidden Markov Chain Filtering for Generalised Bessel Processes, Trends in Mathematics, 123–43. Boston, MA: Birkhäuser. chapter 7.10.1007/978-1-4612-0167-0_7Suche in Google Scholar
Federal Reserve Bank of St. Louis. 2019. 6-Month London Interbank Offered Rate (LIBOR), Based on U.S. Dollar (USD6MTD156N), https://fred.stlouisfed.org/series/USD6MTD156N, [Online; accessed April 16 2019].Suche in Google Scholar
Federal Reserve Bank of St. Louis. 2020. Effective Federal Funds Rate [DFF]. https://fred.stlouisfed.org/series/DFF, [Online; accessed February 3 2020].Suche in Google Scholar
Feller, W. 1971. An Introduction to Probability Theory and its Applications, Vol 2, 2nd ed. New York: Wiley.Suche in Google Scholar
Fergusson, K., and E. Platen. 2015. “Application of Maximum Likelihood Estimation to Stochastic Short Rate Models.” Annals of Financial Economics 10: 1–26, https://doi.org/10.1142/s2010495215500098.Suche in Google Scholar
Goard, J., and M. Mazur. 2013. “Stochastic Volatility Models and the Pricing of VIX Options.” Mathematical Finance 23: 439–58, https://doi.org/10.1111/j.1467-9965.2011.00506.x.Suche in Google Scholar
Heston, S. L. 1997. A Simple New Formula for Options with Stochastic Volatility, Technical report: Washington University of St. Louis.Suche in Google Scholar
Huang, X. 2013. “Quasi-maximum Likelihood Estimation of Multivariate Diffusions.” Studies in Nonlinear Dynamics and Econometrics 17: 179–97, https://doi.org/10.1515/snde-2012-0026.Suche in Google Scholar
Karlin, S., and H. M. Taylor. 1981. A Second Course in Stochastic Processes. New York: Academic Press.Suche in Google Scholar
Le Cam, L. 1986. Asymptotic Methods in Statistical Decision Theory. New York: Springer-Verlag.10.1007/978-1-4612-4946-7Suche in Google Scholar
Lehmann, E. L., and G. Casella. 1998. Theory of Point Estimation. New York: Springer.Suche in Google Scholar
McCullagh, P. 1984. “Local Sufficiency.” Biometrika 71: 233–44, https://doi.org/10.1093/biomet/71.2.233.Suche in Google Scholar
Overbeck, L., and T. Rydén. 1997. “Estimation in the Cox-Ingersoll-Ross Model.” Econometric Theory 13: 430–61, https://doi.org/10.1017/s0266466600005880.Suche in Google Scholar
Platen, E. 1997. A Non-linear Stochastic Volatility Model, Working Paper: Australian National University, Financial Mathematics Research Reports. FMRR 005-97.Suche in Google Scholar
Platen, E. 1999. “A Short-Term Interest Rate Model.” Finance and Stochastics 3: 215–25, https://doi.org/10.1007/s007800050059.Suche in Google Scholar
Poulsen, R. 1999. Approximate Maximum Likelihood Estimation of Discretely Observed Diffusion Processes, Working Paper No. 29: University of Aarhus.Suche in Google Scholar
Shiller, R. 2019. One-Year Interest Rate, http://www.econ.yale.edu/ shiller/data.htm, [Online; accessed April 16 2019].Suche in Google Scholar
Watson, G. N. 1966. A Treatise on the Theory of Bessel Functions, 2nd ed.: Cambridge University Press.Suche in Google Scholar
Supplementary Material
The online version of this article offers supplementary material (https://doi.org/10.1515/snde-2019-0079).
© 2020 Kevin Fergusson, published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.
Artikel in diesem Heft
- Frontmatter
- Research Articles
- Forecasting Japanese inflation with a news-based leading indicator of economic activities
- Air pollution, mortality, at-risk population, new entry and life expectancy of the frail elderly in three U.S. cities
- Fast maximum likelihood estimation of parameters for square root and Bessel processes
- A monitoring procedure for detecting structural breaks in factor copula models
- Construction of leading economic index for recession prediction using vine copulas
- Financial integration in emerging economies: an application of threshold cointegration
- When is discretionary fiscal policy effective?
Artikel in diesem Heft
- Frontmatter
- Research Articles
- Forecasting Japanese inflation with a news-based leading indicator of economic activities
- Air pollution, mortality, at-risk population, new entry and life expectancy of the frail elderly in three U.S. cities
- Fast maximum likelihood estimation of parameters for square root and Bessel processes
- A monitoring procedure for detecting structural breaks in factor copula models
- Construction of leading economic index for recession prediction using vine copulas
- Financial integration in emerging economies: an application of threshold cointegration
- When is discretionary fiscal policy effective?