Fast maximum likelihood estimation of parameters for square root and Bessel processes

Kevin Fergusson

doi:10.1515/snde-2019-0079

Artikel Open Access

Fast maximum likelihood estimation of parameters for square root and Bessel processes

Kevin Fergusson

Veröffentlicht/Copyright: 12. August 2020

Veröffentlicht von

Veröffentlichen auch Sie bei De Gruyter Brill

Manuskript einreichen Informationen für Autor*innen

Aus der Zeitschrift Studies in Nonlinear Dynamics & Econometrics Band 25 Heft 4

Abstract

Explicit formulae for maximum likelihood estimates of the parameters of square root processes and Bessel processes and first and second order approximate sufficient statistics are supplied. Applications of the estimation formulae to simulated interest rate and index time series are supplied, demonstrating the accuracy of the approximations and the extreme speed-up in estimation time. This significantly improved run time for parameter estimation has many applications where ex-ante forecasts are required frequently and immediately, such as in hedging interest rate, index and volatility derivatives based on such models, as well as modelling credit risk, mortality rates, population size and voting behaviour.

Keywords: Bessel process; Cox-Ingersoll-Ross model; maximum likelihood estimation; squared Bessel process

1 Introduction

Fast and accurate statistical methods in finance and other areas of applied probability are becoming increasingly important due to the enormous amounts of data available that have to be analysed. We have extremely fast and accurate methods for modelling Gaussian dynamics, where in a continuous-time setting linearly transformed Brownian motions drive model dynamics, for example, as for the logarithm of an asset price under the Black-Scholes model (see Black and Scholes (1973)) or as for the short term interest rate under the Vasicek model. This seems to be not the case for next-generation models that are based on Bessel processes and square root processes like the CIR model in finance. In population size modelling (see e.g. Feller (1971)) the square root process or the squared Bessel process emerges naturally as a suitable model. Similarly, the voting behaviour seems to be well modelled, similar to population size, by a squared Bessel process; see Karlin and Taylor (1981). Also, in finance the minimal market model (see Platen (1997)) emerges naturally as a suitable model. Interest has evolved also for the 3/2 stochastic volatility model (see Platen (1997) and Heston (1997)) which appears to be an excellent volatility model; see Goard and Mazur (2013). Furthermore, square root processes have many relevant applications in credit risk modelling (see Bielecki, Jeanblanc, and Rutkowski (2011) for an application to intensity-based models) and in stochastic mortality modelling (see Biffis (2005) and Blackburn and Sherris (2013)).

Various methods have been proposed to estimate the parameters of the model which fit a given time series of observed short rates. For example, in Poulsen (1999) a lognormal approximation is employed. In Fergusson and Platen (2015) the parameters of the CIR short rate model were estimated using the maximum likelihood method using the Newton-Raphson iterative method applied to both the log-likelihood function and an approximate log-likelihood function based on the lognormal distribution. In Fergusson and Platen (2015) the parameters of the 3/2 short rate model were also estimated using a gradient ascent method. The asymptotic properties of the Heston model are studied in Barczy, Pap, and Szabo (2016), which involves modelling the volatility process with the square root model in (3). Their estimates, based on discrete-time observations, are derived from conditional least squares estimates of modified parameters, assuming that the diffusion parameter σ is known. On the other hand, in Dokuchaev (2017) estimators of σ are supplied that do not require knowledge of the drift parameters. In Overbeck and Rydén (1997), estimators of parameters of (3), employing observations at equispaced times and based on conditional least squares, are studied. Another approach involving quasi-maximum likelihood estimation via the Wagner-Platen approximation is introduced in Huang (2013). However, in the current paper we supply closed-form approximate formulae for the maximum likelihood estimates of the parameters of the square root process and, consequently, transformations of this, including squared Bessel processes and Bessel processes.

In Section 2 we describe the probabilistic framework for our stochastic processes. In Section 3 we state theorems for closed-form approximate formulae for the parameters r ¯ , κ and σ of (3) based on second order and first order approximations to the log-likelihood function. A corollary is supplied which provides estimates of (5). In Section 4 we demonstrate applications of the theorem to simulated data sets of squared Bessel processes and to an actual short rate data set, exemplifying the accuracy of the formulae. Finally, in Section 5 we conclude with some comments.

2 Probabilistic framework

A Bessel process is specified by the stochastic differential equation (SDE)

(1) d R t = ( α R t + β R t ) d t + γ d Z t ,

where Z = ( Z t ) t ≥ 0 is a Wiener process adapted to a filtered probability space ( Ω , ℱ , ℱ ¯ , P ) , ℱ ¯ = ( ℱ t ) t ≥ 0 is a family of sigma algebras, R 0 = x is the initial value and α, β and γ are constants. The case when these real-valued functions are Markov chains has been considered in Elliott and Platen (2001), but we confine ourselves to the case when they are constants.

Letting r t = R t 2 and using the Itô formula, we have the SDE for the squared Bessel process, otherwise termed a square root process, as

(2) d r t = ( 2 α + γ 2 + 2 β r t ) d t + 2 γ r t d Z t ,

where r 0 = x 2 > 0 is the positive initial value.

A common application of the square root process to finance, is modelling the short rate, which is the continuously compounded rate of interest paid on a cash deposit for an infinitesimally small unit of time. In reality, the short rate is represented by short-term rates such as the overnight cash rate, which is targeted by central banks, or money market rates having terms up to 12 months. The Cox-Ingersoll-Ross (CIR) model of the short rate, given in Cox Ingersoll, and Ross (1985), is specified by the SDE

(3) d r t = κ ( r ¯ − r t ) d t + σ r t d Z t ,

where Z = ( Z t ) t ≥ 0 is as before and r ¯ , κ and σ are positive constants. It is readily seen that (3) is the particular case of a squared Bessel process in (2) with

(4) α = 1 2 r ¯ κ − 1 8 σ 2 , β = − 1 2 κ , γ = 1 2 σ .

A zero-valued short rate is avoided because we set κ r ¯ > 1 2 σ 2 > 0 . The parameter κ denotes the speed of reversion of the short rate r t to the mean reverting level r ¯ . The r ¯ can be thought of as a smoothed average short rate which is targeted by the central bank.

Related to the CIR short rate model is the 3/2 short rate model, derived in Platen (1997, 1999 and subsequently studied in Ahn and Gao (1999),

(5) d r t = ( p r t + q r t 2 ) d t + σ ( r t ) 3 / 2 d Z t ,

where p > 0 , q and σ > 0 are constants satisfying q < σ 2 / 2 to avoid explosive values of r t . The SDE for the reciprocal of the 3/2 short rate is that for the CIR short rate model and, therefore, maximum likelihood estimation of the parameters of the 3/2 model is equivalent to that for the CIR model.

Both the CIR model and 3/2 model have the property that for the above conditions on the parameters the interest rates can never be negative.

The transition density function of the short rate process in (3) is that of the non-central chi-square distribution, which, for times s and t with 0 ≤ s < t , is given by

(6) p r ( s , r s , t , r t ) = 1 2 ( φ t − φ s ) e − κ t ( r t e κ t r s e κ s ) 1 2 ( ν 2 − 1 ) × exp ( − 1 2 r s e κ s + r t e κ t φ t − φ s ) I ν 2 − 1 ( r s r t e κ ( s + t ) φ t − φ s ) ,

where φ t = φ 0 + 1 4 κ σ 2 ( e κ t − 1 ) and ν = 4 κ r ¯ σ 2 and where

(7) I ν ( x ) = ∑ i = 0 ∞ 1 i ! Γ ( i + ν + 1 ) ( x 2 ) 2 i + ν

is the power series expansion of the modified Bessel function of the first kind.

Restated, for t > s the conditional random variable

(8) e κ t φ t − φ s r t

given r s has a non-central chi-squared distribution with ν = 4 κ r ¯ / σ 2 degrees of freedom and non-centrality parameter λ = r s e κ s / ( φ t − φ s ) , namely

(9) e κ t φ t − φ s r t ∼ χ ν , r s e κ s / ( φ t − φ s ) 2 .

Let t 0 , t 1 , t 2 , … , t n be n + 1 equispaced ordered times at which the short rate is observed, where the spacing is Δ = t i − t i − 1 , for i = 1 , 2 , … , n . We assume that the observed short rates

(10) r t 0 , r t 1 , … , r t n

are all positive. The log-likelihood function which we seek to maximise with respect to the parameters r ¯ , κ and σ is given by

(11) ℓ ( r ¯ , κ , σ ) = ∑ i = 1 n log p r ( t i − 1 , r t i − 1 , t i , r t i ) ,

where the transition density function p r is as in (6).

3 Main theorem and corollary

Let us first state and prove the main theorem of this paper, which concerns parameter estimates of the square root process based on a second order approximation to the log-likelihood function.

Before doing this, we motivate our definition of an approximately sufficient statistic by recalling the following definition of a sufficient statistic.

Definition 1:

In respect of sampled data r t 0 , r t 1 , … , r t n from the stochastic process r = ( r t ) t ≥ 0 , having parameter θ ∈ ℝ d , a statistic T = T ( r t 0 , r t 1 , … , r t n ) ∈ ℝ k is said to be sufficient if knowledge of the value of T allows us to specify the likelihood function L ( θ ) determined by r t 0 , r t 1 , … , r t n , up to a constant of proportionality which does not depend on the parameter θ.

Remark 1:

We see that a sufficient statistic T constitutes a sufficient reduction of the data for the purpose of calculating the maximum likelihood estimates. Indeed, this is a consequence of the Fisher-Neyman factorisation criterion which characterises a sufficient statistic for a probability distribution having density f θ ( x ) as one for which f θ ( x ) factorises as f θ ( x ) = g θ ( T ) h ( x 0 , … , x n ) , where g θ is a function of T depending on θ, and h is a function independent of θ.

In our formulation of the main theorem we require a definition of an approximate sufficient statistic, where we are interested in the degree to which a stochastic model should be modified to make the statistic sufficient; see McCullagh (1984) for a discussion of local sufficiency or Chapter 5 of Le Cam (1986) for a discussion of approximate sufficiency, for example.

Definition 2:

In respect of equispaced sampled data r t 0 , r t 1 , … , r t n , with time step Δ = t 1 − t 0 , from the stochastic process r = ( r t ) t ≥ 0 , having parameter θ ∈ ℝ d , a statistic T = T ( r t 0 , r t 1 , … , r t n ) ∈ ℝ m , for some m ∈ ℤ + , is said to be approximately sufficient of order k if knowledge of the value of T allows us to specify the θ-dependent part of the likelihood function determined by r t 0 , r t 1 , … , r t n up to a tolerance O ( n Δ k + 1 ) , as Δ → 0 . Mathematically speaking, there exist a function g θ ( T ) , which depends on the statistic T and parameter θ, a function h ( r t 0 , r t 1 , … , r t n ) , which is independent of θ, and positive constants K 0 and K 1 such that for all Δ ∈ [ 0 , K 1 ] ,

| log L ( r t 0 , r t 1 , … , r t n ; θ ) − log g θ ( T ( r t 0 , r t 1 , … , r t n ) ) − log h ( r t 0 , r t 1 , … , r t n ) | ≤ K 0 n Δ k + 1 .

Theorem 1:

Given the observations of the square root process in (10) , the maximum likelihood estimates of the parameters r ¯ , κ, σ of the model specified in (3) are approximately given by, as Δ → 0 ,

(12) r ¯ ^ = 2 ( v + 1 ) a e k − e − k , κ ^ = 2 k Δ , σ ^ = 8 a k ( e k − e − k ) Δ ,

respectively, subject to the condition

(Condition A) ( q ′ 0 ) 2 − 2 q 0 q ″ 0 > 0 ,

where

(13) k = − q ′ 0 + s i g n ( q ′ 0 ) ( q ′ 0 ) 2 − 2 q 0 q ″ 0 q ′ 0 , q 0 = 1 2 R 5 b 0 2 − ( 1 4 R 5 f 0 − R 3 ) b 0 c 0 − ( 1 2 R 3 f 0 − h 0 ) c 0 2 , q ′ 0 = R 5 b 0 b ′ 0 − { 1 4 R 5 f ′ 0 b 0 c 0 + ( 1 4 R 5 f 0 − R 3 ) ( b ′ 0 c 0 + b 0 c ′ 0 ) } − { ( 1 2 R 3 f ′ 0 − h ′ 0 ) c 0 2 + 2 ( 1 2 R 3 f 0 − h 0 ) c 0 c ′ 0 } ,

q ″ 0 = R 5 { b 0 b ″ 0 + ( b ′ 0 ) 2 } − 1 4 R 5 ( f ′ 0 b 0 c 0 ) ( f ″ 0 / f ′ 0 + b ′ 0 / b 0 + c ′ 0 / c 0 ) − ( 1 4 R 5 − R 3 ) ( b ″ 0 c 0 + 2 b ′ 0 c ′ 0 + b 0 c ″ 0 ) − ( − 1 2 R 3 f ″ 0 − h ′ 0 ) c 0 2 − 4 ( − 1 2 R 3 f ′ 0 − h ′ 0 ) c 0 c ′ 0 − 2 ( − 1 2 R 3 f 0 − h 0 ) ( c 0 c ″ 0 + ( c ′ 0 ) 2 ) , a = b k c k , v = k + 1 2 L a R 3 + 1 2 a 2 R 5 ,

b k = 3 8 R 3 f k 2 + ( 3 4 R 3 2 R 5 − 5 4 h k ) f k − 3 2 R 3 R 5 h k + g k , c k = − 1 16 R 5 f k 2 + 5 8 R 3 f k + 1 − 3 2 h k + 3 2 R 3 2 R 5 , f k = − R 0 e − k + R 1 e k , g k = R 0 e − k + R 1 e k − 2 R 2 , h k = k + 1 2 L

and where we have the second order approximate sufficient statistics L, R i for i ∈ { 1,2,3,5 }

(14) L = 1 n log ( r t n / r t 0 ) ,

(15) R 0 = 1 n ∑ i = 1 n r t i − 1 ,

(16) R 1 = 1 n ∑ i = 1 n r t i ,

(17) R 2 = 1 n ∑ i = 1 n r t i − 1 r t i ,

(18) R 3 = 1 n ∑ i = 1 n 1 r t i − 1 r t i ,

(19) R 4 = 1 n ∑ i = 1 n log ( r t i − 1 r t i ) ,

(20) R 5 = 1 n ∑ i = 1 n 1 r t i − 1 r t i ,

(21) R 6 = 1 n ∑ i = 1 n 1 ( r t i − 1 r t i ) 3 / 2 .

Also, the Fisher information matrix is approximately given by

(22) I ( r ¯ , κ , σ ) ≈ B T J ( a , k , v ) B ,

where

(23) a = σ 2 4 κ { e κ Δ / 2 − e − κ Δ / 2 } , k = 1 2 κ Δ , v = 2 r ¯ κ σ 2 − 1 ,

(24) B = ( 0 σ 2 κ Δ 3 / 48 σ Δ / 2 0 Δ / 2 0 2 κ / σ 2 2 r ¯ / σ 2 − 4 r ¯ κ / σ 3 ) ,

and

(25) J ( a , k , v ) = − n ( 1 2 a 2 − 1 a 3 { R 0 e − k + R 1 e k − 2 R 2 } 1 2 a 2 { − R 0 e − k + R 1 e k } − R 3 v 1 2 a 2 { − R 0 e − k + R 1 e k } − 1 2 a { R 0 e − k + R 1 e k } 1 − R 3 v 1 − a R 3 ) .

Finally, we have that as n → ∞ ,

(26) ( r ¯ ^ κ ^ σ ^ ) − ( r ¯ κ σ ) → N ( 0 , I ( r ¯ , κ , σ ) − 1 ) .

Proof:

See Appendix A for a proof.

The following theorem supplies parameter estimates of the square root process using a first order approximation to the log-likelihood function.

Theorem 2:

Given the observations of the short rate in (10), the maximum likelihood estimates of the parameters r ¯ , κ, σ of the CIR model specified in (3) are approximately given by, as Δ → 0 ,

(27) r ¯ ^ = 2 ( v + 1 ) a e k − e − k , κ ^ = 2 k Δ , σ ^ = 8 a k ( e k − e − k ) Δ ,

respectively, subject to the condition

(Condition A’) ( p 0 ′ ) 2 − 2 p 0 p ″ 0 > 0 ,

where

(28) k = − p ′ 0 + sign ( p ′ 0 ) ( p ′ 0 ) 2 − 2 p 0 p ″ 0 p ″ 0 ,

(29) p 0 = 1 4 ( 1 2 R 3 f 0 − h 0 ) 2 − ( 1 2 R 3 f 0 − h 0 ) + R 3 g 0 − h 0 2 p ′ 0 = 1 2 ( 1 2 R 3 f ′ 0 − h ′ 0 ) ( 1 2 R 3 f 0 − h 0 ) − ( 1 2 R 3 f ′ 0 − h ′ 0 ) + R 3 g ′ 0 − 2 h ′ 0 h 0 p ″ 0 = 1 2 ( 1 2 R 3 f 0 ″ − h ″ 0 ) ( 1 2 R 3 f 0 − h 0 ) + 1 2 ( 1 2 R 3 f 0 ′ − h ′ 0 ) 2 − ( 1 2 R 3 f 0 ″ − h ″ 0 ) + R 3 g ″ 0 − 2 ( h ″ 0 h 0 + ( h ′ 0 ) 2 ) ,

(30) a = 1 2 f k − h k R 3 , v = h k a R 3 , p k = 1 4 ( 1 2 R 3 f k − h k ) 2 − ( 1 2 R 3 f k − h k ) + R 3 g k − h k 2 , f k = − R 0 e − k + R 1 e k , g k = R 0 e − k + R 1 e k − 2 R 2 , h k = k + 1 2 L

and where we have the first order approximate sufficient statistics L, R i for i ∈ { 1 , 2 , 3 } given in (14)–(18) .

Also, the Fisher information matrix is given approximately in (22). Finally, asymptotic normality, as n → ∞ , is as in (26).

Proof:

See Appendix A for a proof.

The following corollary to Theorem 1 is straightforwardly established.

Corollary 1:

Given the observations of the short rate in (10) , the maximum likelihood estimates of the parameters p, q and σ of the 3/2 short rate model given in (5) are

(31) p ^ = κ ^ , q ^ = σ ^ 2 − r ¯ ^ κ ^ ,

where formulae for r ¯ ^ , κ ^ and σ ^ , based upon the reciprocals of the short rates, are supplied in Theorem 1.

4 Applications

In this section we supply various aspects of the estimation methods. Firstly, we check the ability of the estimated parameters from simulated data to recover the actual parameter values, for varying lengths of the series. Secondly, we fit the models to US interest rate data. Thirdly, we compare the run times with the Newton-Raphson method.

4.1 Estimation from simulated data

Here we simulate data sets from each of the processes, squared-Bessel, 3 / 2 and Bessel processes, and estimate its parameters over varying window lengths and time steps Δ .

4.1.1 CIR model

Commencing with r 0 = 0.05 , we simulate with a daily time step, Δ = 1 / 365 , the CIR model having parameter values r ¯ = 0.0427 , κ = 0.095 and σ = 0.06439 , which correspond closely to those estimated for the CIR model in Table 1. We compare estimation of parameters over varying window lengths, from one quarter to 150 years, and time steps Δ ∈ { 1 4 , 1 12 , 1 52 , 1 365 } . The evolutions of estimates of the parameters r ¯ , κ, σ and ν = 4 r ¯ κ / σ 2 are depicted respectively in Figures 1–4, varying the time steps. We maintain shorter time horizons in these figures to allow the differences in estimated parameter values among the various time steps to be shown clearly. Because the effects of varying the time steps on estimated values become less pronounced over the long term, making them almost indistinguishable, in Figure 5 we show only the estimates of the parameters using Δ = 1 / 365 . This allows for a comparison of the behaviour of estimated values of different parameters over various window lengths up to 150 years.

Table 1:

Parameter estimates of the CIR, 3/2 and Bessel process short rate models fitted to monthly US short rates via several methods.

Model	Method	Log-likelihood	Parameter estimates
Vasicek	Exact formula	7109.33145040	r ¯ ^ = 0.043648 ( 0.008476 )
			κ ^ = 0.149909 ( 0.045259 )
			σ ^ = 0.015437 ( 0.000261 )
CIR	Newton-Raphson	7559.08057481	r ¯ ^ = 0.042712 ( 0.011234 )
			κ ^ = 0.094537 ( 0.034902 )
			σ ^ = 0.064243 ( 0.001083 )
			ν ^ = 3.913439 ( 1.067639 )
CIR	1st order approx	7559.08042386	r ¯ ^ = 0.042724 ( 0.011232 )
			κ ^ = 0.094962 ( 0.035044 )
			σ ^ = 0.064244 ( 0.001083 )
			ν ^ = 3.931961 ( 1.067294 )
CIR	2nd order approx	7559.08057479	r ¯ ^ = 0.042712 ( 0.011234 )
			κ ^ = 0.094543 ( 0.034907 )
			σ ^ = 0.064243 ( 0.001083 )
			ν ^ = 3.913676 ( 1.067673 )
3/2	Newton-Raphson	7213.56087570	p ^ = 0.110247 ( 0.038861 )
			q ^ = − 0.019455 ( 1.216960 )
			σ ^ = 2.251059 ( 0.037980 )
			ν ^ = 4.015358 ( 0.960646 )
3/2	1st order approx	7213.56078067	p ^ = 0.110624 ( 0.038916 )
			q ^ = − 0.036201 ( 1.220414 )
			σ ^ = 2.251093 ( 0.037982 )
			ν ^ = 4.028575 ( 0.963344 )
3/2	2nd order approx	7213.56087569	p ^ = 0.110250 ( 0.038894 )
			q ^ = − 0.019558 ( 1.219089 )
			σ ^ = 2.251059 ( 0.037980 )
			ν ^ = 4.015439 ( 0.962326 )
Bessel	Newton-Raphson	7168.45470719	α ^ = 0.000151 ( 0.000015 )
			β ^ = − 0.106942 ( 0.026841 )
			γ ^ = 0.016822 ( 0.000403 )
			ν ^ = 2.068122 ( 0.092544 )
Bessel	1st order approx	7162.69765360	α ^ = 0.000140 ( 0.000014 )
			β ^ = − 0.096360 ( 0.024824 )
			γ ^ = 0.015654 ( 0.000314 )
			ν ^ = 2.146483 ( 0.101064 )
Bessel	2nd order approx	7164.59619582	α ^ = 0.000138 ( 0.000014 )
			β ^ = − 0.096364 ( 0.025122 )
			γ ^ = 0.015836 ( 0.000324 )
			ν ^ = 2.100565 ( 0.100157 )

$Figure 1: In respect of the CIR model, a graph of the evolution of the estimate of r ¯ $\overline{r}$ from 1/4 to 20 years of simulated daily data.$

Figure 1:

In respect of the CIR model, a graph of the evolution of the estimate of r ¯ from 1/4 to 20 years of simulated daily data.

Figure 2:

In respect of the CIR model, a graph of the evolution of the estimate of κ from 1/4 to 20 years of simulated daily data.

Figure 3:

In respect of the CIR model, a graph of the evolution of the estimate of σ from 1/4 to 20 years of simulated daily data.

$Figure 4: In respect of the CIR model, a graph of the evolution of the estimate of ν = 4 r ¯ κ / σ 2 $\nu =4\overline{r}\kappa /{\sigma }^{2}$ from 1/4 to 20 years of simulated daily data.$

Figure 4:

In respect of the CIR model, a graph of the evolution of the estimate of ν = 4 r ¯ κ / σ 2 from 1/4 to 20 years of simulated daily data.

$Figure 5: In respect of the CIR model, graphs of the evolutions of the estimates of r ¯ $\overline{r}$ , κ, σ and ν from 1/4 to 150 years of simulated daily data. The dashed lines surrounding each of the evolutions of estimates provide a 95% confidence interval for the parameter value. The dashed horizontal red line shows the true value of the parameter.$

Figure 5:

In respect of the CIR model, graphs of the evolutions of the estimates of r ¯ , κ, σ and ν from 1/4 to 150 years of simulated daily data. The dashed lines surrounding each of the evolutions of estimates provide a 95% confidence interval for the parameter value. The dashed horizontal red line shows the true value of the parameter.

In Figure 1 we show the effect of Δ on the estimate of r ¯ . For Δ = 1 / 52 and Δ = 1 / 365 the estimates are almost indistinguishable. For Δ = 1 / 4 the estimates with window lengths of less than 3 years are wildly unstable, which is to be expected when using so few data points. It is not until around 3 years, corresponding to 12 data points, that we see some stability. For Δ = 1 / 12 the estimates of r ¯ are unstable over window lengths less than one year, corresponding to fewer than 12 data points. We see from Figure 5a that window lengths of many years are required to estimate the true value of r ¯ . In fact, after 25 years the estimates remain within about 50% of the true value and accurate estimates of r ¯ are achieved for windows around 100 years in length.

In Figure 2 we show the effect of Δ on the estimate of κ. Beyond window lengths of about 10 years all four time steps give rise to similar estimated parameter values. However, for window lengths of less than 10 years all time steps fail to give good estimates of κ. We see from Figure 5b that the 95% confidence intervals contain the true value of κ for window lengths of at least 10 years. Also, it is not until around 25 years when we see estimates of κ being within about 80% of the true value and until around 85 years for estimates to be within roughly 50% of the true value.

In Figure 3 we show the effect of Δ on the estimate of σ. For Δ = 1 / 4 , the estimates stabilise after two and a half years, corresponding to 10 or more data points. For Δ = 1 / 12 , estimates stabilise after one year, namely for 12 or more data points. For Δ = 1 / 52 , estimates stabilise after half a year, which is for 26 or more data points. Finally, for Δ = 1 / 365 , the estimates are stable over windows at least as long as one quarter of a year. We see in Figure 5c that estimates of σ are within 0.001 of the true value over horizons beyond 10 years and are almost exact to four decimal places after 50 years.

In Figure 4 we show the effect of Δ on the estimate of ν = 4 r ¯ κ / σ 2 , which is the degrees of freedom parameter and which is crucial in ascertaining whether the process avoids hitting zero, namely when ν ≥ 2 . For all Δ ∈ { 1 4 , 1 12 , 1 52 , 1 365 } , the estimates are unstable for windows shorter than two and a half to three years, after which the estimates provide similar values. In Figure 5d we see that the estimates do not appear to converge to the true value until around 30 years. Yet, even for windows longer than 30 years the estimates hover above the true value by 1 to 1.25, which is roughly 25% of the true value.

In conclusion, σ and r ¯ are the only parameters which can be accurately estimated over windows shorter than 150 years. With daily sampling of the process, good estimates of σ can be obtained in about half a year and with weekly sampling, good estimates are obtainable in about two years. For windows longer than 10 years, quarterly, monthly, weekly and daily sampling give rise to estimates of similar accuracy. With such accurate estimates of σ, we see from the relation σ 2 = 4 r ¯ κ / ν that inaccuracies in any one of the parameters r ¯ , κ and ν will result to some extent in inaccuracies in the other two. This is exemplified in Figures 1, 2 and 4 and, over the longer term, in Figure 5. In practical applications, such as when modelling interest rates, the choice of Δ will depend on the window length, that is the length in years of the data series, all other variables being equal. Assuming interest rate cycles of around 20–40 years, having 50–100 years of data generally allows sufficient time for interest rates to revert, meaning that r ¯ and κ, in particular, can be estimated fairly reliably, and meaning that using Δ = 1 / 4 , or even Δ = 1 , will give good estimates. On the other hand, having only five years of data, using Δ ≤ 1 / 52 will give reliable estimates only for σ, with r ¯ and κ having large standard errors.

4.1.2 3/2 model

Commencing with r 0 = 0.05 , we simulate with a daily time step, Δ = 1 / 365 , the 3/2 model having parameter values p = 0.11 , q = − 0.019 and σ = 2.25093 , which correspond closely to those estimated for the 3/2 model in Table 1. We compare estimation of parameters over the same varying window lengths and time steps examined in the CIR model. The evolutions of estimates of the parameters p, q, σ and ν = 4 ( 1 − q / σ 2 ) are depicted respectively in Figures 6–9, varying the time steps. For the same reasons given for the CIR model, we maintain shorter time horizons in these figures, whereas in Figure 10 we show only the estimates of the parameters using Δ = 1 / 365 over window lengths up to 150 years.

Figure 6:

In respect of the 3 / 2 model, a graph of the evolution of the estimate of p from 1/4 to 20 years of simulated daily data.

Figure 7:

In respect of the 3/2 model, a graph of the evolution of the estimate of q from 1/4 to 20 years of simulated daily data.

Figure 8:

In respect of the 3/2 model, a graph of the evolution of the estimate of σ from 1/4 to 20 years of simulated daily data.

$Figure 9: In respect of the 3/2 model, a graph of the evolution of the estimate of ν = 4 ( 1 − q / σ 2 ) $\nu =4\left(1-q/{\sigma }^{2}\right)$ from 1/4 to 20 years of simulated daily data.$

Figure 9:

In respect of the 3/2 model, a graph of the evolution of the estimate of ν = 4 ( 1 − q / σ 2 ) from 1/4 to 20 years of simulated daily data.

$Figure 10: In respect of the 3/2 model, graphs of the evolutions of the estimates of p, q, σ and ν = 4 ( 1 − q / σ 2 ) $\nu =4\left(1-q/{\sigma }^{2}\right)$ from 1/4 to 150 years of simulated daily data. The dashed lines surrounding each of the evolutions of estimates provide a 95% confidence interval for the parameter value. The dashed horizontal red line shows the true value of the parameter.$

Figure 10:

In respect of the 3/2 model, graphs of the evolutions of the estimates of p, q, σ and ν = 4 ( 1 − q / σ 2 ) from 1/4 to 150 years of simulated daily data. The dashed lines surrounding each of the evolutions of estimates provide a 95% confidence interval for the parameter value. The dashed horizontal red line shows the true value of the parameter.

In Figure 6 we show the effect of Δ on the estimate of p. In (31) we see that p is equal to κ in the corresponding CIR model for the reciprocal of the 3/2 process. Therefore the effects of Δ on p ^ correspond well with those on κ ^ shown in Figure 2. Beyond window lengths of about 10 years all four time steps give rise to similar estimated parameter values. However, for windows shorter than 10 years all time steps fail to give good estimates of p. We see from Figure 10a that the 95% confidence intervals contain the true value of p for window lengths of at least 10 years. Also, it is not until after around 25 years when we see estimates of p remaining within about 100% of the true value and until around 85 years for estimates to be within roughly 50% of the true value.

In Figure 7 we show the effect of Δ on the estimate of q. In (31) we see that q is equal to σ 2 − r ¯ κ in the corresponding CIR model, and this can be rewritten in terms of σ and ν as σ 2 ( 1 − ν / 4 ) . Therefore, if σ ^ is accurate then we would expect the effect of Δ on q to be similar to that on ν. Indeed Figure 10b appears to be a mirror image of Figure 5d. Beyond window lengths of about 10 years all four time steps give rise to similar estimated parameter values. We see from Figure 10b that windows at least around 25 years in length give estimates of q within 2 of the true value and windows at least around 85 years gives estimates within 1 of the true value.

In Figure 8 we show the effect of Δ on the estimate of σ, which are similar to those on σ in the corresponding CIR model. For windows longer than 14 years, the effects of Δ wear off. For Δ = 1 / 4 and Δ = 12 , the estimates stabilise after 12 years or so, corresponding to at least 48, respectively 144, data points. For Δ = 1 / 52 , estimates stabilise after 2 years, corresponding to 104 or more data points. Finally, for Δ = 1 / 365 , the estimates are stable over windows at least as long as one quarter of a year. We see in Figure 10c that estimates of σ are within 0.05 of the true value after 5 years and are within 0.01 over horizons beyond 45 years.

In Figure 9 we show the effect of Δ on the estimate of ν = 4 ( 1 − q / σ 2 ) , which is the degrees of freedom parameter and which is crucial in ascertaining whether the process avoids hitting zero, namely when ν ≥ 2 . For all Δ ∈ { 1 4 , 1 12 , 1 52 , 1 365 } , the estimates are unstable for windows shorter than two and a half to three years, after which the estimates provide similar values. In Figure 10d we see that the estimates do not appear to converge to the true value until around 30 years. Yet, even for windows longer than 30 years the estimates hover above the true value by 1–1.25, which is roughly 25% of the true value.

In conclusion, σ is the most accurately estimated parameter over all window lengths up to 150 years, followed by p and lastly q. With daily sampling of the process, good estimates of σ can be obtained in about half a year and with weekly sampling, good estimates are obtainable in about 2 years. For windows longer than 10 years, quarterly, monthly, weekly and daily sampling give rise to estimates of similar accuracy. With such accurate estimates of σ, we see from the relation q = σ 2 ( 1 − ν / 4 ) that inaccuracies in q and ν are linearly related. This is exemplified in Figures 7 and 9 and, over the longer term, in Figures 10b and 10d. In practical applications, the choice of Δ is similar to the choice for the CIR model.

4.1.3 Bessel model

Commencing with r 0 = 0.05 , we simulate with a daily time step, Δ = 1 / 365 , the Bessel model having parameter values α = 0.000151 , β = − 0.107 and γ = 0.01682 , which correspond closely to those estimated for the Bessel model in Table 1. We compare estimation of parameters over the same varying window lengths and time steps examined in the CIR model. The evolutions of estimates of the parameters α, β, γ and ν = 1 + 2 α / γ 2 are depicted respectively in Figures 11–14, varying the time steps. For the same reasons given for the CIR model, we maintain shorter time horizons in these figures, whereas in Figure 15 we show only the estimates of the parameters using Δ = 1 / 365 over window lengths up to 150 years.

Figure 11:

In respect of the Bessel model, a graph of the evolution of the estimate of α from 1/4 to 20 years of simulated daily data.

Figure 12:

In respect of the Bessel model, a graph of the evolution of the estimate of β from 1/4 to 20 years of simulated daily data.

Figure 13:

In respect of the Bessel model, a graph of the evolution of the estimate of γ from 1/4 to 20 years of simulated daily data.

$Figure 14: In respect of the Bessel model, a graph of the evolution of the estimate of ν = 1 + 2 α / γ 2 $\nu =1+2\alpha /{\gamma }^{2}$ from 1/4 to 20 years of simulated daily data.$

Figure 14:

In respect of the Bessel model, a graph of the evolution of the estimate of ν = 1 + 2 α / γ 2 from 1/4 to 20 years of simulated daily data.

Figure 15:

In respect of the Bessel model, graphs of the evolutions of the estimates of α, β, γ and ν from 1/4 to 150 years of simulated daily data. The dashed lines surrounding each of the evolutions of estimates provide a 95% confidence interval for the parameter value. The dashed horizontal red line shows the true value of the parameter.

In Figure 11 we show the effect of Δ on the estimate of α, where stability in estimates is achieved in windows of around 5.7 years in length and longer, regardless of the time step. For shorter windows, the estimates all diverge from the true value by about 0.005 for windows longer than one year, and diverge in varying amounts for windows shorter than one year. We see from Figure 15a that estimates are within 0.00015 for window lengths between 7.5 and 130 years, and converging very close to the true value beyond 130 years.

In Figure 12 we show the effect of Δ on the estimate of β. For windows shorter than 5.7 years the estimates vary with Δ but fail to achieve any value consistently close to the true value. Beyond this, all values of Δ provide similarly valued estimates, within 0.1 of the true value of −0.107. We see from Figure 15b that windows between 60 and 100 years in length provide estimates within 0.05 of the true value, and after 100 years the estimates are almost exact.

In Figure 13 we show the effect of Δ on the estimate of γ. For Δ = 1 / 365 the estimates are very close to the true value, with estimates being almost indistinguishable from the true value after 7.5 years. The larger the value of Δ the worse the estimates are, although most of estimates improve beyond 7.5 years, except for Δ = 1 / 4 . We see in Figure 15c that estimates of γ, based on Δ = 1 / 365 , are within 0.0001 of the true value after 10 years and are within about 0.00005 beyond 50 years.

In Figure 14 we show the effect of Δ on the estimate of ν = 1 + 2 α / γ 2 , for the same reasons given for the CIR and 3/2 models. For all Δ ∈ { 1 4 , 1 12 , 1 52 , 1 365 } , the estimates are unstable for windows shorter than 5.5 years, after which the estimates provide similar values and achieving stability after 7.5 years. In Figure 15d we see that the estimates are within 1 of the true value beyond 10 years and within 0.1 beyond 133 years.

In conclusion, γ is the most accurately estimated parameter over all windows, particularly for Δ = 1 / 365 , which gives good estimates over windows about half a year in length. The other parameters can be estimated fairly reliably over longer windows. The choice of Δ is largely determined by the data window. For shorter windows using Δ = 1 / 365 is imperative. However, for windows over several decades, quarterly sampling provides reliable estimates.

4.2 Estimation from historical interest rates

In this section we demonstrate the parameter estimation for each of the processes when used to model interest rates. For the set of monthly US short rates over the period 1871–2019, as given by the one-year rates in Shiller (2019) and supplemented by six-month LIBOR rates in Federal Reserve Bank of St. Louis (2019), we obtain the maximum likelihood estimates in Table 1. The Newton-Raphson approximation algorithm was used to compute the exact parameter estimates for each of the models, whereas the estimates derived using the first and second term approximations employ the formulae given in Theorem 1 and Corollary 1. The parameter estimates of the Vasicek model were used in determining an initial set of parameter estimates for the Newton-Raphson method applied to the CIR model, as described in Fergusson and Platen (2015).

We can draw several conclusions from Table 1, where the data series is monthly and nearly 150 years long. Firstly, simulations performed in Section 4.1.1 for the CIR model, having the same parameters used in Table 1, indicate that the minimum window length for reasonable estimates of r ¯ is around 30 years, as shown in Figure 5a. For κ, this is around 90 years, as shown in Figure 5b and for σ, this is around 14 years for monthly or less frequent observations, as shown in Figure 3. So based on the monthly frequency and 150 year series length, we would expect the parameter estimates for the CIR model in Table 1 to be reliable. Secondly, analogous simulations performed in Section 4.1.2 for the 3/2 model indicate that the minimum window length for estimates of p within 0.1 of the true value is around 20 years, as shown in Figure 10a. For q being within 1 of its true value, this is around 90 years, as shown in Figure 10b and for σ, this is around 14 years, as shown in Figure 8. As for the CIR model, we would expect the parameter estimates for the 3/2 model in Table 1 to be reliable. Thirdly, analogous simulations performed in Section 4.1.3 for the Bessel model indicate that the minimum window length for estimates of α within 0.00006 of the true value is around 133 years, as shown in Figure 15a. For β being within 0.01 of its true value, this is around 101 years, as shown in Figure 15b and for γ being within 0.001 of its true value, this is around 8 years, as shown in Figure 13. As for the two previous models, we would expect the parameter estimates for the Bessel model in Table 1 to be reliable. In general, the minimum series length is around 100 years for estimating parameters of any of our models, for all choices of Δ ≤ 1 / 4 . Finally, using either the log-likelihood, AIC or BIC as the criterion, the CIR model is the best of the three models.

To illustrate the estimation methods for more frequent observations, for the set of daily US Effective Federal Funds rates over the period July 1954 to January 2020, as given in Federal Reserve Bank of St. Louis (2020), we obtain the maximum likelihood estimates in Table 2. We note that the second order approximation in respect of the Bessel model fails due to Condition A not being met.

Table 2:

Parameter estimates of the CIR, 3/2 and Bessel process short rate models fitted to daily Effective Federal Funds rates via several methods.

Model	Method	Log-likelihood	Parameter estimates
Vasicek	Exact formula	102991.65	r ¯ ^ = 0.048432 ( 0.005261 )
			κ ^ = 1.476785 ( 0.510347 )
			σ ^ = 0.062912 ( 0.000291 )
CIR	Newton-Raphson	108524.04	r ¯ ^ = 0.048446 ( 0.005710 )
			κ ^ = 1.061058 ( 0.126698 )
			σ ^ = 0.292883 ( 0.001342 )
			ν ^ = 2.397021 ( 0.157883 )
CIR	1st order approx	108524.03	r ¯ ^ = 0.048451 ( 0.010082 )
			κ ^ = 1.064971 ( 0.252687 )
			σ ^ = 0.292780 ( 0.001343 )
			ν ^ = 2.407777 ( 0.176680 )
CIR	2nd order approx	108524.04	r ¯ ^ = 0.048450 ( 0.007161 )
			κ ^ = 1.060475 ( 0.170004 )
			σ ^ = 0.292790 ( 0.001341 )
			ν ^ = 2.397399 ( 0.163386 )
3/2	Newton-Raphson	81391.08	p ^ = 4.854270 ( 0.371083 )
			q ^ = 400.427422 ( 4.604012 )
			σ ^ = 31.562020 ( 0.144021 )
			ν ^ = 2.392118 ( 0.023049 )
3/2	1st order approx	81391.06	p ^ = 4.876184 ( 0.350792 )
			q ^ = 396.963983 ( 0.000000 )
			σ ^ = 31.549896 ( 0.143300 )
			ν ^ = 2.404800 ( 0.000000 )
3/2	2nd order approx	81391.07	p ^ = 4.851194 ( 0.374552 )
			q ^ = 400.090431 ( 4.202087 )
			σ ^ = 31.552349 ( 0.143901 )
			ν ^ = 2.392486 ( 0.022038 )
Bessel	Newton-Raphson	127474.83	α ^ = 0.003649 ( 0.000059 )
			β ^ = − 1.976072 ( 0.175107 )
			γ ^ = 0.085175 ( 0.000454 )
			ν ^ = 2.006025 ( 0.012245 )
Bessel	1st order approx	127474.83	α ^ = 0.003649 ( 0.000059 )
			β ^ = − 1.976072 ( 0.175107 )
			γ ^ = 0.085175 ( 0.000454 )
			ν ^ = 2.006025 ( 0.012245 )
Bessel	2nd order approx	Undefined (Condition A is not met)	α ^ undefined
			β ^ undefined
			γ ^ undefined
			ν ^ undefined

We can draw several conclusions from Table 2, where the data series is daily and roughly 65 1/2 years long. Firstly, simulations performed in Section 4.1.1 for the CIR model, despite being based on different parameters, indicate that the respective minimum window lengths for reasonable estimates of r ¯ , κ and σ are around 30, 90 and 14 years for daily observations. So based on the daily frequency and 65 1/2 year series length, we would expect the parameter estimates for r ¯ and σ of the CIR model in Table 2 to be reliable, while the estimate of κ to be unreliable. Secondly, analogous simulations performed in Section 4.1.2 for the 3/2 model indicate that the respective minimum window lengths for estimates of p, q and σ to be around 20, 90 and 14 years for daily observations. So, the parameter estimates of p and σ are reliable while the estimate of q is not. Thirdly, analogous simulations performed in Section 4.1.3 for the Bessel model indicate that the respective minimum window lengths for estimates of α, β and γ are around 133, 101 and 8 years. So in Table 2, only the estimate of γ is reliable, while estimates of α and β are not. As mentioned previously, the minimum series length is around 130 to 140 years for estimating parameters for any of our models, for all choices of Δ ≤ 1 / 4 . With this caveat in mind, based on either the log-likelihood, AIC or BIC, the Bessel model is the best of the three models.

4.3 Comparison of run times

To give an idea of the substantial improvement in run times of the first order and second order approximation methods over the standard Newton-Raphson estimation method, we simulated in MATLAB 1000 data sets of CIR short rates, based on the CIR parameter values r ¯ = 0.041954 , κ = 0.093950 and σ = 0.064619 (i.e. ν = 3.775778 ), and then computed the average estimation run time per simulation, along with the averages of the parameter estimates. The results are shown in Table 3, where it is evident that the approximation methods take roughly two or three ten-thousandths of a second, far less than the 1 s or so taken on average by the Newton-Raphson method. This represents a speed-up of roughly 5,000 times with almost the same level of accuracy.

Table 3:

Average run time of CIR estimation method per simulated data set using CIR parameter estimates and 1000 simulations.

Method	Average run time	Average parameter estimates
Newton-Raphson	1.162925	r ¯ ^ = 0.039402 ( 0.015420 )
		κ ^ = 0.121061 ( 0.056607 )
		σ ^ = 0.072203 ( 0.008800 )
		ν ^ = 4.648468 ( 2.035282 )
1st order approx	0.000157	r ¯ ^ = 0.042323 ( 0.011295 )
		κ ^ = 0.127420 ( 0.048523 )
		σ ^ = 0.064637 ( 0.001141 )
		ν ^ = 4.894986 ( 1.662848 )
2nd order approx	0.000244	r ¯ ^ = 0.042316 ( 0.011302 )
		κ ^ = 0.126511 ( 0.048135 )
		σ ^ = 0.064636 ( 0.001141 )
		ν ^ = 4.861427 ( 1.657352 )

5 Conclusion

The closed-form formulae presented in this paper permit accurate parameter estimates of the CIR and 3/2 short rate model to be performed in spreadsheets and other high-level software packages, such as MATLAB and R, and permit exceedingly fast computations in computer algorithms. This significantly improved run time for parameter estimation has many applications where ex-ante forecasts are required frequently and immediately, such as in hedging interest rate derivatives and volatility based on such models.

Appendix

A Proofs of theorems 1 and 2

In this technical appendix we supply proofs of Theorems 1 and 2. Our proof of Theorem 1 is as follows.

Proof:

For convenience, we make the change of variables given in (23) and we rewrite the log-likelihood function in (11) in terms of these new variables as

(32) ℓ ( a , k , v ) = n { − log ( 2 ) − log a + k + 1 2 v L + v k − 1 2 a ( R 0 e − k + R 1 e k ) + 1 n ∑ i = 1 n log I v ( r t i − 1 r t i a ) } ,

where L, R 0 and R 1 are as in (14)–(16).

Our proof of the theorem involves solving for the new variables using an approximation to the first order derivatives of the log-likelihood function in (32).

The modified Bessel function of the first kind given in (7) can also be written as

(33) I ν ( z ) = 1 π ∫ 0 π e z cos θ cos ( ν θ ) d θ − sin ( ν π ) π ∫ 0 ∞ e − z cosh t − ν t d t ,

from which we can derive the asymptotic expansion, given in § 7.23 of Watson (1966),

(34) I ν ( z ) ∼ e z 2 π z ∑ j = 0 ∞ ( − 1 ) j ( ν , j ) ( 2 z ) j + e − z + ( ν + 1 2 ) π i 2 π z ∑ j = 0 ∞ ( ν , m ) ( 2 z ) m

as z → ∞ , where

(35) ( ν , j ) = 1 j ! ∏ m = 1 j ( ν 2 − 1 4 ( 2 m − 1 ) 2 )

and arg z ∈ ( − π / 2 , 3 π / 2 ) . Employing the first few terms of the expansion gives

(36) I ν ( z ) = e z 2 π z { 1 − ν 2 − 1 4 2 z + ( ν 2 − 1 4 ) ( ν 2 − 9 4 ) 2 ! ( 2 z ) 2 + O ν ( 1 z 3 ) } ,

where the subscript ν in the big-O notation indicates dependence of the implied constant upon ν. Taking logarithms of both sides gives

(37) log I ν ( z ) = z − 1 2 log ( 2 π z ) − ν 2 − 1 4 2 z − ν 2 − 1 4 4 z 2 + O ν ( 1 z 3 ) .

In terms of our new variables given in (23) we rewrite the log-likelihood function in (32) as

(38) ℓ ( a , k , v ) = ℓ 1 ( a , k , v ) + n O ν ( a 3 R 6 ) ,

where

(39) ℓ 1 ( a , k , v ) = n { log ( 2 2 π ) − 1 2 log a + ( v + 1 ) k + 1 2 v L − 1 2 a ( R 0 e − k + R 1 e k − 2 R 2 ) − 1 4 R 4 − 1 2 a R 3 ( v 2 − 1 4 ) − 1 4 a 2 R 5 ( v 2 − 1 4 ) } ,

and where L, R 0 , R 1 , R 2 , R 3 , R 4 , R 5 and R 6 are as in (14)–(21). According to (23), a ≈ 1 4 σ 2 Δ , for small Δ , so that in (38), 1 n ℓ ( a , k , v ) − 1 n ℓ 1 ( a , k , v ) = O ν , σ , R 6 ( Δ 3 ) as Δ → 0 , where the subscripted variables in the big-O notation indicate dependence of the implied constant on these variables. Therefore, knowledge of the values of L, R 0 , R 1 , R 2 , R 3 and R 5 is sufficient to determine the ( r ¯ , κ , σ ) -dependent part of ℓ 1 and therefore the statistic T = ( L , R 0 , R 1 , R 2 , R 3 , R 5 ) is approximately sufficient of the second order.

Our plan is firstly to show that the solution x 1 = ( a 1 , k 1 , v 1 ) ⊤ to

(40) f 1 ( a k v ) ≡ ( ∂ ℓ 1 ∂ a ∂ ℓ 1 ∂ k ∂ ℓ 1 ∂ v ) = ( 0 0 0 )

is close to the solution x = ( a , k , v ) ⊤ to

(41) f ( a k v ) ≡ ( ∂ ℓ ∂ a ∂ ℓ ∂ k ∂ ℓ ∂ v ) = ( 0 0 0 )

and secondly to solve (40).

Write h = x − x 1 and by Taylor’s Theorem

(42) f ( x 1 + h ) = f ( x 1 ) + D f ( x 1 + ξ 1 h ) h ,

for some ξ 1 ∈ ( 0 , 1 ) , where

(43) D f ( x ) = ( ∂ 2 ℓ ∂ a 2 ∂ 2 ℓ ∂ a ∂ k ∂ 2 ℓ ∂ a ∂ v ∂ 2 ℓ ∂ a ∂ k ∂ 2 ℓ ∂ k 2 ∂ 2 ℓ ∂ k ∂ v ∂ 2 ℓ ∂ a ∂ v ∂ 2 ℓ ∂ k ∂ v ∂ 2 ℓ ∂ v 2 ) .

Therefore,

(44) h = { D f ( x 1 + ξ 1 h ) } − 1 f ( x 1 ) .

We know that

(45) f ( x 1 ) − f 1 ( x 1 ) = ( n O v 1 ( a 1 2 R 6 ) 0 n O v 1 ( a 1 3 R 6 ) )

and that

(46) D f ( x 1 + ξ 1 h ) − D f 1 ( x 1 + ξ 1 h ) = ( n O v 1 ( a 1 R 6 ) 0 n O v 1 ( a 1 2 R 6 ) 0 0 0 n O v 1 ( a 1 2 R 6 ) 0 n O v 1 ( a 1 3 R 6 ) ) .

It is apparent from (44) and (45) that h is close to zero, that is, x 1 is close to x. Specifically,

(47) h = O v 1 [ { D f 1 ( x 1 + ξ 1 h ) } − 1 ( n a 1 2 R 6 0 n a 1 3 R 6 ) ] .

To solve (40), we rewrite ℓ 1 ( a , k , v ) as a quadratic in v, with the consequent inequality

(48) ℓ 1 ( a , k , v ) = n { − log ( 2 2 π ) − 1 2 log a + k − 1 2 a ( R 0 e − k + R 1 e k − 2 R 2 ) − 1 4 R 4 + 1 8 a R 3 + 1 16 a 2 R 5 + ( k + 1 2 L ) v − ( 1 2 a R 3 + 1 4 a 2 R 5 ) v 2 } = n { − log ( 2 2 π ) − 1 2 log a + k − 1 2 a ( R 0 e − k + R 1 e k − 2 R 2 ) − 1 4 R 4 + 1 8 a R 3 + 1 16 a 2 R 5 + ( k + 1 2 L ) 2 2 a R 3 + a 2 R 5 − ( 1 2 a R 3 + 1 4 a 2 R 5 ) ( v − k + 1 2 L a R 3 + 1 2 a 2 R 5 ) 2 } ≤ ℓ 2 ( a , k ) ,

where

(49) ℓ 2 ( a , k ) = n { − log ( 2 2 π ) − 1 2 log a + k − 1 2 a ( R 0 e − k + R 1 e k − 2 R 2 ) − 1 4 R 4 + 1 8 a R 3 + 1 16 a 2 R 5 + ( k + 1 2 L ) 2 2 a R 3 + a 2 R 5 }

and where, for ease of legibility in our algebra, the functions f, g and h are defined as

(50) f k = − R 0 e − k + R 1 e k , g k = R 0 e − k + R 1 e k − 2 R 2 , h k = k + 1 2 L .

Equality in (48) is achieved when

(51) v = k + 1 2 L a R 3 + 1 2 a 2 R 5 .

We now focus on determining the optimum parameter values of the new log-likelihood function ℓ 2 ( a , k ) .

The first order partial derivatives of ℓ 2 with respect to our new set of variables a and k are readily deduced to be

(52) ∂ ℓ 2 ∂ a = − n 2 a + n 2 a 2 g k + n 8 R 3 + n 8 a R 5 − n h k 2 ( 2 a R 3 + a 2 R 5 ) 2 ( 2 R 3 + 2 a R 5 ) , ∂ ℓ 2 ∂ k = n − n 2 a f k + 2 n h k 2 a R 3 + a 2 R 5 .

The maximum likelihood estimates are found by equating these first order partial derivatives to zero and solving for a and k.

Thus, from (52) we have the two equations

(53) ( R 3 + a R 5 ) h k 2 = ( 1 4 a 3 R 5 + 1 4 a 2 R 3 − a + g k ) ( R 3 + 1 2 a R 5 ) 2 ,

(54) h k = ( 1 2 f k − a ) ( R 3 + 1 2 a R 5 ) .

Rearranging (54) gives the quadratic equation in a

(55) 1 2 R 5 a 2 + ( R 3 − 1 4 R 5 f k ) a + h k − 1 2 R 3 f k = 0 .

In the light of (54), we can reduce the degree of the polynomial in a in (53) as follows

(56) 1 4 a 3 R 5 + 1 4 a 2 R 3 − a + g k = ( R 3 + a R 5 ) h k 2 ( R 3 + 1 2 a R 5 ) − 2 = ( R 3 + a R 5 ) ( 1 2 f k − a ) 2 = 1 4 R 3 f k 2 + ( 1 4 R 5 f k 2 − R 3 f k ) a + ( − R 5 f k + R 3 ) a 2 + R 5 a 3 .

This is a cubic equation in a that can be rearranged as

(57) 3 4 R 5 a 3 + ( − R 5 f k + 3 4 R 3 ) a 2 + ( 1 4 R 5 f k 2 − R 3 f k + 1 ) a + 1 4 R 3 f k 2 − g k = 0 .

Euclid’s greatest common divisor algorithm applied to the polynomial equations (55) and (57) gives a monic polynomial in a. The first step of this algorithm is (57) minus 3 2 a of (55), giving the quadratic polynomial equation in a

(58) ( − R 5 f k + 3 4 R 3 ) a 2 + 3 2 ( 1 4 R 5 f k − R 3 ) a 2 + ( 1 4 R 5 f k 2 − R 3 f k + 1 ) a + 3 4 R 3 f k a − 3 2 h k a + 1 4 R 3 f k 2 − g k = ( − 5 8 R 5 f k − 3 4 R 3 ) a 2 + ( 1 4 R 5 f k 2 − 1 4 R 3 f k + 1 − 3 2 h k ) a + 1 4 R 3 f k 2 − g k = 0 .

The second step of this algorithm is (58) minus 2 R 5 ( − 5 8 R 5 f k − 9 4 R 3 ) of (55), giving the monic polynomial equation in a

(59) ( 1 4 R 5 f k 2 − 1 4 R 3 f k + 1 − 3 2 h k ) a + 1 4 R 3 f k 2 − g k + 2 R 5 ( − 5 8 R 5 f k − 3 4 R 3 ) ( 1 4 R 5 f k − R 3 ) a + 1 R 5 ( − 5 8 R 5 f k − 3 4 R 3 ) R 3 f k − 2 R 5 ( − 5 8 R 5 f k − 3 4 R 3 ) h k = [ ( 1 4 R 5 f k 2 − 1 4 R 3 f k + 1 − 3 2 h k ) + 2 R 5 ( − 5 8 R 5 f k − 3 4 R 3 ) ( 1 4 R 5 f k − R 3 ) ] a + [ 1 4 R 3 f k 2 − g k + 1 R 5 ( − 5 8 R 5 f k − 3 4 R 3 ) R 3 f k − 2 R 5 ( − 5 8 R 5 f k − 3 4 R 3 ) h k ] = 0 .

Letting

(60) b k = 3 8 R 3 f k 2 + ( 3 4 R 3 2 R 5 − 5 4 h k ) f k − 3 2 R 3 R 5 h k + g k , c k = − 1 16 R 5 f k 2 + 5 8 R 3 f k + 1 − 3 2 h k + 3 2 R 3 2 R 5 ,

we have from (59) the solution for a

(61) a = b k c k .

Note that replacing a in (55) by the RHS expression in (61) gives

(62) 1 2 R 5 b k 2 + ( − 1 4 R 5 f k + R 3 ) b k c k + ( − 1 2 R 3 f k + h k ) c k 2 = 0 ,

which only involves the variable k. Defining the function q k as

(63) q k = 1 2 R 5 b k 2 + ( − 1 4 R 5 f k + R 3 ) b k c k + ( − 1 2 R 3 f k + h k ) c k 2 ,

and making the assumption that k is close to zero, an approximate solution can be found by solving

(64) q 0 + q ′ 0 k + 1 2 q ″ 0 k 2 = 0 ,

where

(65) q 0 = 1 2 R 5 b 0 2 − ( 1 4 R 5 f 0 − R 3 ) b 0 c 0 − ( 1 2 R 3 f 0 − h 0 ) c 0 2 , q ′ 0 = R 5 b 0 b ′ 0 − { 1 4 R 5 f ′ 0 b 0 c 0 + ( 1 4 R 5 f 0 − R 3 ) ( b ′ 0 c 0 + b 0 c ′ 0 ) } − { ( 1 2 R 3 f ′ 0 − h ′ 0 ) c 0 2 + 2 ( 1 2 R 3 f 0 − h 0 ) c 0 c ′ 0 } , q ″ 0 = R 5 ( b 0 b ″ 0 + ( b ′ 0 ) 2 − 1 4 R 5 ( f ′ 0 b 0 c 0 ) ( f ″ 0 / f ′ 0 + b ′ 0 / b 0 + c ′ 0 / c 0 ) − ( 1 4 R 5 − R 3 ) ( b ″ 0 c 0 + 2 b ′ 0 c ′ 0 + b 0 c ″ 0 ) − ( − 1 2 R 3 f ″ 0 − h ″ 0 ) c 0 2 − 4 ( − 1 2 R 3 f ′ 0 − h ′ 0 ) c 0 c ′ 0 − 2 ( − 1 2 R 3 f 0 − h 0 ) ( c 0 c ″ 0 + ( c ′ 0 ) 2 ) .

Therefore, from (64) we have

(66) k = − q ′ 0 ± ( q ′ 0 ) 2 − 2 q 0 q ″ 0 q ″ 0 ,

where the choice of sign of the square root is the same as that of q ′ 0 , thus being the root closest to zero. Then we can determine a from (61) and, finally, determine v from (51).

The original parameters are estimated as

(67) κ = 2 k Δ , σ = 4 κ a e k − e − k , r ¯ = ( v + 1 ) σ 2 2 κ .

The second order partial derivatives are

(68) ∂ 2 ℓ ∂ a 2 = n 2 a 2 − n a 3 { R 0 e − k + R 1 e k − 2 R 2 } , ∂ 2 ℓ ∂ a ∂ k = n 2 a 2 { − R 0 e − k + R 1 e k } , ∂ 2 ℓ ∂ a ∂ v = − n R 3 v , ∂ 2 ℓ ∂ k 2 = − n 2 a { R 0 e − k + R 1 e k } , ∂ 2 ℓ ∂ k ∂ v = n , ∂ 2 ℓ ∂ v 2 = − n a R 3 ,

and the Fisher information matrix with respect to the variables a, k and v is simply

(69) J ( a , k , v ) = − n ( 1 2 a 2 − 1 a 3 { R 0 e − k + R 1 e k − 2 R 2 } 1 2 a 2 { − R 0 e − k + R 1 e k } − R 3 v 1 2 a 2 { − R 0 e − k + R 1 e k } − 1 2 a { R 0 e − k + R 1 e k } 1 − R 3 v 1 − a R 3 ) .

From (23),

(70) ( ∂ a ∂ r ¯ ∂ a ∂ κ ∂ a ∂ σ ∂ k ∂ r ¯ ∂ k ∂ κ ∂ k ∂ σ ∂ v ∂ r ¯ ∂ v ∂ κ ∂ v ∂ σ ) ≈ ( 0 σ 2 κ Δ 3 / 48 σ Δ / 2 0 Δ / 2 0 2 κ / σ 2 2 r ¯ / σ 2 − 4 r ¯ κ / σ 3 ) ≡ B ,

for small values of κ Δ / 2 , and we have

(71) I ( r ¯ , κ , σ ) ≈ B T J ( a , k , v ) B .

Finally, asymptotic normality in (26) of the maximum likelihood estimates follows from (12) and asymptotic theory of MLEs; see for example Theorem 3.10 in Lehmann and Casella (1998).

Our proof of Theorem 2 is as follows.

Proof:

We continue in a similar vein as in the proof of Theorem 1 up to the equations formed by equating the first order partial derivatives in (52) to zero and solving for a and k, where instead the log-likelihood function in (32) is rewritten as

(72) ℓ ( a , k , v ) = ℓ 1 ( a , k , v ) + n O ν ( a 2 R 5 ) ,

where

(73) ℓ 1 ( a , k , v ) = n { log ( 2 2 π ) − 1 2 log a + ( v + 1 ) k + 1 2 v L − 1 2 a ( R 0 e − k + R 1 e k − 2 R 2 ) − 1 4 R 4 − 1 2 a R 3 ( v 2 − 1 4 ) } ≤ ℓ 2 ( a , k ) ≡ n { − log ( 2 2 π ) − 1 2 log a + k − 1 2 a g k − 1 4 R 4 + 1 8 a R 3 + h k 2 2 a R 3 }

with equality achieved when

(74) v = h k a R 3 ,

and where the first order partial derivatives of ℓ 2 are

(75) ∂ ℓ 2 ∂ a = − n 2 a + n 2 a 2 g k + n 8 R 3 − n h k 2 ( 2 a R 3 ) 2 ( 2 R 3 ) , ∂ ℓ 2 ∂ k = n − n 2 a f k + 2 n h k 2 a R 3 .

Thus, setting (75) to zero, we have the two equations

(76) R 3 h k 2 = ( 1 4 a 2 R 3 − a + g k ) R 3 2 ,

(77) h k = ( 1 2 f k − a ) R 3 .

Rearranging (77) gives the equation for a

(78) a = 1 2 f k − h k R 3 .

Eliminating a from (76) gives

(79) p k ≡ 1 4 ( 1 2 R 3 f k − h k ) 2 − ( 1 2 R 3 f k − h k ) + R 3 g k − h k 2 = 0 ,

which only involves the variable k and where we have defined the function p k as indicated. Making the assumption that k is close to zero, an approximate solution can be found by solving

(80) p 0 + p ′ 0 k + 1 2 p ″ 0 k 2 = 0 ,

where

(81) p 0 = 1 4 ( 1 2 R 3 f 0 − h 0 ) 2 − ( 1 2 R 3 f 0 − h 0 ) + R 3 g 0 − h 0 2 p ′ 0 = 1 2 ( 1 2 R 3 f ′ 0 − h ′ 0 ) ( 1 2 R 3 f 0 − h 0 ) − ( 1 2 R 3 f ′ 0 − h ′ 0 ) + R 3 g ′ 0 − 2 h ′ 0 h 0 p ″ 0 = 1 2 ( 1 2 R 3 f ″ 0 − h ″ 0 ) ( 1 2 R 3 f 0 − h 0 ) + 1 2 ( 1 2 R 3 f ′ 0 − h ′ 0 ) 2 − ( 1 2 R 3 f ″ 0 − h 0 ″ ) + R 3 g ″ 0 − 2 ( h ″ 0 h 0 + ( h ′ 0 ) 2 ) .

We remark that the expressions for p 0 , p ′ 0 and p 0 ″ in (81) can be obtained by placing R 5 = 0 in the expressions for q 0 , q ′ 0 and q 0 ″ in (65).

Therefore, from (80) we have

(82) k = − p ′ 0 ± ( p ′ 0 ) 2 − 2 p 0 p ″ 0 p ″ 0 ,

where the choice of sign of the square root is the same as that of p ′ 0 , thus being the root closest to zero. Then we can determine a from (78) and, finally, determine v from (74).

The original parameters are estimated as

(83) κ = 2 k Δ , σ = 4 κ a e k − e − k , r ¯ = ( v + 1 ) σ 2 2 κ .

The second order partial derivatives are

(84) ∂ 2 ℓ ∂ a 2 = n 2 a 2 − n a 3 { R 0 e − k + R 1 e k − 2 R 2 } , ∂ 2 ℓ ∂ a ∂ k = n 2 a 2 { − R 0 e − k + R 1 e k } , ∂ 2 ℓ ∂ a ∂ v = − n R 3 v , ∂ 2 ℓ ∂ k 2 = − n 2 a { R 0 e − k + R 1 e k } , ∂ 2 ℓ ∂ k ∂ v = n , ∂ 2 ℓ ∂ v 2 = − n a R 3

and the Fisher information matrix with respect to the variables a, k and v is simply

(85) J = − n ( 1 2 a 2 − 1 a 3 { R 0 e − k + R 1 e k − 2 R 2 } 1 2 a 2 { − R 0 e − k + R 1 e k } − R 3 v 1 2 a 2 { − R 0 e − k + R 1 e k } − 1 2 a { R 0 e − k + R 1 e k } 1 − R 3 v 1 − a R 3 ) .

The Fisher information matrix I with respect to the variables r ¯ , κ and σ is derived as in the proof of Theorem 1.

B Calculations of parameter standard errors in Table 1

We supply here the full details of the calculations of the standard errors of the parameters for the CIR model, given in Table 1. The calculations of the standard errors of the parameters for the 3/2 and Bessel models are calculated analogously. Once the maximum likelihood estimates of the parameters of the CIR model have been obtained, namely

r ¯ ^ = 0.042712 , κ ^ = 0.094537 , σ ^ = 0.064243 ,

we compute numerically the Fisher Information Matrix, approximated by the observed Fisher Information Matrix,

I ( r ¯ ^ , κ ^ , σ ^ ) = − ( ∂ 2 ℓ ∂ r ¯ 2 ∂ 2 ℓ ∂ r ¯ ∂ κ ∂ 2 ℓ ∂ r ¯ ∂ σ ∂ 2 ℓ ∂ κ r ¯ ∂ 2 ℓ ∂ κ 2 ∂ 2 ℓ ∂ κ ∂ σ ∂ 2 ℓ ∂ σ ∂ r ¯ ∂ 2 ℓ ∂ σ ∂ κ ∂ 2 ℓ ∂ σ 2 )

With h = 0.00001 , the value of the log-likelihood function at the maximum likelihood estimate vector

(86) ℓ ( r ¯ ^ , κ ^ , σ ^ ) = 7559.08057481 ,

and values of the log-likelihood function at points neighbouring the maximum likelihood estimate vector

(87) ℓ ( r ¯ ^ + h , κ ^ , σ ^ ) = 7559.08057408832 ℓ ( r ¯ ^ − h , κ ^ , σ ^ ) = 7559.08057408832 ℓ ( r ¯ ^ , κ ^ + h , σ ^ ) = 7559.08057473559 ℓ ( r ¯ ^ , κ ^ − h , σ ^ ) = 7559.0805747356 ℓ ( r ¯ ^ , κ ^ , σ ^ + h ) = 7559.08053039953 ℓ ( r ¯ ^ , κ ^ , σ ^ − h ) = 7559.08053261873 ,

(88) ℓ ( r ¯ ^ + h , κ ^ + h , σ ^ ) = 7559.08057369888 ℓ ( r ¯ ^ + h , κ ^ − h , σ ^ ) = 7559.08057432685 ℓ ( r ¯ ^ − h , κ ^ + h , σ ^ ) = 7559.08057432661 ℓ ( r ¯ ^ − h , κ ^ − h , σ ^ ) = 7559.08057369926 ,

(89) ℓ ( r ¯ ^ + h , κ ^ , σ ^ + h ) = 7559.08053065437 ℓ ( r ¯ ^ + h , κ ^ , σ ^ − h ) = 7559.08053091805 ℓ ( r ¯ ^ − h , κ ^ , σ ^ + h ) = 7559.08052869978 ℓ ( r ¯ ^ − h , κ ^ , σ ^ − h ) = 7559.08053287355

and

(90) ℓ ( r ¯ ^ , κ ^ + h , σ ^ + h ) = 7559.08053076554 ℓ ( r ¯ ^ , κ ^ + h , σ ^ − h ) = 7559.08053210167 ℓ ( r ¯ ^ , κ ^ − h , σ ^ + h ) = 7559.08052988274 ℓ ( r ¯ ^ , κ ^ − h , σ ^ − h ) = 7559.08053298488

we compute numerically the second order partial derivatives as

(91) ∂ 2 ℓ ∂ r ¯ 2 = 1 h 2 { ℓ ( r ¯ ^ + h , κ ^ , σ ^ ) − 2 ℓ ( r ¯ ^ , κ ^ , σ ^ ) + ℓ ( r ¯ ^ − h , κ ^ , σ ^ ) } = − 14453.7989399396 ∂ 2 ℓ ∂ r ¯ ∂ κ = 1 4 h 2 { ℓ ( r ¯ ^ + h , κ ^ + h , σ ^ ) − ℓ ( r ¯ ^ + h , κ ^ − h , σ ^ ) − ℓ ( r ¯ ^ − h , κ ^ + h , σ ^ ) + ℓ ( r ¯ ^ − h , κ ^ − h , σ ^ ) } = − 3138.30241793767 ∂ 2 ℓ ∂ r ¯ ∂ σ = 1 4 h 2 { ℓ ( r ¯ ^ + h , κ ^ , σ ^ + h ) − ℓ ( r ¯ ^ + h , κ ^ , σ ^ − h ) − ℓ ( r ¯ ^ − h , κ ^ , σ ^ + h ) + ℓ ( r ¯ ^ − h , κ ^ , σ ^ − h ) } = 9775.22404355113 ∂ 2 ℓ ∂ κ 2 = 1 h 2 { ℓ ( r ¯ ^ , κ ^ + h , σ ^ ) − 2 ℓ ( r ¯ ^ , κ ^ , σ ^ ) + ℓ ( r ¯ ^ , κ ^ − h , σ ^ ) } = − 1508.28782352619 ∂ 2 ℓ ∂ κ ∂ σ = 1 4 h 2 { ℓ ( r ¯ ^ , κ ^ + h , σ ^ + h ) − ℓ ( r ¯ ^ , κ ^ + h , σ ^ − h ) − ℓ ( r ¯ ^ , κ ^ − h , σ ^ + h ) + ℓ ( r ¯ ^ , κ ^ − h , σ ^ − h ) } = 4415.0260691822 ∂ 2 ℓ ∂ σ 2 = 1 h 2 { ℓ ( r ¯ ^ , κ ^ , σ ^ + h ) − 2 ℓ ( r ¯ ^ , κ ^ , σ ^ ) + ℓ ( r ¯ ^ , κ ^ , σ ^ − h ) } = − 866037.589730695

so that the Fisher Information Matrix becomes

I ≈ − ( − 14453.7989399396 − 3138.30241793767 9775.22404355113 − 3138.30241793767 − 1508.28782352619 4415.0260691822 9775.22404355113 4415.0260691822 − 866037.589730695 ) .

The covariance matrix COV ( r ¯ ^ , κ ^ , σ ^ ) of parameter estimates is the inverse of this matrix, namely

I − 1 ≈ ( 0.000126206691887228 − 0.000262343932416607 8.7113295284852 E − 08 − 0.000262343932416607 0.00121837741220894 3.25008677976887 E − 06 8.71132952848433 E − 08 3.25008677976891 E − 06 1.17223638081813 E − 06 )

and the standard errors emerge as square roots of the diagonal elements of this matrix

S E ( r ¯ ^ ) = 0.011234 , S E ( κ ^ ) = 0.034905 , S E ( σ ^ ) = 0.001083 ,

which are very close to those given for the CIR model in Table 1.

Corresponding author: Kevin Fergusson, Economics, University of Melbourne, Melbourne, Australia, E-mail: kevin.fergusson@unimelb.edu.au

Funding source: Australian Government Research Training Program Scholarship

Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: This research is supported by an Australian Government Research Training Program Scholarship.
Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

References

Ahn, D., and B. Gao. 1999. “A Parametric Nonlinear Model of Term Structure Dynamics.” Review of Financial Studies 12: 721–62, https://doi.org/10.1093/rfs/12.4.721.Suche in Google Scholar

Barczy, M., G. Pap, and T. T. Szabo. 2016. “Parameter Estimation for the Subcritical Heston Model Based on Discrete Time Observations.” Acta Scientiarum Mathematicarum (Szeged) 82: 313–38, https://doi.org/10.14232/actasm-015-016-0.Suche in Google Scholar

Bielecki, T., M. Jeanblanc, and M. Rutkowski. 2011. “Hedging of a Credit Default Swaption in the CIR Default Intensity Model.” Finance and Stochastics 15: 541–72, https://doi.org/10.1007/s00780-010-0143-7.Suche in Google Scholar

Biffis, E. 2005. “Affine Processes for Dynamic Mortality and Actuarial Valuations.” Insurance: Mathematics and Economics 37: 443–68, https://doi.org/10.1016/j.insmatheco.2005.05.003.Suche in Google Scholar

Black, F., and M. Scholes. 1973. “The Pricing of Options and Corporate Liabilities.” Journal of Political Economy 81: 637–54, https://doi.org/10.1086/260062.Suche in Google Scholar

Blackburn, F., and M. Sherris. 2013. “Consistent Dynamic Affine Mortality Models for Longevity Risk Applications.” Insurance: Mathematics and Economics 53: 64–73, https://doi.org/10.1016/j.insmatheco.2013.04.007.Suche in Google Scholar

Cox, J. C., J. E. Ingersoll, and S. A. Ross. 1985. “A Theory of the Term Structure of Interest Rates.” Econometrica 53: 385–407, https://doi.org/10.2307/1911242.Suche in Google Scholar

Dokuchaev, N. 2017. “A Pathwise Inference Method for the Parameters of Diffusion Terms.” Journal of Nonparametric Statistics 29: 731–43, https://doi.org/10.1080/10485252.2017.1367789.Suche in Google Scholar

Elliott, R., and E. Platen. 2001. Hidden Markov Chain Filtering for Generalised Bessel Processes, Trends in Mathematics, 123–43. Boston, MA: Birkhäuser. chapter 7.10.1007/978-1-4612-0167-0_7Suche in Google Scholar

Federal Reserve Bank of St. Louis. 2019. 6-Month London Interbank Offered Rate (LIBOR), Based on U.S. Dollar (USD6MTD156N), https://fred.stlouisfed.org/series/USD6MTD156N, [Online; accessed April 16 2019].Suche in Google Scholar

Federal Reserve Bank of St. Louis. 2020. Effective Federal Funds Rate [DFF]. https://fred.stlouisfed.org/series/DFF, [Online; accessed February 3 2020].Suche in Google Scholar

Feller, W. 1971. An Introduction to Probability Theory and its Applications, Vol 2, 2nd ed. New York: Wiley.Suche in Google Scholar

Fergusson, K., and E. Platen. 2015. “Application of Maximum Likelihood Estimation to Stochastic Short Rate Models.” Annals of Financial Economics 10: 1–26, https://doi.org/10.1142/s2010495215500098.Suche in Google Scholar

Goard, J., and M. Mazur. 2013. “Stochastic Volatility Models and the Pricing of VIX Options.” Mathematical Finance 23: 439–58, https://doi.org/10.1111/j.1467-9965.2011.00506.x.Suche in Google Scholar

Heston, S. L. 1997. A Simple New Formula for Options with Stochastic Volatility, Technical report: Washington University of St. Louis.Suche in Google Scholar

Huang, X. 2013. “Quasi-maximum Likelihood Estimation of Multivariate Diffusions.” Studies in Nonlinear Dynamics and Econometrics 17: 179–97, https://doi.org/10.1515/snde-2012-0026.Suche in Google Scholar

Karlin, S., and H. M. Taylor. 1981. A Second Course in Stochastic Processes. New York: Academic Press.Suche in Google Scholar

Le Cam, L. 1986. Asymptotic Methods in Statistical Decision Theory. New York: Springer-Verlag.10.1007/978-1-4612-4946-7Suche in Google Scholar

Lehmann, E. L., and G. Casella. 1998. Theory of Point Estimation. New York: Springer.Suche in Google Scholar

McCullagh, P. 1984. “Local Sufficiency.” Biometrika 71: 233–44, https://doi.org/10.1093/biomet/71.2.233.Suche in Google Scholar

Overbeck, L., and T. Rydén. 1997. “Estimation in the Cox-Ingersoll-Ross Model.” Econometric Theory 13: 430–61, https://doi.org/10.1017/s0266466600005880.Suche in Google Scholar

Platen, E. 1997. A Non-linear Stochastic Volatility Model, Working Paper: Australian National University, Financial Mathematics Research Reports. FMRR 005-97.Suche in Google Scholar

Platen, E. 1999. “A Short-Term Interest Rate Model.” Finance and Stochastics 3: 215–25, https://doi.org/10.1007/s007800050059.Suche in Google Scholar

Poulsen, R. 1999. Approximate Maximum Likelihood Estimation of Discretely Observed Diffusion Processes, Working Paper No. 29: University of Aarhus.Suche in Google Scholar

Shiller, R. 2019. One-Year Interest Rate, http://www.econ.yale.edu/ shiller/data.htm, [Online; accessed April 16 2019].Suche in Google Scholar

Watson, G. N. 1966. A Treatise on the Theory of Bessel Functions, 2nd ed.: Cambridge University Press.Suche in Google Scholar

Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/snde-2019-0079).

Received: 2019-07-25

Accepted: 2020-06-30

Published Online: 2020-08-12

This work is licensed under the Creative Commons Attribution 4.0 International License.

Supplementary Material Details

Artikel in diesem Heft

https://doi.org/10.1515/snde-2019-0079

Schlagwörter für diesen Artikel

Bessel process; Cox-Ingersoll-Ross model; maximum likelihood estimation; squared Bessel process

Creative Commons

BY 4.0

Fast maximum likelihood estimation of parameters for square root and Bessel processes

Artikel

Abstract

1 Introduction

2 Probabilistic framework

3 Main theorem and corollary

Definition 1:

Remark 1:

Definition 2:

Theorem 1:

Proof:

Theorem 2:

Proof:

Corollary 1:

4 Applications

4.1 Estimation from simulated data

4.1.1 CIR model

4.1.2 3/2 model

4.1.3 Bessel model

4.2 Estimation from historical interest rates

4.3 Comparison of run times

5 Conclusion

A Proofs of theorems 1 and 2

Proof:

Proof:

B Calculations of parameter standard errors in Table 1

References

Supplementary Material

Zusatzmaterial

Artikel in diesem Heft

Artikel in diesem Heft

Artikel in diesem Heft