
Bayesian Reconciliation of Return Predictability

  • Borys Koval, Sylvia Frühwirth-Schnatter and Leopold Sögner
Published/Copyright: December 25, 2023

Abstract

This article considers a stable vector autoregressive (VAR) model and investigates return predictability in a Bayesian context. The bivariate VAR system comprises asset returns and a further prediction variable, such as the dividend-price ratio, and allows pinning down the question of return predictability to the value of one particular model parameter. We develop a new shrinkage type prior for this parameter and compare our Bayesian approach to ordinary least squares estimation and to the reduced-bias estimator proposed in Amihud and Hurvich (2004. “Predictive Regressions: A Reduced-Bias Estimation Method.” Journal of Financial and Quantitative Analysis 39: 813–41). A simulation study shows that the Bayesian approach dominates the reduced-bias estimator in terms of observed size (false positive) and power (false negative). We apply our methodology to a system comprising annual CRSP value-weighted returns running, respectively, from 1926 to 2004 and from 1953 to 2021, and the logarithmic dividend-price ratio. For the first sample, the Bayesian approach supports the hypothesis of no return predictability, while for the second data set weak evidence for predictability is observed. Then, instead of the dividend-price ratio, some prediction variables proposed in Welch and Goyal (2008. “A Comprehensive Look at the Empirical Performance of Equity Premium Prediction.” Review of Financial Studies 21: 1455–508) are used. Also with these prediction variables, only weak evidence for return predictability is supported by Bayesian testing. These results are corroborated with an out-of-sample forecasting analysis.

JEL Classification: C11; C58; G12

Corresponding author: Borys Koval, Vienna Graduate School of Finance, WU Vienna University of Economics and Business, 1020 Vienna, Austria; and Department of Economics and Finance, Institute for Advanced Studies, 1080 Vienna, Austria.

Funding source: Austrian Science Fund

Award Identifier / Grant number: DOC23-G16 VGSF

Acknowledgment

We thank Pasquale Della Corte, Luis Gruber, Sylvia Kaufmann, Gregor Kastner, John Cochrane, Darjus Hosszejni, participants at the VGSF Conference 2019, the CFE 2021 conference, and ESOBE Conference 2022 for helpful comments. Leopold Sögner acknowledges support by the Cost Action HiTEc – CA21163. We further express our gratitude to the Associate Editor and the Referee for their helpful comments and detailed suggestions that contributed to the quality of the paper.

  1. Conflict of interest: The authors declare no conflicts of interest regarding this article.

  2. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  3. Research funding: Borys Koval acknowledges financial support from the Austrian Science Fund (FWF), grant number DOC23-G16 VGSF.

Appendix A: Priors

A.1 Explicit Prior on $R^2$

We specify a direct prior on $R^2$ similar to Giannone, Lenza, and Primiceri (2021) and Zhang et al. (2020). Giannone, Lenza, and Primiceri (2021) operate in a regression model with $k$ standardized covariates, i.e. $E[x_{jt}] = 0$ and $V[x_{jt}] = 1$, and impose the following prior independently for each regression coefficient $\beta_j$ in the standardized model:

$$\beta_j \mid \sigma_y^2 \sim N\left(0, \sigma_y^2 \Sigma_\beta\right).$$

Under the additional assumption that the covariates $x_{jt}$ are uncorrelated, Giannone, Lenza, and Primiceri (2021) define $R^2$ unconditional on $\beta$:

(29) $$R^2 = \frac{k\Sigma_\beta}{k\Sigma_\beta + 1},$$

where $k$ is the dimension of the regression parameter. The Beta distribution $R^2 \sim B(a_0^R, b_0^R)$ is the natural candidate for the prior on $R^2$ since the support is $[0, 1]$ and the Beta distribution is flexible enough to model different prior beliefs regarding $R^2$. The parameters $a_0^R$ and $b_0^R$ determine the shape of the prior distribution on $R^2$. For illustration, Figure 6 presents the density plots for different values of $a_0^R$ and $b_0^R$.

Figure 6: Priors on $R^2$. The prior on $R^2$ is defined in Equation (30) with different values for $a_0^R$ and with $b_0^R$ fixed to one. The first prior (red) is a flat prior with $a_0^R = 1$ and $b_0^R = 1$. The second prior (green) has $a_0^R = 0.5$ and $b_0^R = 1$. The third prior (cyan) has $a_0^R = 0.1$ and $b_0^R = 1$. The fourth prior (purple) has a hyperprior on $a_0^R$ with $a_0^R = \bar{a}_0^R = 0.5$ with probability $p_{a_0^R} = 0.5$ and $a_0^R = \underline{a}_0^R = 0.1$ with probability $1 - p_{a_0^R} = 0.5$, as discussed in Section 3.1.

Imposing a Beta prior on $R^2$ yields:

(30) $$R^2 = \frac{k\Sigma_\beta}{k\Sigma_\beta + 1} \sim B\left(a_0^R, b_0^R\right), \qquad a_0^R > 0,\ b_0^R > 0.$$

Using $k\Sigma_\beta = R^2/(1 - R^2)$ and the relationship between the Beta distribution and the Beta prime distribution (see, e.g. Johnson, Kotz, and Balakrishnan 1995), we obtain the following hierarchical prior for the regression coefficients $\beta_j$ in a standardized model:

$$\beta_j \mid \sigma_y^2, \Sigma_\beta \sim N\left(0, \frac{\sigma_y^2}{k}\, k\Sigma_\beta\right), \qquad k\Sigma_\beta \sim BP\left(a_0^R, b_0^R\right).$$

Using Cadonna, Frühwirth-Schnatter, and Knaus (2020, Lemma 1), it can be shown that this prior is equivalent to a triple gamma prior, also known as a normal-gamma-gamma prior (Griffin and Brown 2017), where we define $\Sigma^R = \frac{k b_0^R}{a_0^R}\Sigma_\beta$:

(31) $$\beta_j \mid \sigma_y^2, \Sigma^R \sim N\left(0, \sigma_y^2 \frac{a_0^R}{k b_0^R}\Sigma^R\right), \qquad \Sigma^R \sim F\left(2 a_0^R, 2 b_0^R\right).$$

For a given level of predictability, as expressed through the hyperparameters $a_0^R$ and $b_0^R$ of the Beta prior for $R^2$, the variance of the prior defined in (31) is inversely proportional to the number $k$ of (independent) covariates. The more covariates are present, the tighter this prior is to ensure the same level of predictability.
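As a quick numerical illustration of this equivalence (a sketch under illustrative hyperparameter values and seed, not code from the paper), one can draw $k\Sigma_\beta$ from the Beta prime distribution and verify that the implied $R^2$ follows the Beta distribution in (30):

```python
import numpy as np
from scipy.stats import betaprime, kstest

# Illustrative sketch: if k*Sigma_beta ~ BetaPrime(a0R, b0R), then
# R^2 = k*Sigma_beta / (k*Sigma_beta + 1) should follow Beta(a0R, b0R).
rng = np.random.default_rng(1)
a0R, b0R = 0.5, 1.0

k_Sigma_beta = betaprime(a0R, b0R).rvs(size=100_000, random_state=rng)
R2 = k_Sigma_beta / (k_Sigma_beta + 1.0)

# A Kolmogorov-Smirnov test against Beta(a0R, b0R) should not reject.
print(kstest(R2, "beta", args=(a0R, b0R)))
```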

A.1.1 Application to the Predictive System (1)

To apply this prior to our predictive system in Equation (1), where $k = 1$, we first transform the system such that the covariate is standardized, i.e. $E[x_t] = 0$ and $V[x_t] = 1$:

$$y_t = \alpha_y^\star + \beta^\star\, \frac{x_{t-1} - \alpha_x/(1-\phi)}{\sqrt{\sigma_{xx}^2}} + \epsilon_t^y, \qquad \epsilon_t^y \sim N\left(0, \sigma_y^2\right),$$

where $\sigma_{xx}^2 \coloneqq \sigma_x^2/(1-\phi^2)$ is equal to the unconditional variance of the covariate $x_t$, where $(x_t)_{t \in \mathbb{Z}}$ follows the AR(1) process in Equation (1). Matching priors yields the following prior for $\beta = \beta^\star/\sqrt{\sigma_{xx}^2}$:

(32) $$\beta \mid \phi, \sigma_x^2, \sigma_y^2, \Sigma_\beta \sim N\left(0, \Sigma_\beta\, \frac{\sigma_y^2 (1-\phi^2)}{\sigma_x^2}\right), \qquad \Sigma_\beta \sim BP\left(a_0^R, b_0^R\right).$$

This leads to an immediate extension of the prior of Wachter and Warusawitharana (2015) by allowing $\Sigma_\beta$ (in the notation of Wachter and Warusawitharana (2015), this is $\sigma_\eta^2$) to be a random variable following the Beta prime distribution $\Sigma_\beta \sim BP(a_0^R, b_0^R)$ instead of assuming a fixed value.[8] Since we operate in model (5), we express $\sigma_y^2/\sigma_x^2$ in terms of the parameters $\psi$, $\tilde\sigma_y^2$, $\sigma_x^2$ and define the prior variance for $\beta$ as:

(33) $$\Sigma_0^\beta\left(\psi, \tilde\sigma_y^2, \sigma_x^2, \phi, \Sigma_\beta\right) = \Sigma_\beta\, \frac{\sigma_y^2(1-\phi^2)}{\sigma_x^2} = \Sigma_\beta (1-\phi^2)\left(\frac{\tilde\sigma_y^2}{\sigma_x^2} + \psi^2\right).$$

This induces the prior on β discussed in Section 3.1.
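For later reference, the prior variance (33) is a simple function of the model parameters; a minimal helper in Python (the names are ours, chosen to mirror the notation above) could read:

```python
def prior_var_beta(Sigma_beta, phi, sigma2_x, sigma2_y_tilde, psi):
    """Illustrative sketch of the prior variance of beta from Equation (33):
    Sigma_beta * (1 - phi^2) * (sigma2_y_tilde / sigma2_x + psi^2)."""
    return Sigma_beta * (1.0 - phi**2) * (sigma2_y_tilde / sigma2_x + psi**2)
```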

Appendix B: MCMC Details

B.1 Details for Sampling $(\alpha_x, \phi)$

In this section, we discuss the sampling procedure for $\theta_1 \mid \alpha_y, \beta, \psi, \sigma_x^2, \tilde\sigma_y^2, \mathbb{D}_T$, where $\theta_1 \coloneqq (\alpha_x, \phi)$. For a comprehensive review please refer to Kastner and Frühwirth-Schnatter (2014). We consider the centered parameterization, namely Equation (1), and apply an independence Metropolis–Hastings algorithm to sample $\theta_1$. To increase the acceptance rate, we employ an auxiliary regression model with the conjugate prior

$$p_{\mathrm{aux}}\left(\theta_1 \mid \tilde\sigma_x^2\right) \sim N\left(\mathbf{0}_2, \tilde\sigma_x^2 B_0\right)$$

as auxiliary prior for $\theta_1$ with $B_0 = \mathrm{diag}(10^{12}, 10^{8})$ and $\tilde\sigma_x^2$ defined as in (19).

This auxiliary prior allows us to derive the auxiliary conditional posterior distribution $\theta_1 \mid \alpha_y, \beta, \psi, \sigma_x^2, \tilde\sigma_y^2, \mathbb{D}_T$, given by:

(34) $$\theta_1 \mid \alpha_y, \beta, \psi, \sigma_x^2, \tilde\sigma_y^2, \mathbb{D}_T \sim N\left(b_T, \tilde\sigma_x^2 B_T\right),$$

with $B_T = \left(Z'Z + B_0^{-1}\right)^{-1}$ and $b_T = B_T\left(Z'(x - \tilde u) + B_0^{-1} b_0\right)$, where $Z$ is a $T \times 2$ matrix with the $t$-th row given by $(1\ x_{t-1})$ and $\tilde u \coloneqq (\tilde u_t)_{t=1,\dots,T}$, where $\tilde u_t$ is defined in Equation (19). Using the auxiliary posterior (34) as the proposal in an independence MH algorithm, a new value $\theta_1^{new}$ is proposed and accepted with probability $\min(1, A)$, where

(35) $$A = \frac{p\left(x_0 \mid \theta_1^{new}, \sigma_x^2\right)}{p\left(x_0 \mid \theta_1, \sigma_x^2\right)} \times \frac{p\left(\beta \mid \Sigma_0^\beta(\phi^{new}, \cdot)\right)}{p\left(\beta \mid \Sigma_0^\beta(\phi, \cdot)\right)} \times \frac{p\left(\theta_1^{new}\right)}{p\left(\theta_1\right)} \times \frac{p_{\mathrm{aux}}\left(\theta_1\right)}{p_{\mathrm{aux}}\left(\theta_1^{new}\right)}.$$

The first term in Equation (35) is the ratio of densities for the initial observation $x_0$ of the AR process in Equation (5). The second term is the ratio of priors for $\beta$: since the prior $\beta \mid \phi \sim N(0, \Sigma_0^\beta(\phi, \cdot))$ depends on $\phi$, it serves as an additional likelihood term that has to be included in the acceptance ratio. The third term is the ratio of priors for $\theta_1$, which depends on the choice of the prior discussed in Section 3.1. The fourth term in Equation (35) is the correction term corresponding to the auxiliary prior employed to derive the auxiliary distribution $\theta_1 \mid \alpha_y, \beta, \psi, \sigma_x^2, \tilde\sigma_y^2, \mathbb{D}_T$ in (34). The expression for $\log(A)$ becomes

(36) $$\begin{aligned}\log(A) ={}& \frac{1}{2}\log\frac{1-(\phi^{new})^2}{1-\phi^2} + \frac{1}{2\sigma_x^2}\left[(x_0-\mu)^2(1-\phi^2) - (x_0-\mu^{new})^2\left(1-(\phi^{new})^2\right)\right] \\ &+ \log\frac{1-\phi}{1-\phi^{new}} + \frac{1}{2\Sigma_0^{\alpha_x}}\left[(\mu-\mu_0^{\alpha_x})^2 - (\mu^{new}-\mu_0^{\alpha_x})^2\right] \\ &+ \frac{1}{2}\log\frac{1-\phi^2}{1-(\phi^{new})^2} + \frac{1}{2\tilde\sigma_x^2}\left[(\theta_1^{new})' B_0^{-1}\theta_1^{new} - \theta_1' B_0^{-1}\theta_1\right],\end{aligned}$$

where we define $\mu \coloneqq \alpha_x/(1-\phi)$ and $\mu^{new} \coloneqq (\alpha_x)^{new}/(1-\phi^{new})$, and $\log(A)$ stands for the natural logarithm of $A$.
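For concreteness, the proposal step can be sketched as follows (a hypothetical implementation, not the authors' code; the inputs $\tilde u$ and $\tilde\sigma_x^2$ from Equation (19) are assumed to be precomputed):

```python
import numpy as np

def propose_theta1(x, u_tilde, sigma2_x_tilde, rng,
                   b0=np.zeros(2), B0_diag=(1e12, 1e8)):
    """Illustrative sketch: draw (alpha_x, phi) from the auxiliary posterior (34).

    x: array of length T+1 holding x_0, ..., x_T;
    u_tilde: array of length T with the terms from Equation (19)."""
    Z = np.column_stack([np.ones(x.size - 1), x[:-1]])  # t-th row: (1, x_{t-1})
    B0_inv = np.diag(1.0 / np.asarray(B0_diag))
    B_T = np.linalg.inv(Z.T @ Z + B0_inv)
    b_T = B_T @ (Z.T @ (x[1:] - u_tilde) + B0_inv @ b0)
    chol = np.linalg.cholesky(sigma2_x_tilde * B_T)
    return b_T + chol @ rng.standard_normal(2)
```

The proposed value is then accepted with probability $\min(1, A)$, with $\log(A)$ evaluated as in Equation (36).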

B.2 Sampling Details for $\Sigma_\beta$

Given $\Sigma_\beta$, we first sample from the posterior distribution of $Z_\beta \mid \Sigma_\beta$. Combining the prior distribution for $Z_\beta$,

$$p(Z_\beta) \sim G\left(a_0^R, 1\right) \propto Z_\beta^{a_0^R - 1} \exp\left(-Z_\beta\right),$$

with the prior distribution for $\Sigma_\beta$ conditional on $Z_\beta$,

$$p(\Sigma_\beta \mid Z_\beta) \sim IG\left(b_0^R, Z_\beta\right) \propto \left(\frac{1}{\Sigma_\beta}\right)^{b_0^R + 1} \exp\left(-\frac{Z_\beta}{\Sigma_\beta}\right) Z_\beta^{b_0^R},$$

the posterior distribution of $Z_\beta \mid \Sigma_\beta$ is given by

$$p(Z_\beta \mid \Sigma_\beta) \propto p(\Sigma_\beta \mid Z_\beta)\, p(Z_\beta) \propto Z_\beta^{b_0^R} \exp\left(-\frac{Z_\beta}{\Sigma_\beta}\right) Z_\beta^{a_0^R - 1} \exp\left(-Z_\beta\right) \propto Z_\beta^{a_0^R + b_0^R - 1} \exp\left(-Z_\beta\left(1 + \frac{1}{\Sigma_\beta}\right)\right) \sim G\left(a_0^R + b_0^R,\ 1 + \frac{1}{\Sigma_\beta}\right).$$

Next, we sample from the posterior distribution of $\Sigma_\beta$, combining the prior distribution for $\beta$ with the prior distribution of $\Sigma_\beta$ conditional on $Z_\beta$:

$$p\left(\beta \mid \Sigma_\beta, \phi, \sigma_x^2, \tilde\sigma_y^2, \psi\right) \sim N\left(0, \Sigma_\beta (1-\phi^2)\left(\frac{\tilde\sigma_y^2}{\sigma_x^2} + \psi^2\right)\right), \qquad p(\Sigma_\beta \mid Z_\beta) \sim IG\left(b_0^R, Z_\beta\right) \propto \left(\frac{1}{\Sigma_\beta}\right)^{b_0^R + 1} \exp\left(-\frac{Z_\beta}{\Sigma_\beta}\right).$$

Hence, the posterior for $\Sigma_\beta$ is given by:

$$p(\Sigma_\beta \mid \cdot) \propto p\left(\beta \mid \Sigma_\beta, \phi, \sigma_x^2, \tilde\sigma_y^2, \psi\right) \times p(\Sigma_\beta \mid Z_\beta) \propto \left(\frac{1}{\Sigma_\beta}\right)^{b_0^R + \frac{1}{2} + 1} \exp\left(-\frac{1}{\Sigma_\beta}\left[\frac{\beta^2}{2}\left((1-\phi^2)\left(\frac{\tilde\sigma_y^2}{\sigma_x^2} + \psi^2\right)\right)^{-1} + Z_\beta\right]\right),$$

which results in

$$p(\Sigma_\beta \mid \cdot) \sim IG\left(b_0^R + \frac{1}{2},\ Z_\beta + S_\beta\right), \quad \text{where } S_\beta = \frac{\beta^2}{2}\left((1-\phi^2)\left(\frac{\tilde\sigma_y^2}{\sigma_x^2} + \psi^2\right)\right)^{-1}.$$
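Since both conditionals are standard distributions, one sweep of this Gibbs step can be sketched in a few lines (an illustrative sketch with names of our choosing; an inverse-Gamma draw is obtained by dividing the scale parameter by a unit-scale Gamma draw):

```python
import numpy as np

def update_Sigma_beta(beta, phi, sigma2_x, sigma2_y_tilde, psi,
                      Sigma_beta, a0R, b0R, rng):
    """Illustrative sketch of the two Gibbs steps in Appendix B.2."""
    # Z_beta | Sigma_beta ~ Gamma(a0R + b0R, rate = 1 + 1/Sigma_beta)
    Z_beta = rng.gamma(a0R + b0R, 1.0 / (1.0 + 1.0 / Sigma_beta))
    # Sigma_beta | . ~ InvGamma(b0R + 1/2, Z_beta + S_beta)
    S_beta = beta**2 / (2.0 * (1.0 - phi**2) * (sigma2_y_tilde / sigma2_x + psi**2))
    Sigma_beta_new = (Z_beta + S_beta) / rng.gamma(b0R + 0.5, 1.0)
    return Z_beta, Sigma_beta_new
```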

B.3 Sampling of $a_0^R$

In this section, we discuss the sampling procedure for $p(a_0^R \mid \Sigma_\beta)$, where we apply an MH step with acceptance rate $\min\{1, A\}$, where

$$A = \frac{p\left((a_0^R)^{new} \mid \Sigma_\beta\right)}{p\left(a_0^R \mid \Sigma_\beta\right)} \times \frac{q\left(a_0^R \mid (a_0^R)^{new}\right)}{q\left((a_0^R)^{new} \mid a_0^R\right)}.$$

The first term is the posterior ratio and the last term is the ratio of the proposal densities. We choose a symmetric proposal density $q(a_0^R \mid (a_0^R)^{new}) = q((a_0^R)^{new} \mid a_0^R)$, so that the proposal ratio cancels. For the first term, we can write

$$\frac{p\left((a_0^R)^{new} \mid \Sigma_\beta\right)}{p\left(a_0^R \mid \Sigma_\beta\right)} = \frac{p\left(\Sigma_\beta \mid (a_0^R)^{new}\right) p\left((a_0^R)^{new}\right)}{p\left(\Sigma_\beta \mid a_0^R\right) p\left(a_0^R\right)},$$

where $p(\Sigma_\beta \mid a_0^R)$ is the density of a Beta prime distribution:

(37) $$p\left(\Sigma_\beta \mid a_0^R\right) = \frac{1}{B\left(a_0^R, b_0^R\right)}\, \Sigma_\beta^{a_0^R - 1}\left(1 + \Sigma_\beta\right)^{-(a_0^R + b_0^R)}.$$

This follows from the fact that only the prior of $\Sigma_\beta$ depends on $a_0^R$, while the remaining parameters are independent of $a_0^R$, given $\Sigma_\beta$. Given a binary prior distribution for $a_0^R$ taking two values with equal probability, we obtain the expression

(38) $$\log(A) = \left((a_0^R)^{new} - a_0^R\right)\left[\log(\Sigma_\beta) - \log(1 + \Sigma_\beta)\right] + \log\Gamma\left(a_0^R\right) - \log\Gamma\left(a_0^R + b_0^R\right) + \log\Gamma\left((a_0^R)^{new} + b_0^R\right) - \log\Gamma\left((a_0^R)^{new}\right),$$

where $\Gamma(z)$ denotes Euler's Gamma function.

In case we sample $b_0^R$ in addition to $a_0^R$, we need to update the sampler slightly. The acceptance rate reads:

$$A = \frac{p\left((a_0^R)^{new}, (b_0^R)^{new} \mid \Sigma_\beta\right)}{p\left(a_0^R, b_0^R \mid \Sigma_\beta\right)} \times \frac{q\left(a_0^R, b_0^R \mid (a_0^R)^{new}, (b_0^R)^{new}\right)}{q\left((a_0^R)^{new}, (b_0^R)^{new} \mid a_0^R, b_0^R\right)},$$

where the first term is the posterior ratio and the second term is the ratio of the proposal densities. We choose a symmetric proposal density $q(a_0^R, b_0^R \mid (a_0^R)^{new}, (b_0^R)^{new}) = q((a_0^R)^{new}, (b_0^R)^{new} \mid a_0^R, b_0^R)$, so that the proposal ratio cancels. For the first term, we can write

$$\frac{p\left((a_0^R)^{new}, (b_0^R)^{new} \mid \Sigma_\beta\right)}{p\left(a_0^R, b_0^R \mid \Sigma_\beta\right)} = \frac{p\left(\Sigma_\beta \mid (a_0^R)^{new}, (b_0^R)^{new}\right) p\left((a_0^R)^{new}, (b_0^R)^{new}\right)}{p\left(\Sigma_\beta \mid a_0^R, b_0^R\right) p\left(a_0^R, b_0^R\right)},$$

where $p(\Sigma_\beta \mid a_0^R, b_0^R)$ is the density of the Beta prime distribution defined in (37), since only the prior of $\Sigma_\beta$ depends on $a_0^R$ and $b_0^R$, while the remaining parameters are independent of $a_0^R$ and $b_0^R$, given $\Sigma_\beta$. Given a binary prior distribution for $a_0^R$ and $b_0^R$ taking two values with equal probability, we obtain the expression

(39) $$\log A = \log B\left(a_0^R, b_0^R\right) - \log B\left((a_0^R)^{new}, (b_0^R)^{new}\right) + \left((a_0^R)^{new} - a_0^R\right)\log(\Sigma_\beta) + \left[\left(a_0^R - (a_0^R)^{new}\right) + \left(b_0^R - (b_0^R)^{new}\right)\right]\log(1 + \Sigma_\beta).$$
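Under the two-point prior with equal weights, a natural symmetric proposal simply flips between the two support points, so the prior and proposal ratios cancel and only Equation (38) remains; a sketch of this step (illustrative, with the support points of Section 3.1 as defaults):

```python
import numpy as np
from scipy.special import gammaln

def mh_step_a0R(a_curr, Sigma_beta, b0R, rng, values=(0.1, 0.5)):
    """Illustrative MH step for a0R under a two-point prior with equal
    weights, using the log acceptance ratio of Equation (38)."""
    a_new = values[1] if a_curr == values[0] else values[0]  # flip proposal
    logA = ((a_new - a_curr) * (np.log(Sigma_beta) - np.log1p(Sigma_beta))
            + gammaln(a_curr) - gammaln(a_curr + b0R)
            + gammaln(a_new + b0R) - gammaln(a_new))
    return a_new if np.log(rng.uniform()) < logA else a_curr
```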

B.4 Convergence and Mixing Diagnostics for Simulated Data

To ensure satisfactory performance of the sampler, we investigate the convergence and mixing properties of the Markov chains obtained by means of our Bayesian sampler. Consider the $M$ thinned posterior draws $\xi_i^{(m)}$, $m = 1, \dots, M$, for selected parameters of interest $\xi$ such as $\psi$, $\phi$ and $\beta$, for each data set $\mathbb{D}_{T,i}$, $i = 1, \dots, n$, on a separate basis.

Convergence of the Chain: To assess convergence we rely on the coda package[9] in R, which computes the Geweke (1992) convergence diagnostic for each Markov chain resulting from Algorithm 1 applied to the data sets $\mathbb{D}_{T,i}$, $i = 1, \dots, n$. The procedure tests the null hypothesis of equal means in the first and in the last part of a Markov chain (by default the first 10 % and the last 50 %). The asymptotic distribution of the test statistic obtained in Geweke (1992), denoted $\hat G_i^{\mathrm{conv}}$ in the following, is a standard normal distribution. For DGP 0 and the parameters $\psi$, $\phi$ and $\beta$, we observe that for 93.3 %, 93.2 % and 94.0 % of the $n$ Markov chains the null hypotheses are not rejected at the 5 % significance level. For DGP 1 the corresponding numbers are 95.2 %, 91.9 % and 91.8 %. The results indicate that the majority of chains converge to the stationary distribution.

Mixing of the Chain: Regarding mixing of the chain, following Gelman et al. (1995, Chapter 11.5) the effective sample size (ESS) is obtained for each data set $\mathbb{D}_{T,i}$ by means of

$$M_i^{\mathrm{eff}} \coloneqq \frac{M}{1 + 2 \sum_{\ell=1}^{\infty} \rho_{i,\ell}},$$

where $\rho_{i,\ell}$ is the autocorrelation of $\xi_i^{(m)}$ at lag $\ell$. To estimate the "long-run correlation term" in the denominator we use the coda package in R, which relies on estimating the spectral density at frequency zero. This results in the estimates $\hat M_i^{\mathrm{eff}}$, $i = 1, \dots, n$.

We exclude those $n - n_d$ data sets $\mathbb{D}_{T,i}$ where either $\hat M_i^{\mathrm{eff}}$ is less than a third of the actual sample size after burn-in and thinning, denoted by $M$ in the main text, or the Geweke (1992) convergence diagnostic rejects the null hypothesis of equal means at the 5 % significance level. Overall we confirm that convergence and mixing are satisfactory and that the chains have reached their stationary distributions.
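For illustration, the ESS formula can also be evaluated directly from the empirical autocorrelations; the following sketch truncates the sum at the first non-positive autocorrelation instead of using coda's spectral-density estimator, so the resulting numbers will differ slightly:

```python
import numpy as np

def effective_sample_size(draws):
    """Illustrative sketch: ESS = M / (1 + 2 * sum of autocorrelations),
    cf. Gelman et al. (1995, Chapter 11.5)."""
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    M = x.size
    autocov = np.correlate(x, x, mode="full")[M - 1:] / M  # biased estimator
    rho = autocov / autocov[0]
    s = 0.0
    for r in rho[1:]:
        if r <= 0.0:  # truncate at the first non-positive autocorrelation
            break
        s += r
    return M / (1.0 + 2.0 * s)
```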

Appendix C: Bayesian Testing

C.1 An Example on Bayes Factors

We illustrate the idea of the $\hat{FP}$ and $\hat{FN}$ measures, which are based on the Bayes Factor, in Figure 7. The blue line is the hypothetical distribution of the Bayes Factor under DGP 1 and the red line is the hypothetical distribution of the Bayes Factor under DGP 0. Given the value $BF_{01}$ and the threshold $K$, we have a decision rule on whether we have predictability or not: if $BF_{01} < K = 1$ (respectively $\geq K$) we decide that we have (no) predictability, because the posterior density at zero is smaller (larger) than the prior density at zero in Equation (26). The filled blue and red areas indicate the $\hat{FN}$ and $\hat{FP}$ rates, respectively. For example, the integral of the red area is the probability of observing $BF_{01} < K$ and deciding that we have predictability although DGP 0 holds (false positive rate). Alternatively, the integral of the blue area is the probability of observing $BF_{01} > K$ and deciding that we have no predictability although DGP 1 holds (false negative rate). These two measures are especially important since our ultimate goal is to test whether the dividend-price ratio predicts future returns (or not). This means that for real data we might not be interested in the value of the coefficient in the predictive regression per se, but rather in whether it is statistically significant.

Figure 7: Hypothetical distribution of the Bayes Factors under DGP 0 (red line) and DGP 1 (blue line). The vertical line indicates the testing threshold K = 1. The red area represents the size of the test and the blue area represents the power of the test.

C.2 Computation of Bayes Factors via the Savage–Dickey Density Ratio

Since the prior density $p(\beta)$ defined in Section 3.1 has no closed form, we use the hierarchical representation

$$p(\beta \mid \boldsymbol{\psi}) \sim N\left(0, g(\boldsymbol{\psi})\right), \quad \text{where } \boldsymbol{\psi} \sim p(\boldsymbol{\psi}),$$

and $g(\boldsymbol{\psi})$ is a non-linear function of the subvector of model parameters $\boldsymbol{\psi}$, which contains all parameters we condition on in sampling Step (1) of Algorithm 1. The prior $p(\beta)$ can then be approximated by

$$p(\beta) = \int p(\beta \mid \boldsymbol{\psi})\, p(\boldsymbol{\psi})\, d\boldsymbol{\psi} = \lim_{M \to \infty} \frac{1}{M} \sum_{m=1}^{M} p_N\left(\beta; 0, g\left(\boldsymbol{\psi}^{(m)}\right)\right),$$

where $p_N(\beta; 0, S)$ is a Gaussian density function with mean 0 and variance $S$. Let $\boldsymbol{\psi}^{(m)} \sim p(\boldsymbol{\psi})$ denote independent draws from $p(\boldsymbol{\psi})$. To approximate the prior ordinate $p(\beta = 0)$, we evaluate the densities $p_N\left(0; 0, g(\boldsymbol{\psi}^{(m)})\right)$ and obtain

(40) $$p(\beta = 0) = \lim_{M \to \infty} \frac{1}{M \sqrt{2\pi}} \sum_{m=1}^{M} \frac{1}{\sqrt{g\left(\boldsymbol{\psi}^{(m)}\right)}}.$$

The posterior density $p(\beta \mid \mathbb{D}_T)$ has the following representation

$$p(\beta \mid \mathbb{D}_T) = \int p(\beta, \boldsymbol{\psi} \mid \mathbb{D}_T)\, d\boldsymbol{\psi} = \int p_N(\beta \mid \boldsymbol{\psi}, \mathbb{D}_T)\, p(\boldsymbol{\psi} \mid \mathbb{D}_T)\, d\boldsymbol{\psi}, \quad \text{where } \beta \mid \boldsymbol{\psi}, \mathbb{D}_T \sim N\left(\beta; b_T(\boldsymbol{\psi}), B_T(\boldsymbol{\psi})\right),$$

$b_T$ is the second element of $\mu_T$, and $B_T$ is the second diagonal element of $\Sigma_T$ defined in Equation (16). Therefore $p(\beta \mid \mathbb{D}_T)$ can be approximated by

$$p(\beta \mid \mathbb{D}_T) = \lim_{M \to \infty} \frac{1}{M} \sum_{m=1}^{M} p_N\left(\beta; b_T^{(m)}, B_T^{(m)}\right),$$

where $b_T^{(m)}$ and $B_T^{(m)}$ are the conditional posterior moments when sampling $\beta^{(m)}$ at the $m$th iteration of the sampler. To approximate the posterior ordinate $p(\beta = 0 \mid \mathbb{D}_T)$, we evaluate the densities $p_N\left(0; b_T^{(m)}, B_T^{(m)}\right)$ and obtain

(41) $$p(\beta = 0 \mid \mathbb{D}_T) = \lim_{M \to \infty} \frac{1}{M \sqrt{2\pi}} \sum_{m=1}^{M} \frac{1}{\sqrt{B_T^{(m)}}} \exp\left(-\frac{\left(b_T^{(m)}\right)^2}{2 B_T^{(m)}}\right).$$

By combining (40) and (41), we can estimate $BF_{01}$ in (26) by the ratio of the finite-sample estimators of the posterior and prior ordinates at 0:

$$\hat{BF}_{01} = \frac{\frac{1}{M} \sum_{m=1}^{M} \frac{1}{\sqrt{B_T^{(m)}}} \exp\left(-\frac{\left(b_T^{(m)}\right)^2}{2 B_T^{(m)}}\right)}{\frac{1}{M} \sum_{m=1}^{M} \frac{1}{\sqrt{g\left(\boldsymbol{\psi}^{(m)}\right)}}}.$$

However, given that these distributions might be right-skewed, we use the median estimator instead of the mean (the posterior median minimizes the error under the $L_1$ loss function). This results in the following estimator of $BF_{01}$:

(42) $$\hat{BF}_{01} = \frac{Q_{0.5,M}\left(\frac{1}{\sqrt{B_T^{(m)}}} \exp\left(-\frac{\left(b_T^{(m)}\right)^2}{2 B_T^{(m)}}\right)\right)}{Q_{0.5,M}\left(\frac{1}{\sqrt{g\left(\boldsymbol{\psi}^{(m)}\right)}}\right)},$$

where $Q_{0.5,M}(\xi^{(m)})$ denotes the median of the $M$ draws of $\xi^{(m)}$.
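Given stored prior variances $g(\boldsymbol{\psi}^{(m)})$ and conditional posterior moments $(b_T^{(m)}, B_T^{(m)})$, the estimator (42) is straightforward to compute; a sketch (array names are illustrative):

```python
import numpy as np

def bf01_savage_dickey(g_prior, b_T, B_T):
    """Illustrative sketch of the median-based Savage-Dickey estimator of
    BF01, Equation (42). The common factor 1/sqrt(2*pi) cancels in the ratio."""
    g_prior, b_T, B_T = map(np.asarray, (g_prior, b_T, B_T))
    post_ordinate = np.exp(-b_T**2 / (2.0 * B_T)) / np.sqrt(B_T)
    prior_ordinate = 1.0 / np.sqrt(g_prior)
    return np.median(post_ordinate) / np.median(prior_ordinate)
```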

C.3 Simulation Results on Bayes Factors

We analyze the results for $\hat{FP}$ and $\hat{FN}$ in more detail in Figure 8, which shows the sampling distribution of the Bayes Factor for DGP 0 and DGP 1, similar to the stylized example in Figure 7. The vertical line at 1 indicates the threshold $K$ above which we conclude that there is no predictability. In Figure 8 we also analyze the sampling distribution of the t-values for DGP 0 and DGP 1 for OLS and RBE. The vertical lines (at −1.96 and 1.96) indicate the standard thresholds at which we reject or do not reject the null hypothesis of no predictability at the 5 % significance level. The filled blue area indicates the $\hat{FN}$ rate, i.e. the situation where we conclude that there is no predictability although predictability is present. The filled red area represents the $\hat{FP}$ rate, i.e. we conclude that there is predictability although there is none. For OLS we observe violations in the right tail of the sampling distribution under DGP 0, meaning that we detect predictability despite a true value of $\beta = 0$. RBE decreases the bias by shifting the sampling distribution of $\hat\beta^{\mathrm{RBE}}$ close to zero. This decreases most of the t-values, but t-values smaller than −1.96 contribute to the $\hat{FP}$ rate. Under Bayesian testing, we observe that the sampling distribution of the Bayes Factor for DGP 0 is rather flat, whereas for DGP 1 it concentrates at values close to zero.

Figure 8: Sampling distribution of the Bayes Factor for the parameter β for the Bayesian estimator (upper panel) and sampling distribution of the t-values for the frequentist estimators OLS (lower left panel) and RBE (lower right panel). The left panel is the result for DGP 0 and the right panel is the result for DGP 1.

Appendix D: Robustness Checks

In this section, we perform various robustness checks regarding the results in Section 4. We consider additional simulation settings by varying the true value of $\beta$ in DGP 1 and by varying the sample size $T$. Furthermore, we check robustness regarding the $B(a_0^R, b_0^R)$ prior for $R^2$ defined in (12). Given the same prior choices for $a_0^R$ as in Section 4, we consider $b_0^R = 0.5$ in addition to $b_0^R = 1$. In addition, following the suggestion of a reviewer, we vary $a_0^R$ and $b_0^R$ such that the prior expectation of $R^2$ is kept at a fixed level.

D.1 Different Values for β and $b_0^R$

We analyze additional cases where β ∈ {0, 0.025, 0.05, 0.075, 0.1, 0.2} and denote the corresponding simulation settings as DGP i for i ∈ {0, …, 5}, respectively. Additionally, given the same prior choices for $a_0^R$ as in Section 4, we analyze the cases where $b_0^R$ equals 1 or 0.5. We compare the performance of the various methods with the results presented in Table 2 and investigate how the performance changes when we change the level of predictability. We expect that for smaller values of β it is harder, and for larger values of β easier, to reject the null hypothesis. The results presented in Table 6 for $b_0^R = 1$ and in Table 7 for $b_0^R = 0.5$ confirm these expectations: the estimated false negative rates are monotonically decreasing in β, i.e. the observed power is increasing. The observed power is high for OLS. For larger values of β the difference between the Bayesian estimator and OLS decreases, and for β = 0.2 the Bayesian estimator and OLS have comparable observed power. In terms of MAE and RMSE, the results are stable for OLS and RBE, while they deteriorate for the Bayesian estimator. This follows from the choice of a prior that puts a lot of mass on small values of β, which affects the posterior.

Table 6:

Measures of the estimation quality for $\hat\beta$ (multiplied by 100): bias $B(\hat\beta, \beta)$, sample standard deviation of the point estimates $\sigma(\hat\beta)$, $\mathrm{MAE}(\hat\beta, \beta)$, $\mathrm{RMSE}(\hat\beta, \beta)$, estimated false positive rate $(\hat{FP})$ and estimated false negative rate $(\hat{FN})$. Three estimation methods: OLS, RBE and BAY. We have 6 DGPs where β ∈ {0, 0.025, 0.05, 0.075, 0.1, 0.2} and $b_0^R = 1$.

Method $B(\hat\beta, \beta)$ $\sigma(\hat\beta)$ $\mathrm{MAE}(\hat\beta, \beta)$ $\mathrm{RMSE}(\hat\beta, \beta)$
β = 0 $\hat{FP}$
OLS 4.33 6.56 5.77 7.86 8.14
RBE 0.47 6.69 5.10 6.71 7.21
BAY: $a_0^R$ random 1.94 3.98 2.55 4.42 6.10
BAY: $a_0^R = 0.1$ 1.31 2.72 1.60 3.01 7.27
BAY: $a_0^R = 0.5$ 3.31 5.10 4.18 6.08 4.59
β = 0.025 $\hat{FN}$
OLS 4.09 6.39 5.50 7.58 86.10
RBE 0.23 6.52 4.94 6.52 94.40
BAY: $a_0^R$ random 0.79 4.55 2.84 4.61 89.04
BAY: $a_0^R = 0.1$ −0.20 3.76 2.44 3.76 86.23
BAY: $a_0^R = 0.5$ 2.57 5.34 4.13 5.93 91.50
β = 0.05 $\hat{FN}$
OLS 4.29 6.63 5.77 7.89 71.5
RBE 0.4 6.78 5.17 6.78 89.7
BAY: $a_0^R$ random 0.09 5.50 3.91 5.49 78.74
BAY: $a_0^R = 0.1$ −1.36 4.86 3.82 5.04 74.27
BAY: $a_0^R = 0.5$ 1.75 5.82 4.19 6.07 85.99
β = 0.075 $\hat{FN}$
OLS 4.64 6.38 5.92 7.88 48.40
RBE 0.75 6.50 4.91 6.54 80.00
BAY: $a_0^R$ random 0.10 5.97 4.50 5.97 58.30
BAY: $a_0^R = 0.1$ −2.40 5.07 4.83 5.60 57.49
BAY: $a_0^R = 0.5$ 1.96 6.24 4.72 6.53 70.22
β = 0.1 $\hat{FN}$
OLS 4.46 6.67 5.89 8.02 28.30
RBE 0.60 6.80 5.20 6.83 63.27
BAY: $a_0^R$ random 0.43 6.97 5.26 6.98 35.75
BAY: $a_0^R = 0.1$ −2.11 6.74 5.83 7.06 35.03
BAY: $a_0^R = 0.5$ 1.67 6.25 4.86 6.47 52.93
β = 0.2 $\hat{FN}$
OLS 4.24 6.63 5.74 7.86 0.2
RBE 0.36 6.74 5.18 6.75 3.8
BAY: $a_0^R$ random 1.95 6.63 5.12 6.91 0.22
BAY: $a_0^R = 0.1$ 1.25 6.61 5.05 6.72 0.57
BAY: $a_0^R = 0.5$ 2.73 6.39 5.18 6.95 1.38
Table 7:

Measures of the estimation quality for $\hat\beta$ (multiplied by 100): bias $B(\hat\beta, \beta)$, sample standard deviation of the point estimates $\sigma(\hat\beta)$, $\mathrm{MAE}(\hat\beta, \beta)$, $\mathrm{RMSE}(\hat\beta, \beta)$, estimated false positive rate $(\hat{FP})$ and estimated false negative rate $(\hat{FN})$. Three estimation methods: OLS, RBE and BAY. We have 6 DGPs where β ∈ {0, 0.025, 0.05, 0.075, 0.1, 0.2} and $b_0^R = 0.5$.

Method $B(\hat\beta, \beta)$ $\sigma(\hat\beta)$ $\mathrm{MAE}(\hat\beta, \beta)$ $\mathrm{RMSE}(\hat\beta, \beta)$
β = 0 $\hat{FP}$
OLS 4.33 6.56 5.77 7.86 8.14
RBE 0.47 6.69 5.10 6.71 7.21
BAY: $a_0^R$ random 2.06 4.37 2.67 4.83 4.58
BAY: $a_0^R = 0.1$ 1.59 3.50 1.90 3.84 7.50
BAY: $a_0^R = 0.5$ 3.10 4.94 4.07 5.83 2.02
β = 0.025 $\hat{FN}$
OLS 4.09 6.39 5.50 7.58 86.10
RBE 0.23 6.52 4.94 6.52 94.40
BAY: $a_0^R$ random 1.06 4.81 3.06 4.92 91.79
BAY: $a_0^R = 0.1$ 0.09 4.19 2.53 4.19 88.18
BAY: $a_0^R = 0.5$ 2.66 5.50 4.09 6.11 94.98
β = 0.05 $\hat{FN}$
OLS 4.29 6.63 5.77 7.89 71.5
RBE 0.4 6.78 5.17 6.78 89.7
BAY: $a_0^R$ random 0.50 5.69 4.14 5.71 80.46
BAY: $a_0^R = 0.1$ −0.94 5.09 3.85 5.17 74.78
BAY: $a_0^R = 0.5$ 1.90 5.73 4.32 6.03 91.40
β = 0.075 $\hat{FN}$
OLS 4.64 6.38 5.92 7.88 48.40
RBE 0.75 6.50 4.91 6.54 80.00
BAY: $a_0^R$ random 0.39 6.23 4.74 6.24 66.12
BAY: $a_0^R = 0.1$ −1.55 5.80 4.90 6.00 57.33
BAY: $a_0^R = 0.5$ 2.03 6.53 4.86 6.84 79.26
β = 0.1 $\hat{FN}$
OLS 4.46 6.67 5.89 8.02 28.30
RBE 0.60 6.80 5.20 6.83 63.27
BAY: $a_0^R$ random 1.13 7.14 5.54 7.23 43.01
BAY: $a_0^R = 0.1$ −0.90 6.79 5.66 6.85 32.18
BAY: $a_0^R = 0.5$ 1.98 6.20 4.69 6.51 63.15
β = 0.2 $\hat{FN}$
OLS 4.24 6.63 5.74 7.86 0.2
RBE 0.36 6.74 5.18 6.75 3.8
BAY: $a_0^R$ random 2.29 6.75 5.33 7.12 1.81
BAY: $a_0^R = 0.1$ 1.78 6.61 5.20 6.85 0.43
BAY: $a_0^R = 0.5$ 2.83 6.28 5.05 6.88 2.87

D.2 Holding the Prior Expectation of $R^2$ Fixed

The mixture Beta prior on $R^2$ imposed in Section 4, where $b_0^R$ is fixed and $a_0^R$ is Bernoulli distributed, follows the idea that we are confronted with two regimes regarding predictability, where $R^2$ is relatively high in one regime compared to a second regime where predictability is low. In more detail,

(43) $$E\left[R^2 \mid a_0^R, b_0^R\right] = \frac{a_0^R}{a_0^R + b_0^R} = \begin{cases} 0.09, & \text{if } a_0^R = 0.1 \text{ and } b_0^R = 1, \\ 0.33, & \text{if } a_0^R = 0.5 \text{ and } b_0^R = 1, \end{cases}$$

while the median, denoted $Q_{0.5}$, lies to the left of the mean (since the Beta distribution is right-skewed) and is given by:

(44) $$Q_{0.5}\left[R^2 \mid a_0^R, b_0^R\right] = I_{0.5}^{-1}\left(a_0^R, b_0^R\right) = \begin{cases} 0.001, & \text{if } a_0^R = 0.1 \text{ and } b_0^R = 1, \\ 0.25, & \text{if } a_0^R = 0.5 \text{ and } b_0^R = 1, \end{cases}$$

where $I_q(a_0^R, b_0^R)$ denotes the regularized incomplete Beta function. For the case $a_0^R > 0$ and $b_0^R = 1$ the median is available in closed form, in which case $I_{0.5}^{-1}(a_0^R, 1) = 2^{-1/a_0^R}$ (see, e.g. Gupta and Nadarajah 2004, page 29). The conditional variance of $R^2$ (based on the Beta distribution) is

$$V\left[R^2 \mid a_0^R, b_0^R\right] = \frac{a_0^R b_0^R}{\left(a_0^R + b_0^R\right)^2\left(a_0^R + b_0^R + 1\right)} = \begin{cases} 0.039, & \text{if } a_0^R = 0.1 \text{ and } b_0^R = 1, \\ 0.089, & \text{if } a_0^R = 0.5 \text{ and } b_0^R = 1. \end{cases}$$

Our prior specification combines two priors. The first prior, $a_0^R = 0.1$ and $b_0^R = 1$, is an informative prior that puts a lot of mass on values close to zero. The second prior, $a_0^R = 0.5$ and $b_0^R = 1$, is less informative, while still allocating mass at zero.
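The moments reported in (43) and (44), as well as the prior expectation of 0.05 for the matched pairs used in the alternative specification below, can be verified directly (a quick check, not part of the original analysis):

```python
from scipy.stats import beta

# Prior mean, median and variance of R^2 for the hyperparameter pairs in Appendix D
for a0R, b0R in [(0.1, 1.0), (0.5, 1.0), (0.1, 1.9), (1.0, 19.0)]:
    d = beta(a0R, b0R)
    print(f"a0R={a0R}, b0R={b0R}: mean={d.mean():.3f}, "
          f"median={d.median():.4f}, var={d.var():.3f}")
```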

Alternatively, we try to match the empirical evidence that $R^2 \approx 0.05$ and keep the prior expectation fixed at that level. In this case, we choose $R^2 \sim B(a_0^R, b_0^R)$, where $(a_0^R, b_0^R) = (0.1, 1.9)$ with probability $p_{a_0^R, b_0^R} = 0.5$ and $(a_0^R, b_0^R) = (1, 19)$ with probability $1 - p_{a_0^R, b_0^R}$. Table 8 presents the results with this alternative prior specification. The results are slightly worse, both in terms of the false negative and the false positive rate, than those obtained with the prior applied in Section 4, that is, $R^2 \sim B(a_0^R, b_0^R)$ with $(a_0^R, b_0^R) = (0.1, 1)$ with probability $p_{a_0^R} = 0.5$ and $(a_0^R, b_0^R) = (0.5, 1)$ with probability $1 - p_{a_0^R}$ (see also Table 2).

Table 8:

Measures of the estimation quality for $\hat\beta$ (multiplied by 100): bias $B(\hat\beta, \beta)$, sample standard deviation of the point estimates $\sigma(\hat\beta)$, $\mathrm{MAE}(\hat\beta, \beta)$, $\mathrm{RMSE}(\hat\beta, \beta)$, estimated false positive rate $(\hat{FP})$ and estimated false negative rate $(\hat{FN})$. Three estimation methods: OLS, RBE and BAY. We have two priors on $R^2$. The first prior has a non-fixed expectation of $R^2$ by choosing $a_0^R \in \{0.1, 0.5\}$ and $b_0^R = 1$. The second has a fixed expectation of $R^2$ by choosing $(a_0^R, b_0^R) \in \{(0.1, 1.9), (1, 19)\}$. We have 6 DGPs where β ∈ {0, 0.025, 0.05, 0.075, 0.1, 0.2}.

Method $B(\hat\beta, \beta)$ $\sigma(\hat\beta)$ $\mathrm{MAE}(\hat\beta, \beta)$ $\mathrm{RMSE}(\hat\beta, \beta)$
β = 0 $\hat{FP}$
OLS 4.33 6.56 5.77 7.86 8.14
RBE 0.47 6.69 5.10 6.71 7.21
BAY: $R^2$ non-fixed 1.94 3.98 2.55 4.42 6.10
BAY: $R^2$ fixed 1.54 3.19 2.22 3.54 8.20
β = 0.025 $\hat{FN}$
OLS 4.09 6.39 5.50 7.58 86.10
RBE 0.23 6.52 4.94 6.52 94.40
BAY: $R^2$ non-fixed 0.79 4.55 2.84 4.61 89.04
BAY: $R^2$ fixed 0.34 4.09 2.51 4.10 87.48
β = 0.05 $\hat{FN}$
OLS 4.49 6.68 5.90 8.05 70.17
RBE 0.61 6.81 5.20 6.84 88.76
BAY: $R^2$ non-fixed 0.09 5.50 3.91 5.49 78.74
BAY: $R^2$ fixed −0.67 4.70 3.43 4.74 77.42
β = 0.075 $\hat{FN}$
OLS 4.64 6.38 5.92 7.88 48.40
RBE 0.75 6.50 4.91 6.54 80.00
BAY: $R^2$ non-fixed 0.10 5.97 4.50 5.97 58.30
BAY: $R^2$ fixed −1.13 5.83 4.68 5.94 59.59
β = 0.1 $\hat{FN}$
OLS 4.46 6.67 5.89 8.02 28.30
RBE 0.60 6.80 5.20 6.83 63.27
BAY: $R^2$ non-fixed 0.43 6.97 5.26 6.98 35.75
BAY: $R^2$ fixed −1.10 6.40 5.12 6.49 36.13
β = 0.2 $\hat{FN}$
OLS 4.34 6.60 5.82 7.90 0.39
RBE 0.47 6.73 5.18 6.74 3.27
BAY: $R^2$ non-fixed 1.95 6.63 5.12 6.91 0.22
BAY: $R^2$ fixed 0.55 6.34 4.87 6.36 0.35

D.3 Different Values for β and T

In Figure 9 we present the estimated marginal posterior distributions $\hat p(\beta \mid \mathbb{D}_{T,i})$, obtained by the kernel density estimator using the thinned draws $\beta_i^{(m)}$, m = 1, …, 2000, for i = 1, …, n_d, for three DGPs in which we increase the sample size T and/or the value of β. The results indicate that the effect of the prior decreases as more observations or a larger level of predictability is observed.

Figure 9: 5/50/95 % quantiles (dashed lines) of the estimated marginal posterior distributions $\hat p(\beta \mid \mathbb{D}_{T,i})$, obtained by the kernel density estimator implemented in R with the default bandwidth choice, using the thinned draws $\beta_i^{(m)}$, m = 1, …, 2000, for i = 1, …, n_d. The top panel is the result for β = 0.1 and T = 1000, the middle panel for β = 0.15 and T = 5000, and the bottom panel for β = 0.2 and T = 100.

Appendix E: Financial Data

E.1 Testing Stationarity

This section briefly investigates whether the data $\mathbb{D}_T$ are stationary. For simulated data, we considered a solution of the linear stochastic difference Equation (1) on $\mathbb{Z}$, where $|\phi| < 1$ results in a stationary stochastic process $(x_t, y_t)_{t \in \mathbb{Z}}$. [Since the noise terms $\epsilon_t$ are iid normally distributed we can establish weak and strong stationarity.] For the empirical data sets considered in Section 5 we investigate this question by visual inspection of the autocorrelation function (ACF) and the partial autocorrelation function (PACF), and by running an augmented Dickey–Fuller (ADF) test (Dickey and Fuller 1979) as well as the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test (Kwiatkowski et al. 1992). Here the EViews 13 package was used.

Returns $y_t$: For the empirical returns used in this article, all autocorrelations, as well as partial autocorrelations, for lag orders $\leq 30$ are insignificant; see Figure 10. For the augmented Dickey–Fuller test we reject the null hypothesis of a unit root at the 1 % significance level, while for the KPSS test the null hypothesis that the time series is integrated of order zero is not rejected at a significance level of 10 %. These results are robust to the number of lags used in the augmented Dickey–Fuller test and to the bandwidth used to estimate the long-run variance in the case of the KPSS test.

Figure 10: ACF and PACF plots for log returns $y_t$. The upper panel is the result for Sample 1 and the lower panel is the result for Sample 2. The left panel is the ACF plot and the right panel is the PACF plot. The dotted lines show 95 % confidence bands.

Log dividend-price ratio $x_t$: The autocorrelations and the partial autocorrelations have the typical form expected for a first-order autoregressive process: the autocorrelations decay to zero, with a first-order autocorrelation of around 0.9 in Figure 11, and the partial autocorrelations become insignificant for lag orders $\geq 2$. Hence, the ACF and PACF support the assumption of a stationary first-order autoregressive process for both log dividend-price ratio time series. The results become more complicated when we look at the output of the augmented Dickey–Fuller and KPSS tests. For the augmented Dickey–Fuller test we do not reject the null hypothesis of a unit root at usual significance levels. For the KPSS test the results are mixed. For example, when using the bandwidth selection proposed in Andrews (1991) and the Bartlett kernel we do not reject the null hypothesis that the data is integrated of order zero at a 5 % significance level, while with the Newey and West (1994) selection rule and the Bartlett kernel the null hypothesis is rejected at a significance level of 5 %.

Figure 11: ACF and PACF plots for log dividend-price ratio $x_t$. The upper panel is the result for Sample 1 and the lower panel is the result for Sample 2. The left panel is the ACF plot and the right panel is the PACF plot. The dotted lines show 95 % confidence bands.

We claim that these mixed results are at least partially caused by the relatively short time series dimension. Golez and Koudijs (2018) analyze annual US data from 1629 until 2015. The autoregressive coefficient for the entire period equals 0.78. However, the process becomes more persistent in recent times: for the sample from 1945 until 2015, for instance, the estimate of the autoregressive coefficient is approximately 0.9. For this larger data set, the authors rejected the null hypothesis of a unit root for the log dividend-price ratio at the 1 % significance level using an augmented Dickey–Fuller test. Based on this result and the structure of the ACF and PACF, we follow the literature (see, e.g. Cochrane 2008) and consider the log dividend-price ratio process $(x_t)_{t \in \mathbb{Z}}$ to be stationary.
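The paper runs these tests in EViews 13; for readers who want to replicate the checks, a comparable sketch with statsmodels is given below (lag and bandwidth conventions differ across implementations, so the statistics will not match the reported values exactly):

```python
from statsmodels.tsa.stattools import adfuller, kpss

def stationarity_tests(series):
    """Illustrative sketch: ADF test (H0: unit root) and KPSS test
    (H0: the series is integrated of order zero)."""
    adf_stat, adf_pvalue, *_ = adfuller(series, autolag="AIC")
    kpss_stat, kpss_pvalue, *_ = kpss(series, regression="c", nlags="auto")
    return {"ADF": (adf_stat, adf_pvalue), "KPSS": (kpss_stat, kpss_pvalue)}
```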

Other Variables: We further analyze the additional variables discussed in Subsection 5.1, namely the Book-to-Market Ratio, the log Earnings-Price Ratio, the Default Yield Spread, log Dividend-Growth, and the Term Spread. In Figure 12 we depict the autocorrelations and the partial autocorrelations for the Book-to-Market Ratio, the log Earnings-Price Ratio, and the Default Yield Spread. The autocorrelations and partial autocorrelations are at least close to the typical form expected for a first-order autoregressive process: the autocorrelations decay to zero, with a first-order autocorrelation of around 0.9 for BM and 0.75 for EP and DFY, and the partial autocorrelations become insignificant for lag orders $\geq 2$. In Figure 13 we depict the autocorrelations and the partial autocorrelations for the Dividend-Growth and the Term Spread. Neither time series is persistent, with first-order autocorrelations around 0.2 and 0.4, respectively. Hence, the ACF and PACF support the assumption of a stationary first-order autoregressive process for all time series. We further apply the augmented Dickey–Fuller and KPSS tests and present the results in Table 9. For the augmented Dickey–Fuller test we reject the null hypothesis of a unit root at the 1 % level for all variables, except BM, where we reject the null at the 5 % level. For the KPSS test using the bandwidth selection proposed in Andrews (1991) and the Bartlett kernel (KPSS 1), we do not reject the null hypothesis that the data is integrated of order zero at a 1 % significance level, while with the Newey and West (1994) selection rule and the Bartlett kernel (KPSS 2), the null hypothesis is not rejected at a significance level of 5 %.

Figure 12: ACF and PACF plots for log Earnings-Price Ratio (upper panel), Book-to-Market Ratio (middle panel) and Default Yield Spread (lower panel). The left panel is the ACF plot and the right panel is the PACF plot. The dotted lines show 95 % confidence bands.

Figure 13: ACF and PACF plots for the log Dividend-Growth (upper panel) and the Term Spread (lower panel). The left panel is the ACF plot and the right panel is the PACF plot. The dotted lines show 95 % confidence bands.

Table 9:

Test Statistics for Book-to-Market Ratio, the log Earnings-Price Ratio, the Default Yield Spread, log Dividend-Growth, and the Term Spread for the augmented Dickey–Fuller (ADF) test (Dickey and Fuller 1979) and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test (Kwiatkowski et al. 1992). For the ADF test, the critical values for 1 %, 5 % and 10 % levels are −4.058, −3.458 and −3.155 respectively. For the KPSS tests, the critical values for 1 %, 5 % and 10 % levels are 0.216, 0.146 and 0.119 respectively. KPSS 1 uses bandwidth selection proposed in Andrews (1991) and the Bartlett kernel. KPSS 2 uses bandwidth selection proposed in Newey and West (1994) and the Bartlett kernel.

Variable ADF KPSS 1 KPSS 2
EP −4.222 0.100 0.124
BM −3.469 0.104 0.145
DFY −4.126 0.106 0.131
DG −7.403 0.039 0.050
TMS −5.658 0.114 0.119

E.2 Additional Results: Estimates of the Marginal Posteriors of β

Figure 14 presents the prior and estimated marginal posterior densities $\hat p(\beta \mid \mathbb{D}_T)$ for DP, for the empirical Sample 1 (left panel) and Sample 2 (right panel), respectively. The posterior for Sample 1 has a larger spike at zero than the prior distribution, which is in line with the estimated Bayes Factor being larger than one in Table 3. On the other hand, for Sample 2 the posterior distribution is slightly smaller than the prior at zero, indicating weak evidence of predictability. In addition, the posterior distributions can be compared to the results in Figure 3 for the simulated data, where visually the posterior for Sample 1 is more representative of DGP 0, while the posterior distribution for Sample 2 is closer to the median of the estimated posteriors.

Figure 14: Prior and posterior densities for β and point estimates for BAY, OLS and RBE. The solid line is the prior distribution and the dashed line with the filled area is the posterior distribution. The empty triangle is the prior mean, which equals zero, and the filled triangle is the estimated mean of the posterior distribution $\hat\beta^{\mathrm{BAY}}$; the empty square is the OLS estimator $\hat\beta^{\mathrm{OLS}}$ and the empty circle is the RBE estimator $\hat\beta^{\mathrm{RBE}}$. The left panel describes the result for Sample 1, whereas the right panel describes the result for Sample 2.

E.3 Convergence and Mixing Diagnostics for Empirical Data

In this section, we present the convergence and mixing analysis for the two samples of the financial data discussed in Section 5 and for the other variables discussed in Section 5.1. We present the trace plots for β, ϕ and ψ in Figure 15 (for brevity, only for the two samples of the financial data discussed in Section 5), and the Geweke (1992) convergence diagnostic $\hat G^{\mathrm{conv}}$ in Table 10. Estimates $\hat M^{\mathrm{eff}}$ of the effective sample size $M^{\mathrm{eff}}$ for the parameters β, ψ and ϕ are also provided in Table 10. Based on the traceplots, the $\hat G^{\mathrm{conv}}$ following from Geweke (1992), and the $\hat M^{\mathrm{eff}}$, we conclude that our MCMC sampler exhibits good convergence and mixing properties for all parameters.

Figure 15: Traceplots for the parameters ψ, ϕ and β. The left panel is for DP Sample 1 and the right panel is for DP Sample 2.

Table 10:

Effective sample size and Z-scores for the parameters ψ, ϕ and β, for the log Dividend-Price Ratio for Sample 1 and Sample 2, and for the Book-to-Market Ratio, the log Earnings-Price Ratio, the Default Yield Spread, log Dividend-Growth, and the Term Spread. The number of thinned posterior draws is M = 30,001.

Sample ψ ϕ β
$\hat M^{\mathrm{eff}}$
DP. Sample 1 30,001 13,328 11,867
DP. Sample 2 28,745 5,927 8,552
BM 30,001 13,978 14,359
DFY 26,991 16,861 28,303
DG 32,213 29,696 30,001
EP 30,001 26,112 12,788
TMS 30,001 15,073 30,001
$\hat G^{\mathrm{conv}}$
DP. Sample 1 1.374 −0.645 0.961
DP. Sample 2 −1.141 −0.922 −0.351
BM −0.363 1.082 −1.675
DFY −1.102 −0.159 1.240
DG 0.523 0.509 −0.084
EP 0.974 −0.055 1.435
TMS 0.979 −0.847 −0.881

References

Amihud, Y., and C. M. Hurvich. 2004. "Predictive Regressions: A Reduced-Bias Estimation Method." Journal of Financial and Quantitative Analysis 39: 813–41. https://doi.org/10.1017/s0022109000003227.

Andrews, D. W. K. 1991. "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation." Econometrica 59: 817–54. https://doi.org/10.2307/2938229.

Baştürk, N., L. Hoogerheide, and H. K. van Dijk. 2017. "Bayesian Analysis of Boundary and Near-Boundary Evidence in Econometric Models with Reduced Rank." Bayesian Analysis 12: 879–917. https://doi.org/10.1214/17-ba1061.

Barberis, N. 2000. "Investing for the Long Run when Returns are Predictable." The Journal of Finance 55: 225–64. https://doi.org/10.1111/0022-1082.00205.

Berger, J. O., and R. Y. Yang. 1994. "Noninformative Priors and Bayesian Testing for the AR(1) Model." Econometric Theory 10: 461–82. https://doi.org/10.1017/s026646660000863x.

Cadonna, A., S. Frühwirth-Schnatter, and P. Knaus. 2020. "Triple the Gamma–A Unifying Shrinkage Prior for Variance and Variable Selection in Sparse State Space and TVP Models." Econometrics 8: 20. https://doi.org/10.3390/econometrics8020020.

Campbell, J. Y. 1987. "Stock Returns and the Term Structure." Journal of Financial Economics 18: 373–99. https://doi.org/10.1016/0304-405x(87)90045-6.

Campbell, J. Y. 2017. Financial Decisions and Markets: A Course in Asset Pricing. Princeton, New Jersey: Princeton University Press.

Campbell, J. Y., and R. J. Shiller. 1988. "The Dividend-Price Ratio and Expectations of Future Dividends and Discount Factors." Review of Financial Studies 1: 195–228. https://doi.org/10.1093/rfs/1.3.195.

Choi, I. 2015. Almost all about Unit Roots: Foundations, Developments, and Applications. Cambridge, England: Cambridge University Press. https://doi.org/10.1017/CBO9781316157824.

Cochrane, J. H. 2008. "The Dog that Did Not Bark: A Defense of Return Predictability." Review of Financial Studies 21: 1533–75. https://doi.org/10.1093/rfs/hhm046.

Cochrane, J. H. 2011. "Presidential Address: Discount Rates." The Journal of Finance 66: 1047–108. https://doi.org/10.1111/j.1540-6261.2011.01671.x.

de Pooter, M., F. Ravazzolo, R. Segers, and H. K. van Dijk. 2008. "Bayesian Near-Boundary Analysis in Basic Macroeconomic Time-Series Models." Bayesian Econometrics 23: 331–402. https://doi.org/10.1016/s0731-9053(08)23011-2.

Dickey, D. A., and W. A. Fuller. 1979. "Distribution of the Estimators for Autoregressive Time Series with a Unit Root." Journal of the American Statistical Association 74: 427–31. https://doi.org/10.2307/2286348.

Dickey, J. M., and B. P. Lientz. 1970. "The Weighted Likelihood Ratio, Sharp Hypotheses About Chances, the Order of a Markov Chain." The Annals of Mathematical Statistics 41: 214–26. https://doi.org/10.1214/aoms/1177697203.

Fama, E. F. 1970. "Efficient Capital Markets: A Review of Theory and Empirical Work." The Journal of Finance 25: 383. https://doi.org/10.2307/2325486.

Fama, E. F., and K. R. French. 1988. "Dividend Yields and Expected Stock Returns." Journal of Financial Economics 22: 3–25. https://doi.org/10.1016/0304-405x(88)90020-7.

Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin. 1995. Bayesian Data Analysis. Boca Raton, Florida: Chapman and Hall/CRC. https://doi.org/10.1201/9780429258411.

Geweke, J. 1992. "Evaluating the Accuracy of Sampling-Based Approaches to the Calculations of Posterior Moments." Bayesian Statistics 4: 641–9. https://doi.org/10.1093/oso/9780198522669.003.0010.

Giannone, D., M. Lenza, and G. E. Primiceri. 2021. "Economic Predictions with Big Data: The Illusion of Sparsity." Econometrica 89: 2409–37. https://doi.org/10.3982/ecta17842.

Golez, B., and P. Koudijs. 2018. "Four Centuries of Return Predictability." Journal of Financial Economics 127: 248–63. https://doi.org/10.1016/j.jfineco.2017.12.007.

Griffin, J., and P. Brown. 2017. "Hierarchical Shrinkage Priors for Regression Models." Bayesian Analysis 12: 135–59. https://doi.org/10.1214/15-ba990.

Gupta, A. K., and S. Nadarajah. 2004. Handbook of Beta Distribution and Its Applications. Boca Raton, Florida: CRC Press. https://doi.org/10.1201/9781482276596.

Hoogerheide, L., J. F. Kaashoek, and H. K. van Dijk. 2007. "On the Shape of Posterior Densities and Credible Sets in Instrumental Variable Regression Models with Reduced Rank: An Application of Flexible Sampling Methods Using Neural Networks." Journal of Econometrics 139: 154–80. https://doi.org/10.1016/j.jeconom.2006.06.009.

Johnson, N. L., S. Kotz, and N. Balakrishnan. 1995. Continuous Univariate Distributions, Volume 2, Vol. 289. Princeton, New Jersey: John Wiley & Sons, Ltd.

Kastner, G., and S. Frühwirth-Schnatter. 2014. "Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Estimation of Stochastic Volatility Models." Computational Statistics & Data Analysis 76: 408–23. https://doi.org/10.1016/j.csda.2013.01.002.

Kendall, M. G. 1954. "Note on Bias in the Estimation of Autocorrelation." Biometrika 41: 403–4. https://doi.org/10.2307/2332720.

Kloek, T., and H. K. van Dijk. 1978. "Bayesian Estimates of Equation System Parameters: An Application of Integration by Monte Carlo." Econometrica 46: 1–19. https://doi.org/10.2307/1913641.

Krone, T., C. J. Albers, and M. E. Timmerman. 2017. "A Comparative Simulation Study of AR(1) Estimators in Short Time Series." Quality and Quantity 51: 1–21. https://doi.org/10.1007/s11135-015-0290-1.

Kwiatkowski, D., P. C. B. Phillips, P. Schmidt, and Y. Shin. 1992. "Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root." Journal of Econometrics 54: 159–78. https://doi.org/10.1016/0304-4076(92)90104-y.

Lopes, H. F., and N. G. Polson. 2014. "Bayesian Instrumental Variables: Priors and Likelihoods." Econometric Reviews 33: 100–21. https://doi.org/10.1080/07474938.2013.807146.

Nagel, S. 2021. Machine Learning in Asset Pricing. Princeton, New Jersey: Princeton University Press. https://doi.org/10.23943/princeton/9780691218700.001.0001.

Newey, W. K., and K. D. West. 1994. "Automatic Lag Selection in Covariance Matrix Estimation." The Review of Economic Studies 61: 631–53. https://doi.org/10.2307/2297912.

Pettenuzzo, D., A. Timmermann, and R. Valkanov. 2014. "Forecasting Stock Returns under Economic Constraints." Journal of Financial Economics 114: 517–53. https://doi.org/10.1016/j.jfineco.2014.07.015.

Phillips, P. C. 2015. "Halbert White Jr. Memorial JFEC Lecture: Pitfalls and Possibilities in Predictive Regression." Journal of Financial Econometrics 13: 521–55. https://doi.org/10.1093/jjfinec/nbv014.

Poirier, D. 1996. "Prior Beliefs About Fit." Bayesian Statistics 5: 731–8. https://doi.org/10.1093/oso/9780198523567.003.0053.

Poirier, D. J. 1978. "The Effect of the First Observation in Regression Models with First-Order Autoregressive Disturbances." Journal of the Royal Statistical Society: Series C (Applied Statistics) 27: 67–8. https://doi.org/10.2307/2346228.

Rossi, P. E., G. M. Allenby, and R. McCulloch. 2006. Bayesian Statistics and Marketing. Princeton, New Jersey: John Wiley & Sons, Ltd. https://doi.org/10.1002/0470863692.

Ruud, P. A. 2000. An Introduction to Classical Econometric Theory. Oxford, England: Oxford University Press.

Schotman, P., and H. K. van Dijk. 1991a. "A Bayesian Analysis of the Unit Root in Real Exchange Rates." Journal of Econometrics 49: 195–238. https://doi.org/10.1016/0304-4076(91)90014-5.

Schotman, P. C., and H. K. van Dijk. 1991b. "On Bayesian Routes to Unit Roots." Journal of Applied Econometrics 6: 387–401. https://doi.org/10.1002/jae.3950060407.

Stambaugh, R. F. 1999. "Predictive Regressions." Journal of Financial Economics 54: 375–421. https://doi.org/10.1016/s0304-405x(99)00041-0.

Wachter, J. A., and M. Warusawitharana. 2009. "Predictable Returns and Asset allocation: Should a Skeptical Investor Time the Market?" Journal of Econometrics 148: 162–78. https://doi.org/10.1016/j.jeconom.2008.10.009.

Wachter, J. A., and M. Warusawitharana. 2015. "What is the Chance that the Equity Premium Varies Over Time? Evidence from Regressions on the Dividend-Price Ratio." Journal of Econometrics 186: 74–93. https://doi.org/10.1016/j.jeconom.2014.05.018.

Wagenmakers, E.-J., T. Lodewyckx, H. Kuriyal, and R. Grasman. 2010. "Bayesian Hypothesis Testing for Psychologists: A Tutorial on the Savage–Dickey Method." Cognitive Psychology 60: 158–89. https://doi.org/10.1016/j.cogpsych.2009.12.001.

Welch, I., and A. Goyal. 2008. "A Comprehensive Look at the Empirical Performance of Equity Premium Prediction." Review of Financial Studies 21: 1455–508. https://doi.org/10.1093/rfs/hhm014.

Zellner, A., T. Ando, N. Başturk, L. Hoogerheide, and H. K. van Dijk. 2014. "Bayesian Analysis of Instrumental Variable Models: Acceptance-Rejection within Direct Monte Carlo." Econometric Reviews 33: 3–35. https://doi.org/10.1080/07474938.2013.807094.

Zhang, Y. D., B. P. Naughton, H. D. Bondell, and B. J. Reich. 2020. "Bayesian Regression Using a Prior on the Model Fit: The R2-D2 Shrinkage Prior." Journal of the American Statistical Association 117: 862–74. https://doi.org/10.1080/01621459.2020.1825449.


Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/snde-2022-0110).


Received: 2022-11-29
Accepted: 2023-10-16
Published Online: 2023-12-25

© 2023 Walter de Gruyter GmbH, Berlin/Boston
