
Bayesian information criterion approximations to Bayes factors for univariate and multivariate logistic regression models

Katharina Selig, Pamela Shaw and Donna Ankerst
Published/Copyright: October 29, 2020

Abstract

Schwarz’s criterion, also known as the Bayesian Information Criterion or BIC, is commonly used for model selection in logistic regression due to its simple intuitive formula. For tests of nested hypotheses in independent and identically distributed data as well as in Normal linear regression, previous results have motivated use of Schwarz’s criterion by its consistent approximation to the Bayes factor (BF), defined as the ratio of posterior to prior model odds. Furthermore, under construction of an intuitive unit-information prior for the parameters of interest to test for inclusion in the nested models, previous results have shown that Schwarz’s criterion approximates the BF to higher order in the neighborhood of the simpler nested model. This paper extends these results to univariate and multivariate logistic regression, providing approximations to the BF for arbitrary prior distributions and definitions of the unit-information prior corresponding to Schwarz’s approximation. Simulations show the accuracy of the approximations for small sample sizes as well as comparisons to conclusions from frequentist testing. We present an application in prostate cancer, the motivating setting for our work, which illustrates the approximation for large data sets in a practical example.


Corresponding author: Donna Ankerst, Department of Mathematics, Technical University of Munich, München, Germany.

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: None declared.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.

Appendix

7.1 Proof of Proposition 1

Under Assumption 1 we can apply the Laplace approximation (3) to the numerator and denominator of (4) [1], [21]. Recall that $m_0 = \dim(\Theta)$ and $m = \dim(\Theta \times \Psi)$. With

$$
\det\left(\left(-\tfrac{1}{n} D^2 l_0(\hat{\theta}_0)\right)^{-1}\right)^{1/2}
= \det\left(n\left(-D^2 l_0(\hat{\theta}_0)\right)^{-1}\right)^{1/2}
= n^{\frac{m_0}{2}} \det\left(\left(-D^2 l_0(\hat{\theta}_0)\right)^{-1}\right)^{1/2}
$$

and

$$
\det\left(\left(-\tfrac{1}{n} D^2 l(\hat{\theta},\hat{\psi})\right)^{-1}\right)^{1/2}
= n^{\frac{m}{2}} \det\left(\left(-D^2 l(\hat{\theta},\hat{\psi})\right)^{-1}\right)^{1/2}
$$

we obtain the approximation

$$
\begin{aligned}
\mathrm{BF} &= \frac{\left(\tfrac{2\pi}{n}\right)^{\frac{m_0}{2}} \det\left(\left(-\tfrac{1}{n} D^2 l_0(\hat{\theta}_0)\right)^{-1}\right)^{1/2} \exp\left(l_0(\hat{\theta}_0)\right) f_0(\hat{\theta}_0)\left(1 + O_p(n^{-1})\right)}{\left(\tfrac{2\pi}{n}\right)^{\frac{m}{2}} \det\left(\left(-\tfrac{1}{n} D^2 l(\hat{\theta},\hat{\psi})\right)^{-1}\right)^{1/2} \exp\left(l(\hat{\theta},\hat{\psi})\right) f(\hat{\theta},\hat{\psi})\left(1 + O_p(n^{-1})\right)} \\
&= \frac{(2\pi)^{\frac{m_0}{2}} \det\left(\left(-D^2 l_0(\hat{\theta}_0)\right)^{-1}\right)^{1/2} \exp\left(l_0(\hat{\theta}_0)\right) f_0(\hat{\theta}_0)\left(1 + O_p(n^{-1})\right)}{(2\pi)^{\frac{m}{2}} \det\left(\left(-D^2 l(\hat{\theta},\hat{\psi})\right)^{-1}\right)^{1/2} \exp\left(l(\hat{\theta},\hat{\psi})\right) f(\hat{\theta},\hat{\psi})\left(1 + O_p(n^{-1})\right)}.
\end{aligned}
$$

Applying $\frac{1 + O_p(n^{-1})}{1 + O_p(n^{-1})} = 1 + O_p(n^{-1})$ we obtain

$$
\mathrm{BF} = (2\pi)^{\frac{m_0-m}{2}} \frac{\det\left(\left(-D^2 l_0(\hat{\theta}_0)\right)^{-1}\right)^{1/2} \exp\left(l_0(\hat{\theta}_0)\right) f_0(\hat{\theta}_0)}{\det\left(\left(-D^2 l(\hat{\theta},\hat{\psi})\right)^{-1}\right)^{1/2} \exp\left(l(\hat{\theta},\hat{\psi})\right) f(\hat{\theta},\hat{\psi})}\left(1 + O_p(n^{-1})\right).
$$

Thus a first approximation to the BF is

$$
\widehat{\mathrm{BF}} = (2\pi)^{\frac{m_0-m}{2}} \frac{\det(\hat{\Sigma}_0)^{1/2} \exp\left(l_0(\hat{\theta}_0)\right) f_0(\hat{\theta}_0)}{\det(\hat{\Sigma})^{1/2} \exp\left(l(\hat{\theta},\hat{\psi})\right) f(\hat{\theta},\hat{\psi})},
$$

where $\hat{\Sigma}_0 = \left(-D^2 l_0(\hat{\theta}_0)\right)^{-1}$ and $\hat{\Sigma} = \left(-D^2 l(\hat{\theta},\hat{\psi})\right)^{-1}$ [1]. □
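As an aside not contained in the original derivation, the first-order approximation $\widehat{\mathrm{BF}}$ can be computed directly from two fitted nested logistic regressions. The following R sketch is a minimal illustration under assumed inputs: the simulated data, the independent N(0, 2²) priors, and all object names are hypothetical choices, not specifications from the paper.

```r
## Minimal sketch of BF-hat from Proposition 1 for nested logistic models;
## theta = (intercept, x1 coefficient), psi = x2 coefficient.  Illustrative only.
set.seed(1)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.3 + 0.5 * x1 + 0.4 * x2))

fit0 <- glm(y ~ x1,      family = binomial())   # null model, gives l_0(theta-hat_0)
fit1 <- glm(y ~ x1 + x2, family = binomial())   # full model, gives l(theta-hat, psi-hat)

# For a canonical-link GLM, vcov() returns (-D^2 l)^{-1} at the MLE,
# i.e. the matrices Sigma-hat_0 and Sigma-hat of Proposition 1.
Sigma0 <- vcov(fit0)
Sigma1 <- vcov(fit1)

# Assumed priors: independent N(0, 2^2) on every coefficient (purely illustrative).
logf0 <- sum(dnorm(coef(fit0), 0, 2, log = TRUE))
logf1 <- sum(dnorm(coef(fit1), 0, 2, log = TRUE))

m0 <- length(coef(fit0)); m <- length(coef(fit1))
logBFhat <- (m0 - m) / 2 * log(2 * pi) +
  0.5 * as.numeric(determinant(Sigma0)$modulus) -
  0.5 * as.numeric(determinant(Sigma1)$modulus) +
  as.numeric(logLik(fit0)) - as.numeric(logLik(fit1)) + logf0 - logf1
logBFhat   # log of BF-hat comparing the null against the full model
```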

7.2 Proof of Proposition 2

We follow and extend the arguments of [66] and [67]. With (7) we can rewrite the log-likelihood as follows:

(12) $l(\xi,\psi) = l^*(\Phi(\xi,\psi),\psi) = l^*(\theta,\psi).$

For the first and second order partial derivatives with respect to ξ and ψ, we consider their individual entries. Then,

$$
\underbrace{\frac{\partial l(\xi,\psi)}{\partial \psi_j}}_{1\times 1}
= \underbrace{\frac{\partial l^*(\Phi(\xi,\psi),\psi)}{\partial \psi_j}}_{1\times 1}
+ \underbrace{\left(\frac{\partial \Phi(\xi,\psi)}{\partial \psi_j}\right)'}_{1\times m_0}
\underbrace{\frac{\partial l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi}}_{m_0\times 1},
$$

using partial derivatives of functions and the chain rule. Thus, the first derivative with respect to ψ is an $(m - m_0)$-dimensional vector.

The mixed second order partial derivatives are

(13)
$$
\begin{aligned}
\underbrace{\frac{\partial^2 l(\xi,\psi)}{\partial \xi_k \partial \psi_j}}_{1\times 1}
&= \underbrace{\left(\frac{\partial \Phi(\xi,\psi)}{\partial \xi_k}\right)'}_{1\times m_0}
\underbrace{\frac{\partial^2 l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi\,\partial \psi_j}}_{m_0\times 1}
+ \frac{\partial}{\partial \xi_k}\left[\underbrace{\left(\frac{\partial \Phi(\xi,\psi)}{\partial \psi_j}\right)'}_{1\times m_0}
\underbrace{\frac{\partial l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi}}_{m_0\times 1}\right] \\
&= \left(\frac{\partial \Phi(\xi,\psi)}{\partial \xi_k}\right)' \frac{\partial^2 l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi\,\partial \psi_j}
+ \left[\frac{\partial}{\partial \xi_k}\frac{\partial l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi}\right]' \frac{\partial \Phi(\xi,\psi)}{\partial \psi_j}
+ \left[\frac{\partial}{\partial \xi_k}\left(\frac{\partial \Phi(\xi,\psi)}{\partial \psi_j}\right)\right]' \frac{\partial l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi} \\
&= \left(\frac{\partial \Phi(\xi,\psi)}{\partial \xi_k}\right)' \frac{\partial^2 l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi\,\partial \psi_j}
+ \underbrace{\left(\frac{\partial \Phi(\xi,\psi)}{\partial \xi_k}\right)'}_{1\times m_0}
\underbrace{\frac{\partial^2 l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi\,\partial \Phi'}}_{m_0\times m_0}
\underbrace{\frac{\partial \Phi(\xi,\psi)}{\partial \psi_j}}_{m_0\times 1}
+ \underbrace{\left(\frac{\partial^2 \Phi(\xi,\psi)}{\partial \xi_k \partial \psi_j}\right)'}_{1\times m_0}
\underbrace{\frac{\partial l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi}}_{m_0\times 1},
\end{aligned}
$$

using partial derivatives of functions, the chain rule, and the product rule. $\frac{\partial^2 l(\xi,\psi)}{\partial \xi\,\partial \psi'}$ is an $m_0 \times (m - m_0)$ matrix of second order partial derivatives.

For the expectation of the last term of the sum in (13) it holds that

$$
E\left[\left(\frac{\partial^2 \Phi(\xi,\psi)}{\partial \xi_k \partial \psi_j}\right)' \frac{\partial l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi}\right]
= \left(\frac{\partial^2 \Phi(\xi,\psi)}{\partial \xi_k \partial \psi_j}\right)' E\left[\frac{\partial l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi}\right]
$$

as Φ (ξ, ψ) is constant with respect to X. Further,

$$
\begin{aligned}
E\left[\frac{\partial l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi}\right]
&= \int \frac{\partial \log\left(P(x \mid \Phi(\xi,\psi),\psi)\right)}{\partial \Phi}\, P(x \mid \Phi(\xi,\psi),\psi)\, dx
= \int \frac{1}{P(x \mid \Phi(\xi,\psi),\psi)} \frac{\partial P(x \mid \Phi(\xi,\psi),\psi)}{\partial \Phi}\, P(x \mid \Phi(\xi,\psi),\psi)\, dx \\
&= \frac{\partial}{\partial \Phi} \int P(x \mid \Phi(\xi,\psi),\psi)\, dx = \frac{\partial}{\partial \Phi}\, 1 = 0,
\end{aligned}
$$

where we could exchange the order of differentiation and integration since, for each $(\Phi(\xi,\psi),\psi)$, $P(x \mid \Phi(\xi,\psi),\psi)$ is integrable in x. Thus, for $I_{\xi\psi}(\xi,\psi) = -E\left[\frac{\partial^2 l(\xi,\psi)}{\partial \xi\,\partial \psi'}\right]$, similar to (6), we have with (13)

(14)
$$
I_{\xi\psi}(\xi,\psi) = \left(\frac{\partial \Phi(\xi,\psi)}{\partial \xi}\right)' I^*_{\Phi\psi}(\Phi(\xi,\psi),\psi)
+ \left(\frac{\partial \Phi(\xi,\psi)}{\partial \xi}\right)' I^*_{\Phi\Phi}(\Phi(\xi,\psi),\psi)\, \frac{\partial \Phi(\xi,\psi)}{\partial \psi},
$$

where we denote by I* the expected Fisher information matrix for l *.
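The vanishing score expectation invoked above can be checked empirically. The short R sketch below is an illustration only (simulated design, assumed true coefficients): it averages the logistic score at the true parameter over many replications and returns values near zero.

```r
## Monte Carlo check that the score of the logistic log-likelihood has
## expectation zero at the true parameter (the identity used for the last
## term of (13)).  Design and coefficients are illustrative assumptions.
set.seed(2)
beta <- c(0.3, 0.5, -0.4)                # assumed true coefficient vector
score_one <- function() {
  x <- c(1, rnorm(2))                    # one covariate vector incl. intercept
  p <- plogis(sum(x * beta))
  y <- rbinom(1, 1, p)
  (y - p) * x                            # logistic score contribution
}
rowMeans(replicate(20000, score_one()))  # approximately (0, 0, 0)
```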

We are interested in whether $I^*_{\Phi\psi}(\Phi(\xi,\psi_0),\psi_0) = 0$ for all ξ and thus we evaluate (14) at ψ = ψ₀. First, note that by taking the partial derivative of (7) with respect to ξ we obtain

$$
\frac{\partial \Phi(\xi,\psi)}{\partial \xi} = \mathrm{Id}_{m_0}
+ \left(\frac{\partial}{\partial \xi} I_{\xi\xi}(\xi,\psi_0)^{-1}\right) I_{\xi\psi}(\xi,\psi_0)(\psi - \psi_0)
+ I_{\xi\xi}(\xi,\psi_0)^{-1}\left(\frac{\partial}{\partial \xi} I_{\xi\psi}(\xi,\psi_0)\right)(\psi - \psi_0),
$$

where $\mathrm{Id}_{m_0}$ is the $m_0 \times m_0$ identity matrix. This implies

(15) $\left.\dfrac{\partial \Phi(\xi,\psi)}{\partial \xi}\right|_{\psi=\psi_0} = \mathrm{Id}_{m_0}.$

Similarly, by taking the partial derivative with respect to ψ of (7)

$$
\frac{\partial \Phi(\xi,\psi)}{\partial \psi} = I_{\xi\xi}(\xi,\psi_0)^{-1} I_{\xi\psi}(\xi,\psi_0),
$$

so that (14) evaluated at ψ = ψ 0 yields

(16)
$$
I_{\xi\psi}(\xi,\psi_0) = \mathrm{Id}_{m_0}\, I^*_{\Phi\psi}(\Phi(\xi,\psi_0),\psi_0)
+ \mathrm{Id}_{m_0}\, I^*_{\Phi\Phi}(\Phi(\xi,\psi_0),\psi_0)\, I_{\xi\xi}(\xi,\psi_0)^{-1} I_{\xi\psi}(\xi,\psi_0).
$$

Next we note that

$$
\frac{\partial^2 \Phi(\xi,\psi)}{\partial \xi_k \partial \xi_j}
= \frac{\partial}{\partial \xi_k}\left[\left(\frac{\partial}{\partial \xi_j} I_{\xi\xi}(\xi,\psi_0)^{-1}\right) I_{\xi\psi}(\xi,\psi_0)
+ I_{\xi\xi}(\xi,\psi_0)^{-1}\left(\frac{\partial}{\partial \xi_j} I_{\xi\psi}(\xi,\psi_0)\right)\right](\psi - \psi_0)
$$

and this implies

(17) $\left.\dfrac{\partial^2 \Phi(\xi,\psi)}{\partial \xi_k \partial \xi_j}\right|_{\psi=\psi_0} = 0.$

With (12) it follows that

$$
\begin{aligned}
\frac{\partial^2 l(\xi,\psi)}{\partial \xi_k \partial \xi_j}
&= \frac{\partial^2 l^*(\Phi(\xi,\psi),\psi)}{\partial \xi_k \partial \xi_j}
= \frac{\partial}{\partial \xi_k}\left(\left(\frac{\partial \Phi(\xi,\psi)}{\partial \xi_j}\right)' \frac{\partial l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi}\right) \\
&= \left(\frac{\partial \Phi(\xi,\psi)}{\partial \xi_k}\right)' \frac{\partial^2 l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi\,\partial \Phi'} \frac{\partial \Phi(\xi,\psi)}{\partial \xi_j}
+ \left(\frac{\partial^2 \Phi(\xi,\psi)}{\partial \xi_k \partial \xi_j}\right)' \frac{\partial l^*(\Phi(\xi,\psi),\psi)}{\partial \Phi},
\end{aligned}
$$

using partial derivatives of functions, the chain rule, and the product rule. With (15) and (17) we obtain

$$
\left.\frac{\partial^2 l(\xi,\psi)}{\partial \xi_k \partial \xi_j}\right|_{\psi=\psi_0}
= e_k' \frac{\partial^2 l^*(\Phi(\xi,\psi_0),\psi_0)}{\partial \Phi\,\partial \Phi'} e_j
= \frac{\partial^2 l^*(\Phi(\xi,\psi_0),\psi_0)}{\partial \Phi_k \partial \Phi_j},
$$

where $e_k$ and $e_j$ denote the unit vectors with one at entry k and j, respectively, and zero otherwise. It follows that

$$
I_{\xi\xi}(\xi,\psi_0) = -E\left[\frac{\partial^2 l(\xi,\psi_0)}{\partial \xi\,\partial \xi'}\right]
= -E\left[\frac{\partial^2 l^*(\Phi(\xi,\psi_0),\psi_0)}{\partial \Phi\,\partial \Phi'}\right]
= I^*_{\Phi\Phi}(\Phi(\xi,\psi_0),\psi_0).
$$

Solving (16) for $I^*_{\Phi\psi}(\Phi(\xi,\psi_0),\psi_0)$ yields

$$
\begin{aligned}
I^*_{\Phi\psi}(\Phi(\xi,\psi_0),\psi_0)
&= I_{\xi\psi}(\xi,\psi_0) - I^*_{\Phi\Phi}(\Phi(\xi,\psi_0),\psi_0)\, I_{\xi\xi}(\xi,\psi_0)^{-1} I_{\xi\psi}(\xi,\psi_0) \\
&= I_{\xi\psi}(\xi,\psi_0) - I_{\xi\xi}(\xi,\psi_0)\, I_{\xi\xi}(\xi,\psi_0)^{-1} I_{\xi\psi}(\xi,\psi_0)
= I_{\xi\psi}(\xi,\psi_0) - I_{\xi\psi}(\xi,\psi_0) = 0.
\end{aligned}
$$

Thus, θ and ψ are null orthogonal. □

7.3 Proof of Proposition 3

Substituting (5) into Proposition 1 yields

(18)
$$
\begin{aligned}
\mathrm{BF} &= \widehat{\mathrm{BF}}\left(1 + O_p(n^{-1})\right)
= (2\pi)^{\frac{m_0-m}{2}} \frac{\det\left(\left(-D^2 l_0(\hat{\theta}_0)\right)^{-1}\right)^{1/2} \exp\left(l_0(\hat{\theta}_0)\right) f_0(\hat{\theta}_0)}{\det\left(\left(-D^2 l(\hat{\theta},\hat{\psi})\right)^{-1}\right)^{1/2} \exp\left(l(\hat{\theta},\hat{\psi})\right) f(\hat{\theta},\hat{\psi})}\left(1 + O_p(n^{-1})\right) \\
&= \left(\frac{2\pi}{n}\right)^{\frac{m_0-m}{2}}
\frac{\det\left(-\tfrac{1}{n} D^2 l_0(\hat{\theta}_0)\right)^{-1/2}}{\det\left(-\tfrac{1}{n} D^2 l(\hat{\theta},\hat{\psi})\right)^{-1/2}}
\frac{\exp\left(l_0(\hat{\theta}_0)\right)}{\exp\left(l(\hat{\theta},\hat{\psi})\right)}
\frac{f_0(\hat{\theta}_0)}{f(\hat{\theta},\hat{\psi})}\left(1 + O_p(n^{-1})\right).
\end{aligned}
$$

For the remaining approximation we separately consider the components of (18). First, we show that $\frac{\det\left(-\frac{1}{n} D^2 l_0(\hat{\theta}_0)\right)^{-1/2}}{\det\left(-\frac{1}{n} D^2 l(\hat{\theta},\hat{\psi})\right)^{-1/2}} = \det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)^{1/2}\left(1 + O_p(n^{-1/2})\right)$ and then $\frac{f_0(\hat{\theta}_0)}{f(\hat{\theta},\hat{\psi})} = \frac{1 + O_p(n^{-1})}{f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta})}$.

For the first part, we show that $\hat{\theta}_0 - \hat{\theta} = O_p(n^{-1/2})$. Since $\hat{\theta}_0$ is the MLE maximizing $l_0(\theta)$, we have

$$
\frac{\partial l_0(\hat{\theta}_0)}{\partial \theta} = \frac{\partial l(\hat{\theta}_0,\psi_0)}{\partial \theta}
= \left(\frac{\partial l(\hat{\theta}_0,\psi_0)}{\partial \theta_1}, \ldots, \frac{\partial l(\hat{\theta}_0,\psi_0)}{\partial \theta_{m_0}}\right)' = 0.
$$

We expand each component of $\frac{\partial l_0(\hat{\theta}_0)}{\partial \theta}$ around $(\hat{\theta},\hat{\psi})$ using a Taylor approximation and obtain, for $k = 1, \ldots, m_0$ [68],

(19)
$$
0 = \frac{\partial l(\hat{\theta}_0,\psi_0)}{\partial \theta_k}
= \frac{\partial l(\hat{\theta},\hat{\psi})}{\partial \theta_k}
+ \sum_{j=1}^{m_0} (\hat{\theta}_{0j} - \hat{\theta}_j) \frac{\partial^2 l(\hat{\theta},\hat{\psi})}{\partial \theta_k \partial \theta_j}
+ \sum_{j=1}^{m-m_0} (\psi_{0j} - \hat{\psi}_j) \frac{\partial^2 l(\hat{\theta},\hat{\psi})}{\partial \theta_k \partial \psi_j}
+ o_p\left(\left\| (\hat{\theta}_0', \psi_0')' - (\hat{\theta}', \hat{\psi}')' \right\|\right).
$$

With Definition S1 (Supplementary Material) and Assumption 2.4 we note that

$$
o_p\left(\left\| (\hat{\theta}_0', \psi_0')' - (\hat{\theta}', \hat{\psi}')' \right\|\right)
= O_p\left(\left\| (\hat{\theta}_0', 0')' - (\hat{\theta}', 0')' \right\| + \left\| (0', \psi_0')' - (0', \hat{\psi}')' \right\|\right)
= O_p\left(\|\hat{\theta}_0 - \hat{\theta}\|\right) + O_p\left(n^{-1/2}\right).
$$

We divide (19) by n and, with $\frac{\partial l(\hat{\theta},\hat{\psi})}{\partial \theta_k} = 0$ since $(\hat{\theta},\hat{\psi})$ is the MLE of (θ, ψ), we obtain

(20)
$$
\begin{aligned}
0 &= \frac{1}{n}\sum_{j=1}^{m_0} (\hat{\theta}_{0j} - \hat{\theta}_j) \frac{\partial^2 l(\hat{\theta},\hat{\psi})}{\partial \theta_k \partial \theta_j}
+ \frac{1}{n}\sum_{j=1}^{m-m_0} (\psi_{0j} - \hat{\psi}_j) \frac{\partial^2 l(\hat{\theta},\hat{\psi})}{\partial \theta_k \partial \psi_j}
+ \frac{1}{n} O_p\left(\|\hat{\theta}_0 - \hat{\theta}\|\right) + \frac{1}{n} O_p\left(n^{-1/2}\right) \\
&= \frac{1}{n}\sum_{j=1}^{m_0} (\hat{\theta}_{0j} - \hat{\theta}_j) \frac{\partial^2 l(\hat{\theta},\hat{\psi})}{\partial \theta_k \partial \theta_j}
+ \frac{1}{n}\sum_{j=1}^{m-m_0} O_p\left(n^{-1/2}\right) \frac{\partial^2 l(\hat{\theta},\hat{\psi})}{\partial \theta_k \partial \psi_j}
+ \frac{1}{n} O_p\left(\|\hat{\theta}_0 - \hat{\theta}\|\right) + O_p\left(n^{-3/2}\right),
\end{aligned}
$$

where we use Assumption 2.4 in the last step. Recall that $D^2 l(\theta,\psi) = -X' W(\theta,\psi) X$, where W(θ, ψ) is a diagonal matrix with elements

$$
0 \le W_{ii}(\theta,\psi) = G\left(X_i (\theta', \psi')'\right)\left(1 - G\left(X_i (\theta', \psi')'\right)\right) \le 1
$$

with $G(z) = \frac{\exp(z)}{1+\exp(z)}$. Therefore, for any entry $d_{kj} = \left(D^2 l(\theta,\psi)\right)_{kj}$ we obtain

$$
|d_{kj}| = \left| -\sum_{i=1}^n X_{ik} W_{ii}(\theta,\psi) X_{ij} \right|
\le n \left| \max_{i=1,\ldots,n} \left(X_{ik} W_{ii} X_{ij}\right) \right|
\le n \left| \max_{i=1,\ldots,n} \left(X_{ik} X_{ij}\right) \right|
$$

and it follows that $d_{kj} = O_p(n)$. We apply this result to the entries $\frac{\partial^2 l(\hat{\theta},\hat{\psi})}{\partial \theta_k \partial \theta_j}$ and $\frac{\partial^2 l(\hat{\theta},\hat{\psi})}{\partial \theta_k \partial \psi_j}$ in (20) and use the multiplicative and additive properties of $O_p$:

$$
\begin{aligned}
0 &= \frac{1}{n}\sum_{j=1}^{m_0} (\hat{\theta}_{0j} - \hat{\theta}_j) O_p(n)
+ \frac{1}{n}\sum_{j=1}^{m-m_0} O_p\left(n^{-1/2}\right) O_p(n)
+ \frac{1}{n} O_p\left(\|\hat{\theta}_0 - \hat{\theta}\|\right) + O_p\left(n^{-3/2}\right), \\
0 &= \sum_{j=1}^{m_0} (\hat{\theta}_{0j} - \hat{\theta}_j) O_p(1)
+ (m - m_0) O_p\left(n^{-1/2}\right)
+ \frac{1}{n} O_p\left(\|\hat{\theta}_0 - \hat{\theta}\|\right) + O_p\left(n^{-3/2}\right) \\
&= \sum_{j=1}^{m_0} (\hat{\theta}_{0j} - \hat{\theta}_j) O_p(1)
+ \frac{1}{n} O_p\left(\|\hat{\theta}_0 - \hat{\theta}\|\right) + O_p\left(n^{-1/2}\right).
\end{aligned}
$$

Thus, we obtain

$$
\sum_{j=1}^{m_0} (\hat{\theta}_{0j} - \hat{\theta}_j) O_p(1) + \frac{1}{n} O_p\left(\|\hat{\theta}_0 - \hat{\theta}\|\right) = O_p\left(n^{-1/2}\right)
$$

and it follows that

(21) $\hat{\theta} - \hat{\theta}_0 = O_p\left(n^{-1/2}\right).$
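Equation (21) can be illustrated by simulation; the R sketch below uses assumed data (standard normal covariates, null model true with $\psi_0 = 0$) and tracks how the distance between $\hat{\theta}$ and $\hat{\theta}_0$ shrinks with the sample size.

```r
## Empirical illustration of (21): with psi_0 = 0 true, the distance between
## theta-hat and theta-hat_0 decreases at roughly the n^(-1/2) rate.
## All data-generating choices are illustrative assumptions.
set.seed(3)
theta_gap <- function(n) {
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- rbinom(n, 1, plogis(0.3 + 0.5 * x1))                  # null model holds
  t0 <- coef(glm(y ~ x1,      family = binomial()))           # theta-hat_0
  t1 <- coef(glm(y ~ x1 + x2, family = binomial()))[c(1, 2)]  # theta-hat
  sqrt(sum((t1 - t0)^2))
}
ns   <- c(250, 1000, 4000)
gaps <- sapply(ns, function(n) mean(replicate(200, theta_gap(n))))
cbind(n = ns, gap = gaps, gap_x_sqrt_n = gaps * sqrt(ns))     # last column roughly stable
```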

Next, we show that $D^2 l_0(\hat{\theta}_0) = D^2_{\theta\theta} l(\hat{\theta},\psi_0)\left(1 + O_p(n^{-1/2})\right)$, where $D^2_{\theta\theta}$ denotes the part of the Hessian matrix of l corresponding to θ. With (21), Assumption 2.2, and the properties of $O_p$ we obtain

$$
\begin{aligned}
\exp\left(X_i (\hat{\theta}_0', \psi_0')'\right)
&= \exp\left(\sum_{j=1}^{m_0} X_{ij}\hat{\theta}_{0j} + \sum_{j=1}^{m-m_0} X_{i,m_0+j}\psi_{0j}\right)
= \exp\left(\sum_{j=1}^{m_0} X_{ij}\left(\hat{\theta}_j + O_p\left(n^{-1/2}\right)\right) + \sum_{j=1}^{m-m_0} X_{i,m_0+j}\psi_{0j}\right) \\
&= \exp\left(\sum_{j=1}^{m_0} X_{ij}\hat{\theta}_j + \sum_{j=1}^{m-m_0} X_{i,m_0+j}\psi_{0j} + \sum_{j=1}^{m_0} X_{ij} O_p\left(n^{-1/2}\right)\right)
= \exp\left(X_i (\hat{\theta}', \psi_0')' + m_0 O_p(1) O_p\left(n^{-1/2}\right)\right) \\
&= \exp\left(X_i (\hat{\theta}', \psi_0')' + O_p\left(n^{-1/2}\right)\right)
= \exp\left(X_i (\hat{\theta}', \psi_0')'\right) \exp\left(O_p\left(n^{-1/2}\right)\right).
\end{aligned}
$$

With $\exp\left(O_p(n^{-1/2})\right) = 1 + O_p(n^{-1/2})$ as shown in Proposition S2, this yields

$$
\exp\left(X_i (\hat{\theta}_0', \psi_0')'\right) = \exp\left(X_i (\hat{\theta}', \psi_0')'\right)\left(1 + O_p\left(n^{-1/2}\right)\right).
$$

We note that with the additive properties of O p it follows that

$$
\begin{aligned}
1 + \exp\left(X_i (\hat{\theta}_0', \psi_0')'\right)
&= 1 + \exp\left(X_i (\hat{\theta}', \psi_0')'\right)\left(1 + O_p\left(n^{-1/2}\right)\right)
= 1 + \exp\left(X_i (\hat{\theta}', \psi_0')'\right) + \exp\left(X_i (\hat{\theta}', \psi_0')'\right) O_p\left(n^{-1/2}\right) \\
&= 1 + \exp\left(X_i (\hat{\theta}', \psi_0')'\right) + \exp\left(X_i (\hat{\theta}', \psi_0')'\right) O_p\left(n^{-1/2}\right) + O_p\left(n^{-1/2}\right)
= \left[1 + \exp\left(X_i (\hat{\theta}', \psi_0')'\right)\right]\left(1 + O_p\left(n^{-1/2}\right)\right).
\end{aligned}
$$

Thus, with Proposition S1 we obtain

(22)
$$
G\left(X_i (\hat{\theta}_0', \psi_0')'\right)
= \frac{\exp\left(X_i (\hat{\theta}_0', \psi_0')'\right)}{1 + \exp\left(X_i (\hat{\theta}_0', \psi_0')'\right)}
= \frac{\exp\left(X_i (\hat{\theta}', \psi_0')'\right)\left(1 + O_p\left(n^{-1/2}\right)\right)}{\left[1 + \exp\left(X_i (\hat{\theta}', \psi_0')'\right)\right]\left(1 + O_p\left(n^{-1/2}\right)\right)}
= G\left(X_i (\hat{\theta}', \psi_0')'\right)\left(1 + O_p\left(n^{-1/2}\right)\right).
$$

Using

$$
\left(1 + O_p\left(n^{-1/2}\right)\right)^2 = 1 + 2 O_p\left(n^{-1/2}\right) + O_p\left(n^{-1}\right) = 1 + O_p\left(n^{-1/2}\right)
$$

and (22), we approximate the diagonal elements of $W(\hat{\theta}_0,\psi_0)$ by

$$
\begin{aligned}
W_{ii}(\hat{\theta}_0,\psi_0) &= G\left(X_i (\hat{\theta}_0', \psi_0')'\right)\left(1 - G\left(X_i (\hat{\theta}_0', \psi_0')'\right)\right)
= G\left(X_i (\hat{\theta}', \psi_0')'\right)\left(1 + O_p\left(n^{-1/2}\right)\right)\left(1 - G\left(X_i (\hat{\theta}', \psi_0')'\right)\left(1 + O_p\left(n^{-1/2}\right)\right)\right) \\
&= G\left(X_i (\hat{\theta}', \psi_0')'\right)\left(\left(1 + O_p\left(n^{-1/2}\right)\right) - G\left(X_i (\hat{\theta}', \psi_0')'\right)\left(1 + O_p\left(n^{-1/2}\right)\right)^2\right) \\
&= G\left(X_i (\hat{\theta}', \psi_0')'\right)\left(\left(1 + O_p\left(n^{-1/2}\right)\right) - G\left(X_i (\hat{\theta}', \psi_0')'\right)\left(1 + O_p\left(n^{-1/2}\right)\right)\right) \\
&= G\left(X_i (\hat{\theta}', \psi_0')'\right)\left(1 - G\left(X_i (\hat{\theta}', \psi_0')'\right)\right)\left(1 + O_p\left(n^{-1/2}\right)\right)
= W_{ii}(\hat{\theta},\psi_0)\left(1 + O_p\left(n^{-1/2}\right)\right).
\end{aligned}
$$

We obtain for any entry $d^{(0)}_{kj} = \left(D^2 l_0(\hat{\theta}_0)\right)_{kj}$

$$
\begin{aligned}
d^{(0)}_{kj} &= -\sum_{i=1}^n X_{ik} W_{ii}(\hat{\theta}_0,\psi_0) X_{ij}
= -\sum_{i=1}^n X_{ik} W_{ii}(\hat{\theta},\psi_0)\left(1 + O_p\left(n^{-1/2}\right)\right) X_{ij} \\
&= \left[-\sum_{i=1}^n X_{ik} W_{ii}(\hat{\theta},\psi_0) X_{ij}\right]\left(1 + O_p\left(n^{-1/2}\right)\right)
= d_{kj}\left(1 + O_p\left(n^{-1/2}\right)\right)
= \left(D^2_{\theta\theta} l(\hat{\theta},\psi_0)\right)_{kj}\left(1 + O_p\left(n^{-1/2}\right)\right).
\end{aligned}
$$

We note that the indices k and j only run over the entries corresponding to θ, that is, over the block $D^2_{\theta\theta} l(\hat{\theta},\psi_0)$ of the Hessian. With Assumption 2.3 we obtain

$$
-\frac{1}{n} D^2 l_0(\hat{\theta}_0) = -\frac{1}{n} D^2_{\theta\theta} l(\hat{\theta},\psi_0)\left(\mathrm{Id}_{m_0} + O_p\left(n^{-1/2}\right)\right)
= \left(I_{\theta\theta}(\hat{\theta},\psi_0) + O_p\left(n^{-1/2}\right)\right)\left(\mathrm{Id}_{m_0} + O_p\left(n^{-1/2}\right)\right).
$$

I (θ, ψ) is the expected Fisher information matrix for a single observation X and thus

(23)
$$
I(\theta,\psi) = E\left[X' G(X\beta)\left(1 - G(X\beta)\right) X\right] \le E\left[X' X\right] = O_p(1),
$$

where $\beta = (\theta', \psi')'$ and we assume finite second moments for the distribution of the data X. We obtain with $I(\theta,\psi) = O_p(1)$

(24)
$$
I(\theta,\psi) + O_p\left(n^{-1/2}\right) = I(\theta,\psi)\left(\mathrm{Id}_m + O_p\left(n^{-1/2}\right)\right)
$$

and it follows

(25)
$$
\begin{aligned}
-\frac{1}{n} D^2 l_0(\hat{\theta}_0) &= I_{\theta\theta}(\hat{\theta},\psi_0)\left(\mathrm{Id}_{m_0} + O_p\left(n^{-1/2}\right)\right)\left(\mathrm{Id}_{m_0} + O_p\left(n^{-1/2}\right)\right) \\
&= I_{\theta\theta}(\hat{\theta},\psi_0)\left(\mathrm{Id}_{m_0} + 2 O_p\left(n^{-1/2}\right) + O_p\left(n^{-1}\right)\right)
= I_{\theta\theta}(\hat{\theta},\psi_0)\left(\mathrm{Id}_{m_0} + O_p\left(n^{-1/2}\right)\right).
\end{aligned}
$$

Analogously to above, we can show

$$
\exp\left(X_i (\hat{\theta}', \hat{\psi}')'\right) = \exp\left(X_i (\hat{\theta}', \psi_0')'\right)\left(1 + O_p\left(n^{-1/2}\right)\right)
$$

and

$$
G\left(X_i (\hat{\theta}', \hat{\psi}')'\right) = G\left(X_i (\hat{\theta}', \psi_0')'\right)\left(1 + O_p\left(n^{-1/2}\right)\right)
$$

which yields

$$
W_{ii}(\hat{\theta},\hat{\psi}) = W_{ii}(\hat{\theta},\psi_0)\left(1 + O_p\left(n^{-1/2}\right)\right)
$$

and thus

$$
D^2 l(\hat{\theta},\hat{\psi}) = D^2 l(\hat{\theta},\psi_0)\left(\mathrm{Id}_m + O_p\left(n^{-1/2}\right)\right).
$$

With Assumption 2.3 and (24) we obtain

(26)
$$
\begin{aligned}
-\frac{1}{n} D^2 l(\hat{\theta},\hat{\psi}) &= \left(I(\hat{\theta},\psi_0) + O_p\left(n^{-1/2}\right)\right)\left(\mathrm{Id}_m + O_p\left(n^{-1/2}\right)\right)
= I(\hat{\theta},\psi_0)\left(\mathrm{Id}_m + O_p\left(n^{-1/2}\right)\right)\left(\mathrm{Id}_m + O_p\left(n^{-1/2}\right)\right) \\
&= I(\hat{\theta},\psi_0)\left(\mathrm{Id}_m + O_p\left(n^{-1/2}\right)\right).
\end{aligned}
$$

Under Assumption 2.1 and with the determinant product rule it follows

$$
\det\left(I(\hat{\theta},\psi_0)\right) = \det\left(I_{\theta\theta}(\hat{\theta},\psi_0)\right) \det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right).
$$

With the determinant product rule, (25), and (26) this yields

(27)
$$
\begin{aligned}
\frac{\det\left(-\frac{1}{n} D^2 l_0(\hat{\theta}_0)\right)}{\det\left(-\frac{1}{n} D^2 l(\hat{\theta},\hat{\psi})\right)}
&= \frac{\det\left(I_{\theta\theta}(\hat{\theta},\psi_0)\left(\mathrm{Id}_{m_0} + O_p\left(n^{-1/2}\right)\right)\right)}{\det\left(I(\hat{\theta},\psi_0)\left(\mathrm{Id}_m + O_p\left(n^{-1/2}\right)\right)\right)}
= \frac{\det\left(I_{\theta\theta}(\hat{\theta},\psi_0)\right) \det\left(\mathrm{Id}_{m_0} + O_p\left(n^{-1/2}\right)\right)}{\det\left(I_{\theta\theta}(\hat{\theta},\psi_0)\right) \det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right) \det\left(\mathrm{Id}_m + O_p\left(n^{-1/2}\right)\right)} \\
&= \frac{\det\left(\mathrm{Id}_{m_0} + O_p\left(n^{-1/2}\right)\right)}{\det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right) \det\left(\mathrm{Id}_m + O_p\left(n^{-1/2}\right)\right)}.
\end{aligned}
$$

We obtain with Hadamard’s inequality and the multiplicative and additive properties of O p [69]

(28)
$$
\det\left(\mathrm{Id}_{m_0} + O_p\left(n^{-1/2}\right)\right) \le \left(1 + O_p\left(n^{-1/2}\right)\right)^{m_0} = 1 + O_p\left(n^{-1/2}\right), \qquad
\det\left(\mathrm{Id}_m + O_p\left(n^{-1/2}\right)\right) \le \left(1 + O_p\left(n^{-1/2}\right)\right)^{m} = 1 + O_p\left(n^{-1/2}\right).
$$

$-D^2 l(\hat{\theta},\hat{\psi})$ is positive semidefinite since $-l$ is convex. Thus $I_{\theta\theta}(\hat{\theta},\psi_0) = -E\left[D^2_{\theta\theta} l(\hat{\theta},\psi_0)\right]$ and $I(\hat{\theta},\psi_0) = -E\left[D^2 l(\hat{\theta},\psi_0)\right]$ are positive semidefinite, and it follows that $\mathrm{Id}_m + O_p\left(n^{-1/2}\right)$ and $\mathrm{Id}_{m_0} + O_p\left(n^{-1/2}\right)$ are positive semidefinite. Let $\mu_i \ge 0$ denote the eigenvalues of the random matrix $Y_n = O_p\left(n^{-1/2}\right)$.

Then

$$
\det\left(\mathrm{Id}_m + Y_n\right) = \prod_{i=1}^m (1 + \mu_i) \ge 1 + \prod_{i=1}^m \mu_i = 1 + \det(Y_n) \ge 1 + O_p\left(n^{-1/2}\right)
$$

and analogously $\det\left(\mathrm{Id}_{m_0} + Y_n\right) \ge 1 + O_p\left(n^{-1/2}\right)$. It follows with (28) that

$$
\det\left(\mathrm{Id}_m + O_p\left(n^{-1/2}\right)\right) = 1 + O_p\left(n^{-1/2}\right).
$$

Substituting this into (27) yields with Proposition S1

$$
\frac{\det\left(-\frac{1}{n} D^2 l_0(\hat{\theta}_0)\right)}{\det\left(-\frac{1}{n} D^2 l(\hat{\theta},\hat{\psi})\right)}
= \frac{1 + O_p\left(n^{-1/2}\right)}{\det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)\left(1 + O_p\left(n^{-1/2}\right)\right)}
= \det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)^{-1}\left(1 + O_p\left(n^{-1/2}\right)\right),
$$

and using $\left(1 + O_p\left(n^{-1/2}\right)\right)^{-1/2} = 1 + O_p\left(n^{-1/2}\right)$, shown in Proposition S2, we obtain

(29)
$$
\frac{\det\left(-\frac{1}{n} D^2 l_0(\hat{\theta}_0)\right)^{-1/2}}{\det\left(-\frac{1}{n} D^2 l(\hat{\theta},\hat{\psi})\right)^{-1/2}}
= \left(\frac{\det\left(-\frac{1}{n} D^2 l_0(\hat{\theta}_0)\right)}{\det\left(-\frac{1}{n} D^2 l(\hat{\theta},\hat{\psi})\right)}\right)^{-1/2}
= \det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)^{1/2}\left(1 + O_p\left(n^{-1/2}\right)\right)^{-1/2}
= \det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)^{1/2}\left(1 + O_p\left(n^{-1/2}\right)\right).
$$
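Equation (29) can also be checked numerically from fitted models. In the R sketch below (an illustration under assumed data; $I_{\psi\psi}$ is estimated by the ψ-block of the scaled observed information at the MLE) the two sides agree closely for large n.

```r
## Numerical check of (29): ratio of the (-1/2)-powers of the scaled observed
## information determinants versus det(I_psipsi)^(1/2).  Illustrative data.
set.seed(4)
n  <- 5000
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.2 + 0.4 * x1))            # psi_0 = 0 for (x2, x3)

fit0 <- glm(y ~ x1,           family = binomial())
fit1 <- glm(y ~ x1 + x2 + x3, family = binomial())

obs_info <- function(fit) {                           # -D^2 l = X' W X at the MLE
  X <- model.matrix(fit); p <- fitted(fit)
  crossprod(X * (p * (1 - p)), X)
}
lhs  <- sqrt(det(obs_info(fit1) / n) / det(obs_info(fit0) / n))
Ipsi <- (obs_info(fit1) / n)[3:4, 3:4]                # psi-block of the scaled information
rhs  <- sqrt(det(Ipsi))
c(lhs = lhs, rhs = rhs)                               # approximately equal
```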

It remains to show that $\frac{f_0(\hat{\theta}_0)}{f(\hat{\theta},\hat{\psi})}$ is approximated by $\frac{1}{f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta})}$. Using a Taylor expansion of $f_0$ around $\hat{\theta}$ we show, with (21) and under the assumption that $f_0$ and its derivative are bounded in a neighborhood of $\hat{\theta}$, that

$$
\begin{aligned}
f_0(\hat{\theta}_0) &= f_0(\hat{\theta}) + \sum_{j=1}^{m_0} (\hat{\theta}_j - \hat{\theta}_{0j}) \frac{\partial}{\partial \theta_j} f_0(\hat{\theta}) + o\left(\|\hat{\theta} - \hat{\theta}_0\|\right)
= f_0(\hat{\theta}) + \sum_{j=1}^{m_0} O_p\left(n^{-1/2}\right) \frac{\partial}{\partial \theta_j} f_0(\hat{\theta}) + o\left(O_p\left(n^{-1/2}\right)\right) \\
&= f_0(\hat{\theta}) + O_p\left(n^{-1/2}\right)
= f_0(\hat{\theta})\left(1 + O_p\left(n^{-1}\right)\right).
\end{aligned}
$$

Thus, with Assumption 2.2 it follows

(30)
$$
\frac{f_0(\hat{\theta}_0)}{f(\hat{\theta},\hat{\psi})}
= \frac{f_0(\hat{\theta}_0)}{f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta})\, f_0(\hat{\theta})}
= \frac{f_0(\hat{\theta})\left(1 + O_p\left(n^{-1}\right)\right)}{f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta})\, f_0(\hat{\theta})}
= \frac{1 + O_p\left(n^{-1}\right)}{f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta})}.
$$

Substituting (29) and (30) into (18) yields, with the additive and multiplicative properties of $O_p$, the following approximation for the BF:

$$
\begin{aligned}
\mathrm{BF} &= \left(\frac{2\pi}{n}\right)^{\frac{m_0-m}{2}} \det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)^{1/2}\left(1 + O_p\left(n^{-1/2}\right)\right) \frac{\exp\left(l_0(\hat{\theta}_0)\right)}{\exp\left(l(\hat{\theta},\hat{\psi})\right)} \frac{1 + O_p\left(n^{-1}\right)}{f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta})}\left(1 + O_p\left(n^{-1}\right)\right) \\
&= \left(\frac{2\pi}{n}\right)^{\frac{m_0-m}{2}} \det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)^{1/2} \frac{\exp\left(l_0(\hat{\theta}_0)\right)}{\exp\left(l(\hat{\theta},\hat{\psi})\right)} \frac{1}{f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta})}\left(1 + O_p\left(n^{-1/2}\right)\right).
\end{aligned}
$$
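Beyond the derivation above, the final display of Proposition 3 can be evaluated directly from a fitted model once a conditional prior $f_{\psi|\theta}$ is specified. The R sketch below is illustrative only: the data are simulated, ψ is a single coefficient, and the conditional prior is an assumed N(0, 1) density.

```r
## Sketch of the Proposition 3 approximation to log(BF) with one extra
## parameter psi (coefficient of x2), psi_0 = 0, and an assumed prior
## psi | theta ~ N(0, 1).  All inputs are illustrative.
set.seed(5)
n  <- 2000
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.3 + 0.5 * x1))

fit0 <- glm(y ~ x1,      family = binomial())
fit1 <- glm(y ~ x1 + x2, family = binomial())
m0 <- length(coef(fit0)); m <- length(coef(fit1))

X <- model.matrix(fit1); p <- fitted(fit1)
I_hat   <- crossprod(X * (p * (1 - p)), X) / n        # estimate of I(theta-hat, psi_0)
I_psi   <- I_hat[3, 3]                                # psi-block (a scalar here)
psi_hat <- unname(coef(fit1)["x2"])

log_f_cond <- dnorm(psi_hat, mean = 0, sd = 1, log = TRUE)   # assumed prior density
logBF_prop3 <- (m0 - m) / 2 * log(2 * pi / n) + 0.5 * log(I_psi) +
  as.numeric(logLik(fit0)) - as.numeric(logLik(fit1)) - log_f_cond
logBF_prop3
```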

7.4 Proof of Corollary 1

We take the logarithm on both sides of the result from Proposition 3 and obtain

$$
\begin{aligned}
\log(\mathrm{BF}) &= \frac{m_0-m}{2}\log\left(\frac{2\pi}{n}\right) + \frac{1}{2}\log\left(\det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)\right) + l_0(\hat{\theta}_0) - l(\hat{\theta},\hat{\psi}) - \log\left(f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta})\right) + \log\left(1 + O_p\left(n^{-1/2}\right)\right) \\
&= l_0(\hat{\theta}_0) - l(\hat{\theta},\hat{\psi}) + \frac{m-m_0}{2}\log(n) + \frac{m_0-m}{2}\log(2\pi) + \frac{1}{2}\log\left(\det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)\right) - \log\left(f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta})\right) + O_p\left(n^{-1/2}\right),
\end{aligned}
$$

where we use $\log\left(1 + O_p\left(n^{-1/2}\right)\right) = O_p\left(n^{-1/2}\right)$ as shown in Proposition S2. Thus we have

(31)
$$
\log(\mathrm{BF}) = S + \frac{m_0-m}{2}\log(2\pi) + \frac{1}{2}\log\left(\det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)\right) - \log\left(f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta})\right) + O_p\left(n^{-1/2}\right).
$$

Using (23) we obtain

$$
\log\left(\det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)\right) = \log\left(\det\left(O_p(1)\right)\right) = O_p(1).
$$

With $\frac{m_0-m}{2}\log(2\pi) = O_p(1)$ and $\log\left(f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta})\right) = O_p(1)$ this yields

$$
\log(\mathrm{BF}) = S + O_p(1).
$$
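In practice $S$ can be read off from standard BIC output: since $\mathrm{BIC} = -2\,\text{log-likelihood} + \mathrm{df}\,\log(n)$, the quantity $S = l_0(\hat{\theta}_0) - l(\hat{\theta},\hat{\psi}) + \frac{m-m_0}{2}\log(n)$ equals one half of the BIC difference between the full and the null model. The R sketch below (illustrative data only) computes $S$ both ways.

```r
## Corollary 1: log(BF) = S + O_p(1).  S computed from log-likelihoods and,
## equivalently, from the BIC difference of the two fits.  Illustrative data.
set.seed(6)
n  <- 1000
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.3 + 0.5 * x1 + 0.2 * x2))

fit0 <- glm(y ~ x1,      family = binomial())
fit1 <- glm(y ~ x1 + x2, family = binomial())
m0 <- length(coef(fit0)); m <- length(coef(fit1))

S_loglik <- as.numeric(logLik(fit0)) - as.numeric(logLik(fit1)) + (m - m0) / 2 * log(n)
S_bic    <- (BIC(fit1) - BIC(fit0)) / 2
c(S_loglik = S_loglik, S_bic = S_bic)    # identical values
```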

7.5 Proof of Theorem 1

The multivariate normal density of ψ | θ evaluated at $\hat{\psi} \mid \hat{\theta}$ is given by

$$
\begin{aligned}
f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta}) &= (2\pi)^{\frac{m_0-m}{2}} \det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)^{1/2} \exp\left(-\frac{1}{2}(\hat{\psi} - \psi_0)' I_{\psi\psi}(\hat{\theta},\psi_0)(\hat{\psi} - \psi_0)\right) \\
&= (2\pi)^{\frac{m_0-m}{2}} \det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)^{1/2} \exp\left(O_p\left(n^{-1/2}\right) O_p(1)\, O_p\left(n^{-1/2}\right)\right) \\
&= (2\pi)^{\frac{m_0-m}{2}} \det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)^{1/2}\left(1 + O_p\left(n^{-1}\right)\right),
\end{aligned}
$$

where we use (23) and exp ( O p ( n 1 ) ) = 1 + O p ( n 1 ) shown in Proposition S2. Next, note that

$$
\begin{aligned}
\log\left(f_{\psi|\theta}(\hat{\psi} \mid \hat{\theta})\right)
&= \log\left((2\pi)^{-\frac{m-m_0}{2}} \det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)^{1/2}\left(1 + O_p\left(n^{-1}\right)\right)\right) \\
&= -\frac{m-m_0}{2}\log(2\pi) + \frac{1}{2}\log\left(\det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)\right) + \log\left(1 + O_p\left(n^{-1}\right)\right) \\
&= -\frac{m-m_0}{2}\log(2\pi) + \frac{1}{2}\log\left(\det\left(I_{\psi\psi}(\hat{\theta},\psi_0)\right)\right) + O_p\left(n^{-1}\right),
\end{aligned}
$$

since log ( 1 + O p ( n 1 ) ) = O p ( n 1 ) as shown in Proposition S2. Substitution into (31) yields

$$
\log(\mathrm{BF}) = S + O_p\left(n^{-1/2}\right) - O_p\left(n^{-1}\right) = S + O_p\left(n^{-1/2}\right).
$$
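The cancellation in this proof can be reproduced numerically: under a unit-information prior $\psi \mid \theta \sim N\!\left(\psi_0, I_{\psi\psi}(\hat{\theta},\psi_0)^{-1}\right)$, the Proposition 3 approximation of $\log(\mathrm{BF})$ is close to $S$. The R sketch below is an illustration with assumed data and a single ψ, so the prior density reduces to a univariate normal.

```r
## Theorem 1 illustration: with the unit-information prior on psi, the
## Proposition 3 approximation of log(BF) is close to S.  Illustrative data.
set.seed(7)
n  <- 4000
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.3 + 0.5 * x1))            # psi_0 = 0 (null model true)

fit0 <- glm(y ~ x1,      family = binomial())
fit1 <- glm(y ~ x1 + x2, family = binomial())
m0 <- length(coef(fit0)); m <- length(coef(fit1))

X <- model.matrix(fit1); p <- fitted(fit1)
I_psi   <- (crossprod(X * (p * (1 - p)), X) / n)[3, 3]   # scalar estimate of I_psipsi
psi_hat <- unname(coef(fit1)["x2"])

log_f_ui <- dnorm(psi_hat, mean = 0, sd = 1 / sqrt(I_psi), log = TRUE)  # unit-information prior
logBF_prop3 <- (m0 - m) / 2 * log(2 * pi / n) + 0.5 * log(I_psi) +
  as.numeric(logLik(fit0)) - as.numeric(logLik(fit1)) - log_f_ui
S <- as.numeric(logLik(fit0)) - as.numeric(logLik(fit1)) + (m - m0) / 2 * log(n)
c(logBF_prop3 = logBF_prop3, S = S)                   # close for large n
```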

7.6 Proof of Proposition 3 for the multivariate logistic regression model

As stated in the main text, we need to prove the remaining statements, given in the following lemma.

Lemma 1

Under the assumptions of Proposition 3 applied to the multivariate logistic regression model, we have $D^2 l(\hat{\theta},\hat{\psi}) = O_p(n)$, $D^2 l_0(\hat{\theta}_0) = O_p(n)$, $I(\theta,\psi) = O_p(1)$, $\frac{1}{n} D^2 l_0(\hat{\theta}_0) = \frac{1}{n} D^2_{\theta\theta} l(\hat{\theta},\psi_0) + O_p\left(n^{-1/2}\right)$, and $\frac{1}{n} D^2 l(\hat{\theta},\hat{\psi}) = \frac{1}{n} D^2 l(\hat{\theta},\psi_0) + O_p\left(n^{-1/2}\right)$.

Proof

The first two expressions follow from the assumptions of Laplace regularity for $l$ and $l_0$. These imply that any partial derivative up to sixth order of $\frac{1}{n} l$ and $\frac{1}{n} l_0$ is bounded in an open ball around the MLE; in particular the second derivatives are $O_p(1)$, which shows that $D^2 l(\hat{\theta},\hat{\psi}) = O_p(n)$ and $D^2 l_0(\hat{\theta}_0) = O_p(n)$. Then we can show $\hat{\theta} - \hat{\theta}_0 = O_p\left(n^{-1/2}\right)$ analogously to the univariate case. The Fisher information matrix $I(\theta,\psi)$ is the expected value of the negative Hessian matrix for a single observation. Thus, $I(\theta,\psi) = O_p(1)$.

Using a Taylor expansion for $d^{(0)}_{kj} = \left(D^2 l_0(\hat{\theta}_0)\right)_{kj}$ around $\hat{\theta}$, we obtain with $\hat{\theta} - \hat{\theta}_0 = O_p\left(n^{-1/2}\right)$ and the assumptions of Laplace regularity for the partial derivatives of $l$

$$
\begin{aligned}
d^{(0)}_{kj} &= \frac{\partial^2 l(\hat{\theta}_0,\psi_0)}{\partial \theta_k \partial \theta_j}
= \frac{\partial^2 l(\hat{\theta},\psi_0)}{\partial \theta_k \partial \theta_j}
+ \sum_{i=1}^{m_0} (\hat{\theta}_{0i} - \hat{\theta}_i) \frac{\partial^3 l(\hat{\theta},\psi_0)}{\partial \theta_k \partial \theta_j \partial \theta_i}
+ o_p\left(\|\hat{\theta}_0 - \hat{\theta}\|\right) \\
&= \frac{\partial^2 l(\hat{\theta},\psi_0)}{\partial \theta_k \partial \theta_j}
+ m_0 O_p\left(n^{-1/2}\right) O_p(n) + O_p\left(n^{-1/2}\right)
= \frac{\partial^2 l(\hat{\theta},\psi_0)}{\partial \theta_k \partial \theta_j}
+ O_p\left(n^{1/2}\right).
\end{aligned}
$$

Thus, $\frac{1}{n} D^2 l_0(\hat{\theta}_0) = \frac{1}{n} D^2_{\theta\theta} l(\hat{\theta},\psi_0) + O_p\left(n^{-1/2}\right)$. Arguing similarly, with a Taylor expansion for $d_{kj} = \left(D^2 l(\hat{\theta},\hat{\psi})\right)_{kj}$ around $\psi_0$ and using $\hat{\psi} - \psi_0 = O_p\left(n^{-1/2}\right)$, we have $\frac{1}{n} D^2 l(\hat{\theta},\hat{\psi}) = \frac{1}{n} D^2 l(\hat{\theta},\psi_0) + O_p\left(n^{-1/2}\right)$. □
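The last two statements of Lemma 1 can be illustrated numerically; the R sketch below uses the univariate logistic likelihood as a stand-in for the multivariate case, and its data, design, and parameter values are assumptions for illustration only.

```r
## Scaled Hessians at (theta-hat_0, psi_0) and (theta-hat, psi_0) differ by
## roughly n^(-1/2), as in Lemma 1 (univariate logistic stand-in, illustrative data).
set.seed(8)
n  <- 10000
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.3 + 0.5 * x1))            # psi_0 = 0
X  <- cbind(1, x1, x2)

hess_scaled <- function(beta) {                       # (1/n) D^2 l(beta) = -(1/n) X' W X
  p <- plogis(drop(X %*% beta))
  -crossprod(X * (p * (1 - p)), X) / n
}
theta0 <- coef(glm(y ~ x1,      family = binomial()))
theta1 <- coef(glm(y ~ x1 + x2, family = binomial()))[1:2]
H0 <- hess_scaled(c(theta0, 0))[1:2, 1:2]             # at (theta-hat_0, psi_0)
H1 <- hess_scaled(c(theta1, 0))[1:2, 1:2]             # at (theta-hat,   psi_0)
max(abs(H0 - H1)) * sqrt(n)                           # roughly O_p(1) as n grows
```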

References

1. Kass, RE, Vaidyanathan, SK. Approximate Bayes factors and orthogonal parameters, with application to testing equality of two binomial proportions. J R Stat Soc B 1992;54:129–44, https://doi.org/10.1111/j.2517-6161.1992.tb01868.x.

2. Pauler, DK. The Schwarz criterion and related methods for normal linear models. Biometrika 1998;85:13–27, https://doi.org/10.1093/biomet/85.1.13.

3. Pauler, DK, Wakefield, JC, Kass, RE. Bayes factors and approximations for variance component models. J Am Stat Assoc 1999;94:1242–53, https://doi.org/10.1080/01621459.1999.10473877.

4. Raftery, AE. Approximate Bayes factors and accounting for model uncertainty in generalised linear models. Biometrika 1996;83:251–66, https://doi.org/10.1093/biomet/83.2.251.

5. Volinsky, CT, Raftery, AE. Bayesian information criterion for censored survival models. Biometrics 2000;56:256–62, https://doi.org/10.1111/j.0006-341x.2000.00256.x.

6. Venables, WN, Ripley, BD. Modern applied statistics with S, 4th ed. New York, NY: Springer; 2010.

7. Kass, RE, Raftery, AE. Bayes factors. J Am Stat Assoc 1995;90:773–95, https://doi.org/10.1080/01621459.1995.10476572.

8. Kass, RE, Wasserman, L. A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. J Am Stat Assoc 1995;90:928–34, https://doi.org/10.1080/01621459.1995.10476592.

9. Raftery, AE. Bayesian model selection in social research. Socio Methodol 1995;25:111–63, https://doi.org/10.2307/271063.

10. Cavanaugh, J, Neath, A. Generalizing the derivation of the Schwarz information criterion. Commun Stat Theor Methods 1999;28:49–66, https://doi.org/10.1080/03610929908832282.

11. Amin, A. Pitfalls of diagnosis of extraprostatic extension in prostate adenocarcinoma. Ann Clin Pathol 2016;4:1086.

12. Fischer, S, Lin, D, Simon, RM, Howard, LE, Aronson, WJ, Terris, MK, et al. Do all men with pathological Gleason score 8–10 prostate cancer have poor outcomes? Results from the SEARCH database. BJU Int 2016;118:250–7, https://doi.org/10.1111/bju.13319.

13. Datta, K, Muders, M, Zhang, H, Tindall, DJ. Mechanism of lymph node metastasis in prostate cancer. Future Oncol 2010;6:823–36, https://doi.org/10.2217/fon.10.33.

14. Mydlo, JH, Godec, CJ, editors. Prostate cancer: science and clinical practice, 2nd ed. London: Elsevier; 2016.

15. Epstein, JI, Feng, Z, Trock, BJ, Pierorazio, PM. Upgrading and downgrading of prostate cancer from biopsy to radical prostatectomy: incidence and predictive factors using the modified Gleason grading system and factoring in tertiary grades. Eur Urol 2012;61:1019–24, https://doi.org/10.1016/j.eururo.2012.01.050.

16. Selig, K. Bayesian information criterion approximations for model selection in multivariate logistic regression with application to electronic medical records, Dissertation. München: Technische Universität München; 2020.

17. D’Amico, AV, Chen, M-H, Roehl, KA, Catalona, WJ. Preoperative PSA velocity and the risk of death from prostate cancer after radical prostatectomy. N Engl J Med 2004;351:125–35, https://doi.org/10.1056/NEJMoa032975.

18. O’Brien, MF, Cronin, AM, Fearn, PA, Smith, B, Stasi, J, Guillonneau, B, et al. Pretreatment prostate-specific antigen (PSA) velocity and doubling time are associated with outcome but neither improves prediction of outcome beyond pretreatment PSA alone in patients treated with radical prostatectomy. J Clin Oncol 2009;27:3591–7, https://doi.org/10.1200/jco.2008.19.9794.

19. Collett, D. Modelling binary data, 2nd ed. Boca Raton, FL: Chapman and Hall/CRC; 2003. Available from: http://www.loc.gov/catdir/enhancements/fy0646/2002073648-d.html.

20. McCullagh, P, Nelder, JA. Generalized linear models, monographs on statistics and applied probability, 2nd ed. London: Chapman & Hall; 1999.

21. Kass, RE, Tierney, L, Kadane, JB. The validity of posterior expansions based on Laplace’s method. In: Geisser, S, Hodges, JS, Press, SJ, Zellner, A, editors. Essays in honor of George A. Barnard. Amsterdam: North-Holland; 1990. pp. 473–88.

22. Zehna, PW. Invariance of maximum likelihood estimators. Ann Math Stat 1966;37:744, https://doi.org/10.1214/aoms/1177699475.

23. Wasserman, L. All of statistics: a concise course in statistical inference, 2nd ed. New York, NY: Springer; 2005, https://doi.org/10.1007/978-0-387-21736-9.

24. Schwarz, G. Estimating the dimension of a model. Ann Stat 1978;6:461–4, https://doi.org/10.1214/aos/1176344136.

25. Kass, RE, Wasserman, L. The selection of prior distributions by formal rules. J Am Stat Assoc 1996;91:1343–70, https://doi.org/10.1080/01621459.1996.10477003.

26. Raftery, AE. Bayes factors and BIC. Socio Methods Res 1999;27:411–27, https://doi.org/10.1177/0049124199027003005.

27. Jeffreys, H. Theory of probability, 3rd ed. Oxford: Clarendon Press; 1998, https://doi.org/10.1093/oso/9780198503682.001.0001.

28. Neath, AA, Cavanaugh, JE. The Bayesian information criterion: background, derivation, and applications. WIREs Comput Stat 2012;4:199–203, https://doi.org/10.1002/wics.199.

29. R Core Team. R: a language and environment for statistical computing; 2019. Available from: https://www.R-project.org/.

30. Albert, A, Anderson, JA. On the existence of maximum likelihood estimates in logistic regression models. Biometrika 1984;71:1–10, https://doi.org/10.1093/biomet/71.1.1.

31. Santner, TJ, Duffy, DE. A note on A. Albert and J. A. Anderson’s conditions for the existence of maximum likelihood estimates in logistic regression models. Biometrika 1986;73:755–8, https://doi.org/10.1093/biomet/73.3.755.

32. O’Brien, SM, Dunson, DB. Bayesian multivariate logistic regression. Biometrics 2004;60:739–46, https://doi.org/10.1111/j.0006-341X.2004.00224.x.

33. Albert, JH, Chib, S. Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 1993;88:669–79, https://doi.org/10.1080/01621459.1993.10476321.

34. Nishimoto, K, Nakashima, J, Hashiguchi, A, Kikuchi, E, Miyajima, A, Nakagawa, K, et al. Prediction of extraprostatic extension by prostate specific antigen velocity, endorectal MRI, and biopsy Gleason score in clinically localized prostate cancer. Int J Urol 2008;15:520–3, https://doi.org/10.1111/j.1442-2042.2008.02042.x.

35. Chen, M-H, Ibrahim, JG, Yiannoutsos, C. Prior elicitation, variable selection and Bayesian computation for logistic regression models. J Roy Stat Soc B 1999;61:223–42, https://doi.org/10.1111/1467-9868.00173.

36. Elfadaly, FG, Garthwaite, PH. On quantifying expert opinion about multinomial models that contain covariates. J R Stat Soc 2020;20:845, https://doi.org/10.1111/rssa.12546.

37. Strobl, AN, Vickers, AJ, van Calster, B, Steyerberg, E, Leach, RJ, Thompson, IM, et al. Improving patient prostate cancer risk assessment: moving from static, globally-applied to dynamic, practice-specific risk calculators. J Biomed Inf 2015;56:87–93, https://doi.org/10.1016/j.jbi.2015.05.001.

38. Barber, RF, Drton, M. High-dimensional Ising model selection with Bayesian information criteria. Electron J Stat 2015;9:567–607, https://doi.org/10.1214/15-ejs1012.

39. Chen, J, Chen, Z. Extended Bayesian information criteria for model selection with large model spaces. Biometrika 2008;95:759–71, https://doi.org/10.1093/biomet/asn034.

40. Chen, J, Chen, Z. Extended BIC for small-n-large-p sparse GLM. Stat Sin 2012;22, https://doi.org/10.5705/ss.2010.216.

41. Drton, M, Plummer, M. A Bayesian information criterion for singular models. J R Stat Soc B 2017;79:323–80, https://doi.org/10.1111/rssb.12187.

42. Foygel, R, Drton, M. Extended Bayesian information criteria for Gaussian graphical models. In: Lafferty, JD, Williams, CKI, Shawe-Taylor, J, Zemel, RS, Culotta, A, editors. Advances in neural information processing systems. Curran Associates, Inc.; 2010, vol. 23. pp. 604–12.

43. Jones, RH. Bayesian information criterion for longitudinal and clustered data. Stat Med 2011;30:3050–6, https://doi.org/10.1002/sim.4323.

44. Kawano, S. Selection of tuning parameters in bridge regression models via Bayesian information criterion. Stat Pap 2014;55:1207–23, https://doi.org/10.1007/s00362-013-0561-7.

45. Konishi, S, Ando, T, Imoto, S. Bayesian information criteria and smoothing parameter selection in radial basis function networks. Biometrika 2004;91:27–43, https://doi.org/10.1093/biomet/91.1.27.

46. Lee, ER, Noh, H, Park, BU. Model selection via Bayesian information criterion for quantile regression models. J Am Stat Assoc 2014;109:216–29, https://doi.org/10.1080/01621459.2013.836975.

47. Luo, S, Xu, J, Chen, Z. Extended Bayesian information criterion in the Cox model with a high-dimensional feature space. Ann Inst Stat Math 2015;67:287–311, https://doi.org/10.1007/s10463-014-0448-y.

48. Mehrjou, A, Hosseini, R, Nadjar Araabi, B. Improved Bayesian information criterion for mixture model selection. Pattern Recogn Lett 2016;69:22–7, https://doi.org/10.1016/j.patrec.2015.10.004.

49. Watanabe, S. A widely applicable Bayesian information criterion. J Mach Learn Res 2013;14:867–97.

50. Żak-Szatkowska, M, Bogdan, M. Modified versions of the Bayesian information criterion for sparse generalized linear models. Comput Stat Data Anal 2011;55:2908–24, https://doi.org/10.1016/j.csda.2011.04.016.

51. Ashford, JR, Sowden, RR. Multi-variate probit analysis. Biometrics 1970;26:535, https://doi.org/10.2307/2529107.

52. Bahadur, RR. A representation of the joint distribution of responses to n dichotomous items. In: Solomon, H, editor. Studies in item analysis and prediction. Stanford, California: Stanford University Press; 1961. pp. 158–68.

53. Bel, K, Fok, D, Paap, R. Parameter estimation in multivariate logit models with many binary choices. Econ Rev 2016;37:534–50, https://doi.org/10.1080/07474938.2015.1093780.

54. Bergsma, WP. Marginal models for categorical data, Dissertation. Tilburg: Tilburg University; 1997.

55. Bergsma, WP, Rudas, T. Marginal models for categorical data. Ann Stat 2002;30:140–59, https://doi.org/10.1214/aos/1015362188.

56. Bonney, GE. Logistic regression for dependent binary observations. Biometrics 1987;43:951–73, https://doi.org/10.2307/2531548.

57. Chib, S, Greenberg, E. Analysis of multivariate probit models. Biometrika 1998;85:347–61, https://doi.org/10.1093/biomet/85.2.347.

58. Cox, DR. The analysis of multivariate binary data. J R Stat Soc: Ser C (Appl Stat) 1972;21:113–20, https://doi.org/10.2307/2346482.

59. Dai, B. Multivariate Bernoulli distribution models, Dissertation. Madison, Wisconsin: University of Wisconsin; 2012.

60. Dai, B, Ding, S, Wahba, G. Multivariate Bernoulli distribution. Bernoulli 2013;19:1465–83, https://doi.org/10.3150/12-bejsp10.

61. Ekholm, A, Smith, PWF, McDonald, JW. Marginal regression analysis of a multivariate binary response. Biometrika 1995;82:847–54, https://doi.org/10.1093/biomet/82.4.847.

62. Fitzmaurice, GM, Laird, NM, Rotnitzky, AG. Regression models for discrete longitudinal responses. Stat Sci 1993;8:284–99, https://doi.org/10.1214/ss/1177010899.

63. Glonek, G, McCullagh, P. Multivariate logistic models. J R Stat Soc B 1995;57:533–46, https://doi.org/10.1111/j.2517-6161.1995.tb02046.x.

64. Joe, H, Liu, Y. A model for a multivariate binary response with covariates based on compatible conditionally specified logistic regressions. Stat Prob Lett 1996;31:113–20, https://doi.org/10.1016/s0167-7152(96)00021-1.

65. Russell, GJ, Petersen, A. Analysis of cross category dependence in market basket selection. J Retail 2000;76:367–92, https://doi.org/10.1016/s0022-4359(00)00030-0.

66. Cox, DR, Reid, N. Parameter orthogonality and approximate conditional inference. J R Stat Soc B 1987;49:1–39, https://doi.org/10.1111/j.2517-6161.1987.tb01422.x.

67. Huzurbazar, VS, Jeffreys, H. Probability distributions and orthogonal parameters. Math Proc Camb Philos Soc 1950;46:281–4, https://doi.org/10.1017/s0305004100025743.

68. Königsberger, K. Analysis 2, 4th ed. Berlin and Heidelberg: Springer; 2002, https://doi.org/10.1007/978-3-662-05699-8.

69. Horn, RA, Johnson, CR. Matrix analysis, 2nd ed. New York, NY: Cambridge University Press; 2012, https://doi.org/10.1017/CBO9781139020411.


Supplementary Material

The online version of this article offers supplementary material (https://doi.org/10.1515/ijb-2020-0045).


Received: 2020-04-06
Accepted: 2020-10-08
Published Online: 2020-10-29

© 2020 Walter de Gruyter GmbH, Berlin/Boston
