Abstract
In logistic regression models, particular patterns in the observed data can cause large bias in parameter estimates, especially when separation is present. In the frequentist approach, maximum likelihood estimates fail to exist when separation occurs. In the Bayesian approach, the existence of posterior means is also affected by separation, depending on the form of the prior distributions. In this paper, a non-informative G-prior for the Bayesian method is proposed to reduce the bias of parameter estimation when no prior information about the parameters is available and separation is present in the data. In the proposed method, information from the observed data and ideas from the normal regression model are used to form the means and standard deviations of the normal prior distributions. A Markov chain Monte Carlo scheme based on the Metropolis-Hastings algorithm is then used to sample from the target posterior distribution. Results show that estimates from the proposed Bayesian method are more accurate and reliable than those from the classical approach, whether or not separation is present in the observed data. Moreover, the proposed Bayesian method provides better estimates than the default Cauchy prior Bayesian approach when the prior distribution carries no information. The proposed method is also validated through a case study of the MROZ data.
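To make the setting concrete, the following is a minimal sketch of a random-walk Metropolis-Hastings sampler for a logistic regression with independent normal priors, run on a toy data set with complete separation. It is not the proposed G-prior: the prior means (0) and standard deviations (2.5), the step size, the toy data, and the function names `log_posterior` and `metropolis_hastings` are all assumptions chosen for illustration only.

```python
import numpy as np


def log_posterior(beta, X, y, prior_mean, prior_sd):
    """Unnormalised log posterior: Bernoulli-logit likelihood plus normal priors."""
    eta = X @ beta
    # Log-likelihood in a numerically stable form: y*eta - log(1 + exp(eta)).
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))
    # Independent normal priors (illustrative values, not the paper's G-prior).
    log_prior = -0.5 * np.sum(((beta - prior_mean) / prior_sd) ** 2)
    return log_lik + log_prior


def metropolis_hastings(X, y, prior_mean, prior_sd, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y, prior_mean, prior_sd)
    draws = np.empty((n_iter, X.shape[1]))
    accepted = 0
    for t in range(n_iter):
        proposal = beta + step * rng.standard_normal(beta.size)
        candidate = log_posterior(proposal, X, y, prior_mean, prior_sd)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if np.log(rng.uniform()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        draws[t] = beta
    return draws, accepted / n_iter


# Toy data with complete separation: y = 1 exactly when x > 0.
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = (x > 0).astype(float)
X = np.column_stack([np.ones_like(x), x])  # intercept and slope

draws, acc_rate = metropolis_hastings(
    X, y, prior_mean=np.zeros(2), prior_sd=np.full(2, 2.5)
)
posterior_mean = draws[1000:].mean(axis=0)  # discard the first 1000 draws as burn-in
print("acceptance rate:", acc_rate)
print("posterior mean (intercept, slope):", posterior_mean)
```

On separated data such as this, the likelihood alone keeps increasing as the slope grows, which is why the maximum likelihood estimate does not exist; the normal prior term is what keeps the sampled values bounded in this sketch.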
Funding source: Viet Nam National University Ho Chi Minh City
Award Identifier / Grant number: DS.C2025-16-02
Funding statement: This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number DS.C2025-16-02.
© 2025 Walter de Gruyter GmbH, Berlin/Boston