Abstract
We propose a logistic regression model for selecting the best subset of explanatory variables under budget constraints, for settings in which the optimal subset may not be affordable. Traditional logistic regression procedures typically handle model fitting and variable selection in separate steps; our proposed model integrates both steps into a single process. Computational studies on multiple datasets show that our methods are competitive with, and robust relative to, traditional model selection techniques. They also offer the practical advantage of selecting models subject to budget constraints on the variables used.
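The abstract does not spell out the formulation, so the following is only a minimal sketch of the underlying problem, not the authors' integrated model. It enumerates every subset whose total variable cost stays within the budget, fits each subset with a plain Newton-Raphson logistic regression, and keeps the subset with the lowest BIC. The per-variable costs, the budget values, the BIC criterion, the synthetic data, and the helper names `fit_logistic` and `budget_best_subset` are all assumptions for illustration. Note that this brute-force search is exactly the separate fit-then-select approach the abstract contrasts with, and its cost grows as 2^p, so it is only practical for a handful of candidate variables.

```python
import itertools

import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column. Returns the coefficient
    vector and the maximized log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))         # fitted probabilities
        grad = X.T @ (y - p)                           # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # information matrix X'WX
        step = np.linalg.solve(info, grad)             # Newton step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    # log L = sum(y * eta - log(1 + exp(eta))), computed stably with logaddexp
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))
    return beta, loglik


def budget_best_subset(X, y, costs, budget):
    """Hypothetical helper: enumerate every affordable subset of columns and
    return (BIC, subset) for the affordable subset with the smallest BIC."""
    n, p = X.shape
    ones = np.ones((n, 1))
    best = None
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            if sum(costs[j] for j in subset) > budget:
                continue  # subset violates the budget constraint: skip it
            Xs = np.hstack([ones, X[:, np.array(subset, dtype=int)]])
            _, loglik = fit_logistic(Xs, y)
            bic = (size + 1) * np.log(n) - 2.0 * loglik  # +1 for the intercept
            if best is None or bic < best[0]:
                best = (bic, subset)
    return best


# Synthetic demo (assumed data): only columns 0 and 1 affect the outcome,
# and column 0 is the expensive one.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
logits = 0.5 + 1.5 * X[:, 0] - 2.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
costs = [5.0, 1.0, 1.0, 1.0]  # assumed per-variable acquisition costs

bic_full, chosen_full = budget_best_subset(X, y, costs, budget=10.0)
bic_tight, chosen_tight = budget_best_subset(X, y, costs, budget=3.0)
print("budget 10:", chosen_full, "budget 3:", chosen_tight)
```

Lowering `budget` shrinks the set of eligible subsets: at a budget of 3.0 the costly column 0 can no longer be bought even though it carries real signal, which mirrors the abstract's case of an optimal subset that is not affordable.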
© 2025 Walter de Gruyter GmbH, Berlin/Boston