Budget Constrained Model Selection for Logistic Regression

  • Jingying Zhang, Ronald L. Rardin and Justin R. Chimka
Published/Copyright: November 15, 2025

Abstract

We propose a logistic regression model for selecting the best subset of explanatory variables under budget constraints, where the optimal subset may not always be affordable. Unlike traditional logistic regression models, which typically address model fitting and variable selection in separate steps, our proposed model integrates both steps into a single process. Computational studies on multiple datasets show that our methods are competitive and robust compared to traditional model selection techniques. Furthermore, they offer the practical advantage of selecting models subject to budget constraints on variables.
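
To make the idea of budget-constrained subset selection concrete, here is a minimal sketch in Python. It is not the authors' formulation: the abstract describes a single integrated model, whereas this sketch simply enumerates every subset whose total variable cost fits the budget, fits a logistic regression on each by plain gradient ascent, and keeps the subset with the highest log-likelihood. The names `fit_logistic`, `best_affordable_subset`, `costs` and `budget`, and the toy data, are hypothetical and chosen only for illustration.

```python
import itertools
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, iters=300, lr=0.5):
    """Fit a logistic regression with intercept by gradient ascent.

    Returns (weights, log_likelihood); weights[0] is the intercept.
    """
    n = len(X)
    p = len(X[0]) if n else 0
    w = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - sigmoid(z)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    # Log-likelihood in the stable form y*z - log(1 + exp(z)).
    ll = 0.0
    for xi, yi in zip(X, y):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
        ll += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return w, ll


def best_affordable_subset(X, y, costs, budget):
    """Brute-force search over all subsets whose total cost is within budget.

    Returns (subset_indices, log_likelihood) for the best affordable subset.
    """
    p = len(costs)
    best = None
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            if sum(costs[j] for j in subset) > budget:
                continue  # this subset is not affordable
            Xs = [[row[j] for j in subset] for row in X]
            _, ll = fit_logistic(Xs, y)
            if best is None or ll > best[1]:
                best = (subset, ll)
    return best


# Made-up toy data: variable 0 is informative but expensive,
# variables 1 and 2 are cheap.
X = [
    [0.1, 0, 1], [0.3, 0, 0], [0.2, 1, 1], [0.6, 0, 0],
    [0.5, 1, 0], [0.8, 1, 1], [0.9, 0, 0], [0.7, 1, 1],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
costs = [5, 2, 1]

subset, ll = best_affordable_subset(X, y, costs, budget=3)
print("selected variables:", subset, "log-likelihood:", round(ll, 3))
```

Enumeration grows exponentially with the number of candidate variables, so this is only workable for a handful of them. Because the log-likelihood does not penalize model size, the budget is the only thing limiting the selected subset in this sketch.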

MSC 2020: 62J12; 62H12

Received: 2025-08-21
Accepted: 2025-10-14
Published Online: 2025-11-15

© 2025 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 23.11.2025 from https://www.degruyterbrill.com/document/doi/10.1515/eqc-2025-0043/html?lang=en