Abstract
Despite rising criminal activity in South Africa, many victims still do not report crimes, so there is a need to understand the determinants of the likelihood of reporting a crime in the country. Binary logistic regression is a supervised machine learning algorithm that can help predict the likelihood of reporting a crime, but the choice of variables to include in the model varies from one author to another. Selecting theoretically sound and statistically relevant independent variables is key to achieving parsimonious multivariate models. This study tested the efficiency of some commonly used variable selection methods for logistic regression models in order to identify the most relevant determinants of the likelihood of reporting a crime of housebreaking. The study used 17 candidate variables, including the victims’ demographic variables and their perceptions of the police. The multivariate model fitted using stepwise selection was the best fit for the data, with the lowest AIC, the highest classification accuracy rate and the highest area under the receiver operating characteristic (ROC) curve. The model fitted using the Hosmer-Lemeshow (H-L) algorithm was the worst fit for the data. The study also revealed a limitation of the stepwise selection method: it may select different independent variables for each unique set of randomly selected observations from the same dataset. The study established a multivariate logistic regression model to predict the likelihood of a victim reporting a crime of housebreaking, and identified the determinants thereof.
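The abstract ranks the fitted models by AIC, classification accuracy rate and area under the ROC curve. As a minimal sketch of how these three criteria can be computed from a model's predicted probabilities, the Python snippet below uses made-up outcomes and probabilities. The function names, the 0.5 cutoff, the parameter counts and the toy data are all illustrative assumptions; this is not the study's SAS code, and the numbers are not its results.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def aic(y, p, n_params):
    """AIC = 2k - 2 ln(L); a lower value indicates a better fit/complexity trade-off."""
    return 2 * n_params - 2 * log_likelihood(y, p)


def accuracy(y, p, cutoff=0.5):
    """Share of observations classified correctly at the given probability cutoff."""
    return sum((pi >= cutoff) == (yi == 1) for yi, pi in zip(y, p)) / len(y)


def auc(y, p):
    """Area under the ROC curve, computed as the probability that a randomly chosen
    positive case gets a higher predicted probability than a randomly chosen
    negative case (ties count as one half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = crime reported, 0 = not reported.
y = [1, 1, 1, 0, 0, 1, 0, 0]
# Hypothetical predicted probabilities from two candidate models.
model_a = [0.9, 0.8, 0.6, 0.4, 0.3, 0.7, 0.2, 0.5]
model_b = [0.7, 0.6, 0.4, 0.5, 0.6, 0.55, 0.3, 0.45]

# Assumed numbers of estimated parameters (illustrative only).
for name, probs, k in [("A", model_a, 3), ("B", model_b, 5)]:
    print(name, round(aic(y, probs, k), 3), accuracy(y, probs), auc(y, probs))
```

Under these assumptions, the preferred model is the one with the lower AIC and the higher accuracy and AUC. In practice the three criteria need not agree, in which case the choice between models becomes a judgment call.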
Funding source: North West University
Award Identifier / Grant number: North West University Postgraduate Bursary
Funding: This paper forms part of a research dissertation that was financially supported by the North West University (NWU), South Africa.
Availability of data and material: The data may be made available on request.
Code availability: The SAS code may be made available on request.
Conflicts of interest/Competing interests: There are no conflicts of interest or competing interests.
Appendix 1: Descriptive Statistics for Age
| Variable | n | Minimum | Maximum | Mean | Std. deviation |
|---|---|---|---|---|---|
| Age of persons in the household | 1061 | 14 | 103 | 46.80 | 15.037 |
Appendix 2: Frequency Table for Categorical Variables
| Variable | Category | Frequency | Percent |
|---|---|---|---|
| House breaking/burglary – reporting crime to the police | Yes | 583 | 54.9 |
| | No | 478 | 45.1 |
| | Total | 1061 | 100.0 |
| Official contact with police | Yes | 455 | 42.9 |
| | No | 602 | 56.7 |
| | Unspecified | 4 | 0.4 |
| | Total | 1061 | 100.0 |
| Police response in emergency | Less than 30 min | 195 | 18.4 |
| | Less than 1 h (but more than 30 min) | 239 | 22.5 |
| | Less than 2 h (but more than 1 h) | 186 | 17.5 |
| | More than 2 h | 363 | 34.2 |
| | Never arrive | 77 | 7.3 |
| | Unspecified | 1 | 0.1 |
| | Total | 1061 | 100.0 |
| Satisfaction with police | Yes | 442 | 41.7 |
| | No | 619 | 58.3 |
| | Total | 1061 | 100.0 |
| Police officers on duty | At least once a day | 358 | 33.7 |
| | At least once a week | 268 | 25.3 |
| | At least once a month | 133 | 12.5 |
| | More than once a month | 70 | 6.6 |
| | Never | 220 | 20.7 |
| | Unspecified | 12 | 1.1 |
| | Total | 1061 | 100.0 |
| Specialised police operations | Yes | 223 | 21.0 |
| | No | 836 | 78.8 |
| | Unspecified | 2 | 0.2 |
| | Total | 1061 | 100.0 |
| Children approach police officer | Yes | 943 | 88.9 |
| | No | 107 | 10.1 |
| | Unspecified | 11 | 1.0 |
| | Total | 1061 | 100.0 |
| Trust in the SAPS | Yes | 696 | 65.6 |
| | No | 361 | 34.0 |
| | Unspecified | 4 | 0.4 |
| | Total | 1061 | 100.0 |
| Trust in metro/traffic police | Yes | 725 | 68.3 |
| | No | 332 | 31.3 |
| | Unspecified | 4 | 0.4 |
| | Total | 1061 | 100.0 |
| Gender of persons in the household | Male | 654 | 61.6 |
| | Female | 407 | 38.4 |
| | Total | 1061 | 100.0 |
| Population group of the persons in the household | Black African | 795 | 74.9 |
| | Coloured | 106 | 10.0 |
| | Indian/Asian | 27 | 2.5 |
| | White | 133 | 12.5 |
| | Total | 1061 | 100.0 |
| Marital status of the persons in the household | Married | 386 | 36.4 |
| | Living together like husband and wife | 104 | 9.8 |
| | Divorced | 57 | 5.4 |
| | Separated, but still legally married | 14 | 1.3 |
| | Widowed | 142 | 13.4 |
| | Single, but have been living together with someone as husband | 17 | 1.6 |
| | Single and have never been married/never lived together as h | 334 | 31.5 |
| | Unspecified | 7 | 0.7 |
| | Total | 1061 | 100.0 |
| Educational attainment of the persons in the household | Grade R/0 | 4 | 0.4 |
| | Grade 1/Sub A/Class 1 | 7 | 0.7 |
| | Grade 2/Sub B/Class 2 | 17 | 1.6 |
| | Grade 3/Standard 1/ABET 1 (Kha Ri Gude, Sanli) | 13 | 1.2 |
| | Grade 4/Standard 2 | 24 | 2.3 |
| | Grade 5/Standard 3/ABET 2 | 21 | 2.0 |
| | Grade 6/Standard 4 | 45 | 4.2 |
| | Grade 7/Standard 5/ABET 3 | 50 | 4.7 |
| | Grade 8/Standard 6/Form 1 | 68 | 6.4 |
| | Grade 9/Standard 7/Form 2/ABET 4 | 51 | 4.8 |
| | Grade 10/Standard 8/Form 3 | 124 | 11.7 |
| | Grade 11/Standard 9/Form 4 | 94 | 8.9 |
| | Grade 12/Standard 10/Form 5/Matric (No Exemption) | 222 | 20.9 |
| | Grade 12/Standard 10/Form 5/Matric (Exemption *) | 21 | 2.0 |
| | NTC 1/N1/NC (V) level 2 | 2 | 0.2 |
| | NTC 2/N2/NC (V) level 3 | 4 | 0.4 |
| | NTC 3/N3/NC (V) level 4 | 7 | 0.7 |
| | N4/NTC 4 | 6 | 0.6 |
| | N5/NTC 5 | 3 | 0.3 |
| | N6/NTC 6 | 5 | 0.5 |
| | Certificate with less than Grade 12/Std 10 | 4 | 0.4 |
| | Diploma with less than Grade 12/Std 10 | 7 | 0.7 |
| | Certificate with Grade 12/Std 10 | 17 | 1.6 |
| | Diploma with Grade 12/Std 10 | 72 | 6.8 |
| | Higher Diploma (Technikon/University of Technology) | 15 | 1.4 |
| | Post Higher Diploma (Technikon/University of Technology, Mas) | 3 | 0.3 |
| | Bachelor’s degree | 41 | 3.9 |
| | Bachelor’s degree and post-graduate diploma | 13 | 1.2 |
| | Honours degree | 14 | 1.3 |
| | Higher degree (Masters, Doctorate) | 14 | 1.3 |
| | Other (specify in the box below) | 2 | 0.2 |
| | Do not know | 10 | 0.9 |
| | No schooling | 59 | 5.6 |
| | Unspecified | 2 | 0.2 |
| | Total | 1061 | 100.0 |
| Nearest police station | Yes | 1055 | 99.4 |
| | No | 5 | 0.5 |
| | Unspecified | 1 | 0.1 |
| | Total | 1061 | 100.0 |
| Average time to police station | Less than 30 min | 744 | 70.1 |
| | Less than 1 h (but more than 30 min) | 249 | 23.5 |
| | Less than 2 h (but more than 1 h) | 47 | 4.4 |
| | More than 2 h | 6 | 0.6 |
| | Not applicable | 5 | 0.5 |
| | Unspecified | 10 | 0.9 |
| | Total | 1061 | 100.0 |
| Visited the police station in three years | Yes | 693 | 65.3 |
| | No | 350 | 33.0 |
| | Not applicable | 5 | 0.5 |
| | Unspecified | 13 | 1.2 |
| | Total | 1061 | 100.0 |
© 2020 Walter de Gruyter GmbH, Berlin/Boston