
A Regression Model for Predicting the Likelihood of Reporting a Crime Based on the Victim’s Demographic Variables and Their Perceptions Towards the Police

Malebogo Pulenyane and Tlhalitshi Volition Montshiwa
Published/Copyright: October 9, 2020

Abstract

Despite growing criminal activity in South Africa, many victims still do not report crimes to the police, so there is a need to understand what determines the likelihood of reporting a crime in the country. Binary logistic regression is a supervised machine learning algorithm that can help predict the likelihood of reporting a crime, but the choice of variables to include in the model varies from one author to another. Selecting independent variables that are both theoretically sound and statistically relevant is key to achieving parsimonious multivariate models. This study tested the efficiency of several commonly used variable selection methods for logistic regression models in order to identify the most relevant determinants of the likelihood of reporting housebreaking. The study used 17 candidate variables, including the victims’ demographic characteristics and their perceptions of the police. The multivariate model obtained by stepwise selection gave the best fit to the data, with the lowest Akaike information criterion (AIC), the highest classification accuracy rate and the largest area under the receiver operating characteristic (ROC) curve. The model obtained using the Hosmer-Lemeshow (H-L) algorithm gave the worst fit. The study also revealed a limitation of stepwise selection: it may select different independent variables for different random samples of observations drawn from the same dataset. Finally, the study established a multivariate logistic regression model that predicts the likelihood of a victim reporting housebreaking and identifies its determinants.
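The abstract relies on a handful of concrete computations: fitting a binary logistic regression, comparing candidate models by AIC, classification accuracy and area under the ROC curve, and observing that stepwise selection can choose different variables on different random subsets of the same data. The Python sketch below illustrates these on simulated data only. It is not the authors' SAS code (which is available on request), and it assumes NumPy is installed. It also swaps in greedy forward selection by AIC as a stand-in for stepwise selection; SAS PROC LOGISTIC's stepwise option is instead driven by entry and stay significance levels. The variable names x1 to x4, the simulated coefficients and the 0.5 classification cutoff are all hypothetical choices for illustration.

```python
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Maximum-likelihood binary logistic regression via Newton-Raphson (IRLS).

    X is an (n, k) array of predictors WITHOUT an intercept column; y holds 0/1 outcomes.
    Returns the coefficients (intercept first), the log-likelihood and fitted probabilities.
    """
    Xd = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - p)                            # score vector
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])    # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-Xd @ beta)), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik, p


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2 * (number of estimated parameters)."""
    return -2.0 * loglik + 2.0 * n_params


def accuracy(y, p, cutoff=0.5):
    """Share of observations classified correctly at the given probability cutoff."""
    return float(np.mean((p >= cutoff) == (y == 1)))


def auc(y, p):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count one half)."""
    diff = p[y == 1][:, None] - p[y == 0][None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


def forward_select_aic(X, y, names):
    """Greedy forward selection: add the predictor that lowers AIC most; stop when none does."""
    selected, remaining = [], list(range(X.shape[1]))
    best = aic(fit_logistic(np.empty((len(y), 0)), y)[1], 1)  # intercept-only model
    while remaining:
        # parameters = intercept + already selected + the candidate j
        trials = [(aic(fit_logistic(X[:, selected + [j]], y)[1], len(selected) + 2), j)
                  for j in remaining]
        cand, j_best = min(trials)
        if cand >= best:
            break
        best = cand
        selected.append(j_best)
        remaining.remove(j_best)
    return [names[j] for j in selected], best


# Hypothetical simulated data: only x1 and x2 influence the outcome; x3 and x4 are noise.
rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 4))
true_logit = 0.2 + 1.0 * X[:, 0] - 0.8 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
names = ["x1", "x2", "x3", "x4"]

chosen, chosen_aic = forward_select_aic(X, y, names)
cols = [names.index(v) for v in chosen]
_, ll, p = fit_logistic(X[:, cols], y)
print("selected:", chosen, "AIC:", round(chosen_aic, 2))
print("accuracy:", round(accuracy(y, p), 3), "AUC:", round(auc(y, p), 3))

# Instability check: rerun the selection on random halves of the same dataset.
for seed in range(3):
    idx = np.random.default_rng(seed).choice(n, size=n // 2, replace=False)
    print("half-sample", seed, forward_select_aic(X[idx], y[idx], names)[0])
```

Because AIC-based selection can admit weak predictors, x3 or x4 may occasionally enter on some half-samples and not others; differences of that kind are the sort of instability the abstract reports for stepwise selection.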


Corresponding author: Tlhalitshi Volition Montshiwa, PhD, Department of Business Statistics & Operations Research, North West University, Mahikeng, South Africa, E-mail:

Funding source: North West University

Award Identifier / Grant number: North West University Postgraduate Bursary

  1. Funding: This paper forms part of a research dissertation that was financially supported by the North West University (NWU), South Africa.

  2. Availability of data and material: The data may be made available on request.

  3. Code availability: The SAS code may be made available on request.

  4. Conflicts of interest/Competing interests: There are no conflicts of interest or competing interests.

Appendices

Appendix 1: Descriptive Statistics for Age

Descriptive statistics
Variable n Minimum Maximum Mean Std. deviation
Age of persons in the household 1061 14 103 46.80 15.037
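The age row reports standard summary statistics. A minimal Python sketch of how such a row can be computed is given here; it assumes, as statistical packages conventionally report, that "Std. deviation" is the sample standard deviation with an n − 1 denominator. The function name `describe` and the ages are hypothetical, since the survey microdata are not reproduced in this appendix.

```python
import statistics


def describe(values):
    """Return n, minimum, maximum, mean and sample standard deviation of a list of numbers."""
    return {
        "n": len(values),
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        "std_deviation": statistics.stdev(values),  # sample SD, n - 1 denominator
    }


# Hypothetical ages for illustration only (not the survey data).
ages = [14, 35, 46, 58, 103]
summary = describe(ages)
print(f"n={summary['n']} min={summary['minimum']} max={summary['maximum']} "
      f"mean={summary['mean']:.2f} sd={summary['std_deviation']:.3f}")
```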

Appendix 2: Frequency Table for Categorical Variables

Response Frequency Percent
House breaking/burglary – reporting crime to the police
Yes 583 54.9
No 478 45.1
Total 1061 100.0
Official contact with police
Yes 455 42.9
No 602 56.7
Unspecified 4 0.4
Total 1061 100.0
Police response in emergency
Less than 30 min 195 18.4
Less than 1 h (but more than 30 min) 239 22.5
Less than 2 h (but more than 1 h) 186 17.5
More than 2 h 363 34.2
Never arrive 77 7.3
Unspecified 1 0.1
Total 1061 100.0
Satisfaction with police
Yes 442 41.7
No 619 58.3
Total 1061 100.0
Police officers on duty
At least once a day 358 33.7
At least once a week 268 25.3
At least once a month 133 12.5
More than once a month 70 6.6
Never 220 20.7
Unspecified 12 1.1
Total 1061 100.0
Specialised police operations
Yes 223 21.0
No 836 78.8
Unspecified 2 0.2
Total 1061 100.0
Children approach police officer
Yes 943 88.9
No 107 10.1
Unspecified 11 1.0
Total 1061 100.0
Trust in the SAPS
Yes 696 65.6
No 361 34.0
Unspecified 4 0.4
Total 1061 100.0
Trust in metro/traffic police
Yes 725 68.3
No 332 31.3
Unspecified 4 0.4
Total 1061 100.0
Gender of persons in the household
Male 654 61.6
Female 407 38.4
Total 1061 100.0
Population group of the persons in the household
Black African 795 74.9
Coloured 106 10.0
Indian/Asian 27 2.5
White 133 12.5
Total 1061 100.0
Marital status of the persons in the household
Married 386 36.4
Living together like husband and wife 104 9.8
Divorced 57 5.4
Separated, but still legally married 14 1.3
Widowed 142 13.4
Single, but have been living together with someone as husband 17 1.6
Single and have never been married/never lived together as h 334 31.5
Unspecified 7 0.7
Total 1061 100.0
Educational attainment of the persons in the household
Grade R/0 4 0.4
Grade 1/Sub A/Class 1 7 0.7
Grade 2/Sub B/Class 2 17 1.6
Grade 3/Standard 1/ABET 1 (Kha Ri Gude, Sanli) 13 1.2
Grade 4/Standard 2 24 2.3
Grade 5/Standard 3/ABET 2 21 2.0
Grade 6/Standard 4 45 4.2
Grade 7/Standard 5/ABET 3 50 4.7
Grade 8/Standard 6/Form 1 68 6.4
Grade 9/Standard 7/Form 2/ABET 4 51 4.8
Grade 10/Standard 8/Form 3 124 11.7
Grade 11/Standard 9/Form 4 94 8.9
Grade 12/Standard 10/Form 5/Matric (No Exemption) 222 20.9
Grade 12/Standard 10/Form 5/Matric (Exemption *) 21 2.0
NTC 1/N1/NC (V) level 2 2 0.2
NTC 2/N2/NC (V) level 3 4 0.4
NTC 3/N3/NC (V) level 4 7 0.7
N4/NTC 4 6 0.6
N5/NTC 5 3 0.3
N6/NTC 6 5 0.5
Certificate with less than Grade 12/Std 10 4 0.4
Diploma with less than Grade 12/Std 10 7 0.7
Certificate with Grade 12/Std 10 17 1.6
Diploma with Grade 12/Std 10 72 6.8
Higher Diploma (Technikon/University of Technology) 15 1.4
Post Higher Diploma (Technikon/University of Technology, Mas) 3 0.3
Bachelor’s degree 41 3.9
Bachelor’s degree and post-graduate diploma 13 1.2
Honours degree 14 1.3
Higher degree (Masters, Doctorate) 14 1.3
Other (specify in the box below) 2 0.2
Do not know 10 0.9
No schooling 59 5.6
Unspecified 2 0.2
Total 1061 100.0
Nearest police station
Yes 1055 99.4
No 5 0.5
Unspecified 1 0.1
Total 1061 100.0
Average time to police station
Less than 30 min 744 70.1
Less than 1 h (but more than 30 min) 249 23.5
Less than 2 h (but more than 1 h) 47 4.4
More than 2 h 6 0.6
Not applicable 5 0.5
Unspecified 10 0.9
Total 1061 100.0
Visited the police station in three years
Yes 693 65.3
No 350 33.0
Not applicable 5 0.5
Unspecified 13 1.2
Total 1061 100.0
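Each percentage in the table is the category frequency divided by the total of 1,061 and multiplied by 100, rounded to one decimal place. The Python sketch below rebuilds the reporting-outcome rows from their reported counts to show that arithmetic; the function name `frequency_table` is hypothetical. The share of the largest category (54.9% "Yes") is also the accuracy a model would reach by predicting "Yes" for every case, which is a useful floor when reading the classification accuracy rates mentioned in the abstract.

```python
from collections import Counter


def frequency_table(responses):
    """Return (category, frequency, percent) rows in first-seen order, plus the total.

    Percent = 100 * frequency / total, rounded to one decimal place.
    """
    counts = Counter(responses)  # keys keep first-insertion order
    total = sum(counts.values())
    rows = [(cat, freq, round(100 * freq / total, 1)) for cat, freq in counts.items()]
    return rows, total


# Rebuild the reporting-outcome variable from the counts in Appendix 2.
reported = ["Yes"] * 583 + ["No"] * 478
rows, total = frequency_table(reported)
for category, freq, pct in rows:
    print(f"{category:<4} {freq:>5} {pct:>6.1f}")
print(f"Total {total:>4} {100.0:>6.1f}")

# Majority-class rate: accuracy of always predicting the most frequent response.
majority_rate = max(freq for _, freq, _ in rows) / total
print(f"Majority-class accuracy: {majority_rate:.3f}")
```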

References

Austin, P. C., and J. V. Tu. 2004. “Automated Variable Selection Methods for Logistic Regression Produced Unstable Models for Predicting Acute Myocardial Infarction Mortality.” Journal of Clinical Epidemiology 57 (11): 1138–46. https://doi.org/10.1016/j.jclinepi.2004.04.003.

Bohlke, M., S. S. Marini, M. Rocha, L. Terhorst, R. H. Gomes, F. C. Barcellos, M. C. C. Irigoyen, and R. Sesso. 2009. “Factors Associated with Health-Related Quality of Life after Successful Kidney Transplantation: A Population-Based Study.” Quality of Life Research 18 (9): 1185–93. https://doi.org/10.1007/s11136-009-9536-5.

Bursac, Z., C. H. Gauss, D. K. Williams, and D. W. Hosmer. 2008. “Purposeful Selection of Variables in Logistic Regression.” Source Code for Biology and Medicine 3 (1): 17. https://doi.org/10.1186/1751-0473-3-17.

Chen, S.-C., Y.-T. Lee, C.-H. Yen, K.-C. Lai, L.-B. Jeng, D.-B. Lin, P.-H. Wang, C.-C. Chen, M.-C. Lee, and W. R. Bell. 2009. “Pyogenic Liver Abscess in the Elderly: Clinical Features, Outcomes and Prognostic Factors.” Age and Ageing 38 (3): 271–6. https://doi.org/10.1093/ageing/afp002.

Fabozzi, F. J., S. M. Focardi, S. T. Rachev, B. G. Arshanapalli, and M. Hoechstoetter. 2014. The Basics of Financial Econometrics: Tools, Concepts, and Asset Management Applications. Hoboken, New Jersey: Wiley. https://doi.org/10.1002/9781118856406.

Fanelli, M., E. Kupperman, E. Lautenbach, P. H. Edelstein, and D. J. Margolis. 2011. “Antibiotics, Acne, and Staphylococcus Aureus Colonization.” Archives of Dermatology 147 (8): 917–21. https://doi.org/10.1001/archdermatol.2011.67.

Folkerson, L. E., D. Sloan, B. A. Cotton, J. B. Holcomb, J. S. Tomasek, and C. E. Wade. 2015. “Predicting Progressive Hemorrhagic Injury from Isolated Traumatic Brain Injury and Coagulation.” Surgery 158 (3): 655–61. https://doi.org/10.1016/j.surg.2015.02.029.

Griffin, A. T., T. L. Wiemken, and F. W. Arnold. 2013. “Risk Factors for Cardiovascular Events in Hospitalized Patients with Community-Acquired Pneumonia.” International Journal of Infectious Diseases 17 (12): e1125–9. https://doi.org/10.1016/j.ijid.2013.07.005.

Hacke, W., G. Donnan, C. Fieschi, M. Kaste, R. von Kummer, J. P. Broderick, and The ATLANTIS, ECASS, and NINDS rt-PA Study Group Investigators. 2004. “Association of Outcome with Early Stroke Treatment: Pooled Analysis of ATLANTIS, ECASS, and NINDS rt-PA Stroke Trials.” Lancet (London, England) 363 (9411): 768–74. https://doi.org/10.1016/s0140-6736(04)15692-4.

Hajian-Tilaki, K. 2013. “Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation.” Caspian Journal of Internal Medicine 4 (2): 627. PMID: 24009950; PMCID: PMC3755824.

Heeringa, S. G., B. T. West, and P. A. Berglund. 2010. Applied Survey Data Analysis. Boca Raton, FL: CRC Press. https://doi.org/10.1201/9781420080674.

Hilbe, J. M. 2011. “Logistic Regression.” In International Encyclopedia of Statistical Science, edited by M. Lovric, 755–8. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-04898-2_344.

Hosmer, D., and S. Lemeshow. 2000. Applied Logistic Regression, 2nd ed. New York: John Wiley & Sons. https://doi.org/10.1002/0471722146.

Hosmer, D., and S. Lemeshow. 2004. Applied Logistic Regression, 2nd ed. Hoboken, NJ: John Wiley and Sons. https://doi.org/10.1002/0470011815.b2a10030.

Hosmer, D. W., and S. Lemeshow. 2000. Applied Logistic Regression. New York: Wiley. ISBN 0-471-61553-6. https://doi.org/10.1002/0471722146.

Larkin, G. L., W. S. Copes, B. H. Nathanson, and W. Kaye. 2010. “Pre-Resuscitation Factors Associated with Mortality in 49,130 Cases of In-Hospital Cardiac Arrest: A Report from the National Registry for Cardiopulmonary Resuscitation.” Resuscitation 81 (3): 302–11. https://doi.org/10.1016/j.resuscitation.2009.11.021.

Menard, S. 2002. Applied Logistic Regression Analysis. Thousand Oaks, CA: SAGE Publications. https://doi.org/10.4135/9781412983433.

Moisey, L. L., M. Mourtzakis, B. A. Cotton, T. Premji, D. K. Heyland, C. E. Wade, E. Bulger, R. A. Kozar, and Nutrition and Rehabilitation Investigators Consortium (NUTRIC). 2013. “Skeletal Muscle Predicts Ventilator-Free Days, ICU-Free Days, and Mortality in Elderly ICU Patients.” Critical Care 17 (5): R206. https://doi.org/10.1186/cc12901.

Murtaugh, P. A. 2009. “Performance of Several Variable‐Selection Methods Applied to Real Ecological Data.” Ecology Letters 12 (10): 1061–8. https://doi.org/10.1111/j.1461-0248.2009.01361.x.

North, R. A., L. M. McCowan, G. A. Dekker, L. Poston, E. H. Chan, A. W. Stewart, R. S. Taylor, P. N. J. B. Baker, and L. C. Kenny. 2011. “Clinical Risk Prediction for Pre-Eclampsia in Nulliparous Women: Development of Model in International Prospective Cohort.” BMJ 342: d1875. https://doi.org/10.1136/bmj.d1875.

Olusegun, A. M., H. G. Dikko, and S. U. Gulumbe. 2015. “Identifying the Limitation of Stepwise Selection for Variable Selection in Regression Analysis.” American Journal of Theoretical and Applied Statistics 4 (5): 414–9. https://doi.org/10.11648/j.ajtas.20150405.22.

Park, H. 2013. “An Introduction to Logistic Regression: From Basic Concepts to Interpretation with Particular Attention to Nursing Domain.” Journal of Korean Academy of Nursing 43 (2): 154–64. https://doi.org/10.4040/jkan.2013.43.2.154.

Peng, C.-Y. J., T.-S. H. So, F. K. Stage, and E. P. St. John. 2002. “The Use and Interpretation of Logistic Regression in Higher Education Journals: 1988–1999.” Research in Higher Education 43 (3): 259–93. https://doi.org/10.1023/A:1014858517172.

Radwan, Z. A., Y. Bai, N. Matijevic, D. J. del Junco, J. J. McCarthy, C. E. Wade, J. B. Holcomb, and B. A. Cotton. 2013. “An Emergency Department Thawed Plasma Protocol for Severely Injured Patients.” JAMA Surgery 148 (2): 170–5. https://doi.org/10.1001/jamasurgery.2013.414.

Randall, J. R., B. H. Rowe, K. A. Dong, M. K. Nock, and I. Colman. 2013. “Assessment of Self-Harm Risk Using Implicit Thoughts.” Psychological Assessment 25 (3): 714–21. https://doi.org/10.1037/a0032391.

Sarkar, S., and H. Midi. 2010. “Importance of Assessing the Model Adequacy of Binary Logistic Regression.” Journal of Applied Sciences 10 (6): 479–86. https://doi.org/10.3923/jas.2010.479.486.

Schlotzhauer, D. C. 1993. “Some Issues in Using PROC LOGISTIC for Binary Logistic Regression.” Observations: The Technical Journal for SAS Software Users 2 (4): 12 p.

Shervin, A., D. del Junco, K. Sutter, T. A. McNearney, J. D. Reveille, A. Karnavas, P. Gourh, R. M. Estrada‐Y‐Martin, M. Fischbach, F. C. Arnett, and M. D. Mayes. 2009. “Clinical and Genetic Factors Predictive of Mortality in Early Systemic Sclerosis.” Arthritis Care & Research 61 (10): 1403–11. https://doi.org/10.1002/art.24734.

Received: 2020-03-16
Accepted: 2020-09-24
Published Online: 2020-10-09
Published in Print: 2020-12-16

© 2020 Walter de Gruyter GmbH, Berlin/Boston

DOI: https://doi.org/10.1515/spp-2020-0003