
Gaussian Mixture Regression Model with Sparsity for Clustering of Territory Risk in Auto Insurance

Shengkun Xie, Chong Gan and Anna T. Lawniczak
Published/Copyright: October 9, 2024

Abstract

Designing insurance rating territories and accurately estimating territory risk relativities are fundamental to auto insurance rate regulation. Methodologies that support effective territory design and relativity estimation are therefore crucial, because they directly affect rate filings and the decision support behind rate change reviews. This article proposes a Gaussian Mixture Regression model clustering approach for territory design. The proposed method incorporates a linear regression model that takes spatial location as covariates, which helps estimate the cluster means more accurately. To further improve the estimation of territory risk relativities, we impose sparsity through a sparse matrix decomposition of the membership coefficient matrix obtained from the Gaussian Mixture Regression model. By moving from the current hard clustering method to a soft approach, our methodology could improve the evaluation of territory risk for rate-making purposes. Moreover, using non-negative sparse matrix approximation keeps the estimated risk relativities of basic rating units smooth, filtering data noise out of the territory risk relativity estimates. Overall, our methodology aims to enhance the accuracy and reliability of risk analysis in auto insurance. The proposed method could also be extended to other domains that involve spatial clustering of data, broadening its applicability beyond auto insurance rate regulation.
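To make the soft clustering idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each cluster's mean is a linear function of standardized latitude and longitude with Gaussian noise, and fits the mixture by a plain EM loop. The function name `fit_mixture_of_regressions`, the synthetic data, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def fit_mixture_of_regressions(X, y, n_components=2, n_iter=100, seed=0):
    """EM for a mixture of linear regressions with Gaussian noise.

    Returns (resp, betas, sigma2, weights); resp[i, k] is the soft
    membership of unit i in cluster k.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])  # intercept + spatial covariates
    p = A.shape[1]
    resp = rng.dirichlet(np.ones(n_components), size=n)  # random soft start
    eps = 1e-10
    for _ in range(n_iter):
        # M-step: mixing weights, then weighted least squares per cluster.
        weights = resp.mean(axis=0)
        betas = np.empty((n_components, p))
        sigma2 = np.empty(n_components)
        for k in range(n_components):
            w = resp[:, k]
            Aw = A * w[:, None]
            # Small ridge term keeps the normal equations solvable.
            betas[k] = np.linalg.solve(A.T @ Aw + 1e-8 * np.eye(p), Aw.T @ y)
            resid = y - A @ betas[k]
            sigma2[k] = (w @ resid**2) / (w.sum() + eps) + 1e-6
        # E-step: posterior membership under each cluster's regression mean.
        mu = A @ betas.T
        log_p = (np.log(weights + eps)
                 - 0.5 * np.log(2 * np.pi * sigma2)
                 - 0.5 * (y[:, None] - mu) ** 2 / sigma2)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp, betas, sigma2, weights


# Synthetic stand-in for standardized (latitude, longitude) and loss cost.
rng = np.random.default_rng(1)
coords = rng.normal(size=(200, 2))
labels = rng.integers(0, 2, size=200)
y = np.where(labels == 0,
             1.0 + 0.5 * coords[:, 0],
             4.0 - 0.5 * coords[:, 1]) + rng.normal(scale=0.1, size=200)

resp, betas, sigma2, weights = fit_mixture_of_regressions(coords, y)
hard_labels = resp.argmax(axis=1)  # hard clustering, if one is needed
```

Each row of `resp` sums to one, so a basic rating unit can belong partly to several territories. This is the property that distinguishes the soft approach from a hard assignment such as `hard_labels`. EM can converge to a local optimum, so in practice several random starts would likely be compared.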


Corresponding author: Shengkun Xie, Ted Rogers School of Management, Toronto Metropolitan University, Global Management Studies, Toronto, Canada, E-mail: 

Appendix
Table 12: The first 40 rows of the FSA data, sorted in descending order by ‘exposures’. These data are given as an example of input data for Gaussian mixture regression clustering.

FSA Loss cost Exposures Latitude Longitude
L5N 1,519 46,127 43.58701 −79.75656
L5M 2,070 45,842 43.56732 −79.71608
L4C 1,801 36,837 43.87009 −79.43920
L4J 2,110 36,284 43.81237 −79.44938
L3R 1,591 34,114 43.84944 −79.32583
L6A 2,423 33,640 43.85930 −79.51554
L4L 1,678 33,347 43.79339 −79.57974
L6Y 2,749 30,515 43.65887 −79.75180
L6R 3,667 30,459 43.75530 −79.75334
L7A 3,008 28,507 43.70264 −79.82300
L6S 2,445 27,688 43.73340 −79.73314
L3T 1,671 26,799 43.82198 −79.39453
M2N 1,737 26,124 43.76766 −79.40879
L4H 2,473 25,288 43.82593 −79.58696
L3S 3,187 24,838 43.84275 −79.27094
L5L 1,498 24,822 43.53743 −79.69186
L5B 2,272 24,542 43.57746 −79.63006
M1B 4,082 24,415 43.80588 −79.20751
M5C 141 23,987 43.65710 −79.38253
L6X 2,457 23,027 43.68150 −79.78490
L3P 980 22,967 43.88057 −79.26391
L5V 2,428 22,696 43.59532 −79.69070
L6P 3,738 22,376 43.78321 −79.70193
M1V 2,662 21,966 43.81752 −79.28170
L4E 1,675 21,150 43.94182 −79.45497
L5A 2,390 20,852 43.58620 −79.61031
M2J 2,092 20,369 43.77962 −79.34920
M1W 2,140 19,943 43.79816 −79.32107
L6C 1,332 19,729 43.89046 −79.33551
L4B 1,608 19,701 43.85529 −79.40083
L6V 3,110 19,248 43.70306 −79.76144
M9C 1,360 18,714 43.64486 −79.57317
L4Z 2,046 18,168 43.61335 −79.64676
M1C 1,242 18,057 43.78708 −79.15529
L5R 2,546 17,854 43.60404 −79.66888
M1E 2,568 17,669 43.76557 −79.19081
M9W 3,123 17,441 43.71797 −79.58034
L6T 2,677 17,042 43.71664 −79.70027
L6Z 1,770 16,999 43.72642 −79.79354
L4S 1,677 16,638 43.89404 −79.42243
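As a hedged illustration of how rows in this layout might be read into numeric form, the sketch below parses the first five rows of Table 12. It handles the thousands separators and the typeset minus sign (U+2212) in the longitude column, both of which would otherwise make `float()` fail. The helper `parse_number` is hypothetical. The relativity computed at the end (loss cost divided by the exposure-weighted mean loss cost of these five rows) is an assumed, commonly used convention and is not a definition taken from the article.

```python
# First five rows of Table 12, copied as typeset (note the Unicode minus).
rows = """\
L5N 1,519 46,127 43.58701 −79.75656
L5M 2,070 45,842 43.56732 −79.71608
L4C 1,801 36,837 43.87009 −79.43920
L4J 2,110 36,284 43.81237 −79.44938
L3R 1,591 34,114 43.84944 −79.32583
"""


def parse_number(text):
    """Convert a typeset number to float: drop thousands separators and
    replace the Unicode minus sign (U+2212) with an ASCII hyphen."""
    return float(text.replace(",", "").replace("\u2212", "-"))


records = []
for line in rows.splitlines():
    fsa, loss_cost, exposures, lat, lon = line.split()
    records.append({
        "fsa": fsa,
        "loss_cost": parse_number(loss_cost),
        "exposures": parse_number(exposures),
        "latitude": parse_number(lat),
        "longitude": parse_number(lon),
    })

# Exposure-weighted mean loss cost over the rows shown (illustrative only).
total_exposure = sum(r["exposures"] for r in records)
mean_loss_cost = sum(r["loss_cost"] * r["exposures"] for r in records) / total_exposure

# Assumed convention: relativity = loss cost / exposure-weighted mean.
for r in records:
    r["relativity"] = r["loss_cost"] / mean_loss_cost
```

With this convention the exposure-weighted average of the relativities is one by construction. The values from five rows only show the mechanics and say nothing about the full data set.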


Received: 2024-01-27
Accepted: 2024-09-24
Published Online: 2024-10-09

© 2024 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 12.12.2025 from https://www.degruyterbrill.com/document/doi/10.1515/apjri-2024-0002/html