
Penalized regression splines in Mixture Density Networks

  • Quentin Edward Seifert, Anton Thielmann, Elisabeth Bergherr, Benjamin Säfken, Jakob Zierk, Manfred Rauh and Tobias Hepp
Published/Copyright: June 5, 2025

Abstract

Mixture Density Networks (MDNs) belong to a class of models for data that cannot be adequately described by a single distribution because the data originate from different components of the underlying population and therefore require a mixture of densities. In some situations, MDNs struggle to properly identify the latent components. While these identification issues can to some extent be contained by custom initialization strategies for the network weights, this solution remains less than ideal because it relies on subjective choices. We therefore suggest replacing the hidden layers between the model input and the output parameter vector of MDNs and estimating the respective distributional parameters with penalized cubic regression splines. In results on simulated data from both Gaussian and Gamma mixture distributions, motivated by an application to indirect reference interval estimation, this approach drastically improved identification performance, with all splines reliably converging to their true parameter values.
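To make the modelling idea concrete, the following is a minimal sketch, not the authors' implementation: a two-component Gaussian mixture in which every distributional parameter is a penalized spline in the input, fitted by minimizing the penalized negative log-likelihood. It uses a P-spline basis with a difference penalty (Eilers and Marx [20]) rather than the cubic regression splines of the paper, and the basis size, smoothing parameter `lam`, toy data generator and optimizer are all illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): a two-component Gaussian
# mixture whose parameters alpha(x), mu_k(x), sigma_k(x) are each a
# penalized B-spline in x, fitted by penalized maximum likelihood.
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

def design(x, n_basis=10, degree=3):
    # Cubic B-spline design matrix on [0, 1] with equidistant knots.
    inner = np.linspace(0, 1, n_basis - degree + 1)
    t = np.r_[[0.0] * degree, inner, [1.0] * degree]
    return BSpline.design_matrix(x, t, degree).toarray()

# Toy data: two Gaussian components with x-dependent means.
n = 2000
x = np.sort(rng.uniform(0, 1, n))
z = rng.random(n) < 0.4
y = np.where(z, rng.normal(2 * x, 0.3), rng.normal(4 - 2 * x, 0.5))

B = design(x)
K = B.shape[1]
D = np.diff(np.eye(K), n=2, axis=0)   # second-order difference penalty
P = D.T @ D
lam = 10.0                            # smoothing parameter, fixed here

def penalized_nll(theta):
    g_a, g_m1, g_m2, g_s1, g_s2 = theta.reshape(5, K)
    alpha = 1 / (1 + np.exp(-(B @ g_a)))          # logit link for weights
    mu1, mu2 = B @ g_m1, B @ g_m2                 # identity link for means
    s1, s2 = np.exp(B @ g_s1), np.exp(B @ g_s2)   # log link keeps sigma > 0
    lik = alpha * norm.pdf(y, mu1, s1) + (1 - alpha) * norm.pdf(y, mu2, s2)
    pen = lam * sum(g @ P @ g for g in (g_a, g_m1, g_m2, g_s1, g_s2))
    return -np.sum(np.log(lik + 1e-300)) + pen

fit = minimize(penalized_nll, rng.normal(scale=0.1, size=5 * K),
               method="L-BFGS-B")
print(fit.success, fit.fun)
```

The same structure carries over to other response distributions by swapping the component density and link functions; the paper's MDN framing corresponds to training such parameter splines with a stochastic optimizer such as Adam [26] in TensorFlow [32].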


Corresponding author: Quentin Edward Seifert, Chair of Spatial Data Science and Statistical Learning, University of Göttingen, Göttingen, Germany.

Funding source: Volkswagen Foundation

Award Identifier / Grant number: Bayesian Boosting – A new approach to data science

Award Identifier / Grant number: 450330162

Award Identifier / Grant number: 517012999

Acknowledgments

Quentin E. Seifert performed the present work in partial fulfilment of the requirements for obtaining the degree “Dr. rer. pol.” at the Georg-August-Universität Göttingen.

  1. Research ethics: Not applicable.

  2. Informed consent: Not applicable.

  3. Author contributions: The proposed model was conceptualized by EB, TH, BS and QS. QS wrote the manuscript with support from EB and AT, implemented the proposed model and performed the simulation study and data application. JZ and MR acquired and interpreted the data. All authors read and approved the manuscript.

  4. Use of Large Language Models, AI and Machine Learning Tools: None declared.

  5. Conflict of interest: The authors state no conflict of interest.

  6. Research funding: Volkswagen Stiftung: Project “Bayesian Boosting - A new approach to data science, unifying two statistical philosophies” and Deutsche Forschungsgemeinschaft: Projects 450330162 and 517012999.

  7. Data availability: R and Python scripts for the simulation study are available from the corresponding author upon request. The hemoglobin data were provided by the PEDREF reference interval initiative (https://www.pedref.org/index.html) and are available there upon reasonable request.

Appendix A

A.1 Simulation setups

The data for the mixture of Gaussians are generated using

$$
\begin{aligned}
\alpha_1(x) &= \frac{\exp\{0.75 + 0.75\sin(1.5 + 1.5\pi x)\}}{1 + \exp\{0.75 + 0.75\sin(1.5 + 1.5\pi x)\}}\\
\alpha_2(x) &= 1 - \alpha_1(x)\\
\mu_1(x) &= x + 10\sin\!\left(\frac{x - 0.5}{12}\,\pi\right)^{2}\\
\mu_2(x) &= c + (c + 1)x + 10\sin\!\left(\frac{x - 0.5}{12}\,\pi\right)^{2}, \qquad c = 15\\
\sigma_1(x) &= 8 + 5x\\
\sigma_2(x) &= 11 + 9x.
\end{aligned}
$$
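As a reading aid, the following hedged sketch draws data from this Gaussian setup. The uniform design x ~ U(0, 1), the sample size, the seed and the default c = 15 are illustrative assumptions not stated in the appendix.

```python
# Hedged sketch: sample (x, y) from the Gaussian mixture setup above.
import numpy as np

def gaussian_mixture_sample(n=1000, c=15, seed=42):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n)
    eta = 0.75 + 0.75 * np.sin(1.5 + 1.5 * np.pi * x)
    alpha1 = np.exp(eta) / (1 + np.exp(eta))          # mixing weight
    bump = 10 * np.sin((x - 0.5) / 12 * np.pi) ** 2   # shared nonlinear term
    mu1 = x + bump
    mu2 = c + (c + 1) * x + bump
    s1, s2 = 8 + 5 * x, 11 + 9 * x
    z = rng.random(n) < alpha1                        # latent component label
    y = np.where(z, rng.normal(mu1, s1), rng.normal(mu2, s2))
    return x, y, z
```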

The Gamma setup is generated using

$$
\begin{aligned}
\alpha_1(x) &= \frac{\exp\{0.75 + 0.75\sin(1.5 + 1.5\pi x)\}}{1 + \exp\{0.75 + 0.75\sin(1.5 + 1.5\pi x)\}}\\
\alpha_2(x) &= 1 - \alpha_1(x)\\
\mu_1(x) &= 5 + x - 2.5(x - 1)^{2}\\
\mu_2(x) &= 9 + 2x - 2.5(x - 1)^{2}\\
\sigma_1(x) &= \exp\!\left(1 + x - x^{2} - 0.5x^{3}\right)\\
\sigma_2(x) &= \exp\!\left(1 + x - x^{2} - 0.5x^{3} + 0.3\right).
\end{aligned}
$$
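A corresponding sketch for the Gamma setup. It assumes the GAMLSS-type (μ, σ) parameterization of the Gamma distribution (shape 1/σ², scale σ²μ) in the style of Rigby and Stasinopoulos [5]; the covariate distribution, sample size and seed are again assumptions.

```python
# Hedged sketch for the Gamma setup; assumes the GAMLSS (mu, sigma)
# parameterization with shape 1/sigma^2 and scale sigma^2 * mu.
import numpy as np

def gamma_mixture_sample(n=1000, seed=42):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n)
    eta = 0.75 + 0.75 * np.sin(1.5 + 1.5 * np.pi * x)
    alpha1 = np.exp(eta) / (1 + np.exp(eta))
    mu1 = 5 + x - 2.5 * (x - 1) ** 2
    mu2 = 9 + 2 * x - 2.5 * (x - 1) ** 2
    s1 = np.exp(1 + x - x ** 2 - 0.5 * x ** 3)
    s2 = np.exp(1 + x - x ** 2 - 0.5 * x ** 3 + 0.3)
    z = rng.random(n) < alpha1
    mu, s = np.where(z, mu1, mu2), np.where(z, s1, s2)
    y = rng.gamma(shape=1 / s ** 2, scale=s ** 2 * mu)  # elementwise draws
    return x, y, z
```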

The data for the additive Gaussian example are generated using

$$
\begin{aligned}
\alpha_1(x, v) &= \frac{\exp\{0.75 + 0.75\sin(1.5 + 1.5\pi x) + v^{2} - 0.5v\}}{1 + \exp\{0.75 + 0.75\sin(1.5 + 1.5\pi x) + v^{2} - 0.5v\}}\\
\alpha_2(x, v) &= 1 - \alpha_1(x, v)\\
\mu_1(x, v) &= x + 10\sin\!\left(\frac{x - 0.5}{12}\,\pi\right)^{2} - 2.5v + 5\cos\!\left(\frac{v}{12}\,\pi\right)^{2}\\
\mu_2(x, v) &= 15 + 16x + 10\sin\!\left(\frac{x - 0.5}{12}\,\pi\right)^{2} + 1.4\left(-2.5v + 5\cos\!\left(\frac{v}{12}\,\pi\right)^{2}\right)\\
\sigma_1(x, v) &= \exp(0.9x - 0.6v)\\
\sigma_2(x, v) &= \exp(1.2x + 0.5v).
\end{aligned}
$$
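Finally, a sketch for the additive Gaussian example with two covariates; that x and v are independent U(0, 1) is an assumption, since the appendix does not state the covariate distribution.

```python
# Hedged sketch for the additive Gaussian setup with covariates x and v.
import numpy as np

def additive_gaussian_sample(n=1000, seed=42):
    rng = np.random.default_rng(seed)
    x, v = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
    eta = 0.75 + 0.75 * np.sin(1.5 + 1.5 * np.pi * x) + v ** 2 - 0.5 * v
    alpha1 = np.exp(eta) / (1 + np.exp(eta))
    f_x = 10 * np.sin((x - 0.5) / 12 * np.pi) ** 2    # nonlinear x effect
    f_v = -2.5 * v + 5 * np.cos(v / 12 * np.pi) ** 2  # nonlinear v effect
    mu1 = x + f_x + f_v
    mu2 = 15 + 16 * x + f_x + 1.4 * f_v               # v effect scaled by 1.4
    s1 = np.exp(0.9 * x - 0.6 * v)
    s2 = np.exp(1.2 * x + 0.5 * v)
    z = rng.random(n) < alpha1
    y = np.where(z, rng.normal(mu1, s1), rng.normal(mu2, s2))
    return x, v, y, z
```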

References

1. Haeckel, R, Wosniok, W, Arzideh, F. A plea for intra-laboratory reference limits. Part 1. General considerations and concepts for determination. Clin Chem Lab Med 2007;45:1033–42. https://doi.org/10.1515/cclm.2007.249.

2. Ceriotti, F. Establishing pediatric reference intervals: a challenging task. Clin Chem 2012;58:808–10. https://doi.org/10.1373/clinchem.2012.183483.

3. Hepp, T, Zierk, J, Rauh, M, Metzler, M, Seitz, S. Mixture density networks for the indirect estimation of reference intervals. BMC Bioinf 2022;23. https://doi.org/10.1186/s12859-022-04846-0.

4. Bishop, CM. Mixture density networks. Technical report; 1994.

5. Rigby, RA, Stasinopoulos, DM. Generalized additive models for location, scale and shape. J Roy Stat Soc: Ser C (Appl Stat) 2005;54:507–54. https://doi.org/10.1111/j.1467-9876.2005.00510.x.

6. Grün, B, Leisch, F. Bootstrapping finite mixture models. In: Antoch, J, editor. COMPSTAT 2004 — proceedings in computational statistics. Heidelberg: Physica-Verlag HD; 2004:489–99 pp.

7. Wedel, M, DeSarbo, WS. A mixture likelihood approach for generalized linear models. J Classif 1995;12:21–55. https://doi.org/10.1007/bf01202266.

8. Hennig, C. Identifiability of models for clusterwise linear regression. J Classif 2000;17. https://doi.org/10.1007/s003570000022.

9. Agarwal, R, Melnick, L, Frosst, N, Zhang, X, Lengerich, B, Caruana, R, et al. Neural additive models: interpretable machine learning with neural nets. Adv Neural Inf Process Syst 2021;34:4699–711.

10. Thielmann, A, Kruse, R-M, Kneib, T, Säfken, B. Neural additive models for location scale and shape: a framework for interpretable neural regression beyond the mean. In: Proceedings of the 27th international conference on artificial intelligence and statistics (AISTATS). PMLR; 2024:163–74 pp.

11. Rügamer, D, Kolb, C, Klein, N. Semi-structured distributional regression. Am Statistician 2024;78:88–99. https://doi.org/10.1080/00031305.2022.2164054.

12. Huang, M, Li, R, Wang, S. Nonparametric mixture of regression models. J Am Stat Assoc 2013;108:929–41. https://doi.org/10.1080/01621459.2013.772897.

13. Berrettini, M, Galimberti, G, Ranciati, S. Semiparametric finite mixture of regression models with Bayesian P-splines. Adv Data Anal Classif 2023;17:745–75. https://doi.org/10.1007/s11634-022-00523-5.

14. Hershey, S, Chaudhuri, S, Ellis, DP, Gemmeke, JF, Jansen, A, Moore, RC, et al. CNN architectures for large-scale audio classification. In: 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE; 2017:131–5 pp. https://doi.org/10.1109/ICASSP.2017.7952132.

15. Krizhevsky, A, Sutskever, I, Hinton, GE. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 2012;25:1097–105.

16. Qi, CR, Su, H, Mo, K, Guibas, LJ. PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017:652–60 pp.

17. Goodfellow, I, Bengio, Y, Courville, A. Deep learning. Cambridge, MA: MIT Press; 2016.

18. Ramachandran, P, Zoph, B, Le, QV. Searching for activation functions. arXiv preprint arXiv:1710.05941; 2017.

19. Shwartz-Ziv, R, Tishby, N. Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810; 2017.

20. Eilers, PH, Marx, BD. Flexible smoothing with B-splines and penalties. Stat Sci 1996;11:89–121. https://doi.org/10.1214/ss/1038425655.

21. Wood, SN. Thin plate regression splines. J Roy Stat Soc B 2003;65:95–114. https://doi.org/10.1111/1467-9868.00374.

22. Everitt, B, Hand, DJ. Finite mixture distributions. London: Springer Science & Business Media; 2013. https://doi.org/10.1002/9781118445112.stat06216.

23. McLachlan, GJ, Lee, SX, Rathnayake, SI. Finite mixture models. Annu Rev Stat Appl 2019;6:355–78. https://doi.org/10.1146/annurev-statistics-031017-100325.

24. Wood, SN. Generalized additive models: an introduction with R. New York: CRC Press; 2017. https://doi.org/10.1201/9781315370279.

25. Fahrmeir, L, Kneib, T, Lang, S, Marx, B. Regression: models, methods and applications. Heidelberg: Springer Science & Business Media; 2013. https://doi.org/10.1007/978-3-642-34333-9.

26. Kingma, DP, Ba, J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980; 2014.

27. Schall, R. Estimation in generalized linear models with random effects. Biometrika 1991;78:719–27. https://doi.org/10.2307/2336923.

28. Schellhase, C, Kauermann, G. Density estimation and comparison with a penalized mixture approach. Comput Stat 2012;27:757–77. https://doi.org/10.1007/s00180-011-0289-6.

29. Koslik, J-O. Efficient smoothness selection for nonparametric Markov-switching models via quasi restricted maximum likelihood; 2024. https://arxiv.org/abs/2411.11498.

30. Stasinopoulos, MD, Rigby, RA, Heller, GZ, Voudouris, V, De Bastiani, F. Flexible regression and smoothing: using GAMLSS in R. New York: CRC Press; 2017. https://doi.org/10.1201/b21973.

31. Hepp, T, Zierk, J, Rauh, M, Metzler, M, Mayr, A. Latent class distributional regression for the estimation of non-linear reference limits from contaminated data sources. BMC Bioinf 2020;21:1–15. https://doi.org/10.1186/s12859-020-03853-3.

32. Abadi, M, Agarwal, A, Barham, P, Brevdo, E, Chen, Z, Citro, C, et al. TensorFlow: large-scale machine learning on heterogeneous systems. Software; 2015. Available from: tensorflow.org.

33. PEDREF. Next-generation pediatric reference intervals; 2025. https://www.pedref.org/index.html [Accessed 27 Feb 2025].

Received: 2023-11-17
Accepted: 2025-04-07
Published Online: 2025-06-05

© 2025 Walter de Gruyter GmbH, Berlin/Boston
