Impact of analytical bias on machine learning models for sepsis prediction using laboratory data

Meryem Rumeysa Yesil; Ilaria Talli; Michela Pelloso; Chiara Cosma; Elisa Pangrazzi; Mario Plebani; Yasemin Ustundag; Andrea Padoan

doi:10.1515/cclm-2025-0491

Published/Copyright: May 28, 2025

From the journal Clinical Chemistry and Laboratory Medicine (CCLM) Volume 63 Issue 10

Abstract

Objectives

Machine learning (ML) models that use laboratory data support early sepsis prediction. However, analytical bias in laboratory measurements can compromise their performance and validity in real-world settings. We aimed to evaluate how bias within analytically acceptable limits may affect the validity and generalizability of ML models trained on laboratory data.

Methods

A support vector machine (SVM) model for sepsis prediction was developed using complete blood count and erythrocyte sedimentation rate (ESR) data from outpatients (CS, n=104) and patients from acute inflammatory status wards (SS, n=107). Twenty-six bias conditions were derived by applying, alone or in combination, the biases permitted by analytical performance specifications (APS) to white blood cell (WBC), platelet (PLT) and ESR results. The diagnostic performance under each of the 26 conditions was compared with that obtained on the original dataset.
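The bias experiment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bias is assumed to be proportional (a percentage of the measured value), `toy_score` is a hypothetical fixed linear score standing in for a trained SVM decision function, and the records are synthetic values invented for the example. The abstract does not specify how the 26 conditions were enumerated, so only one example condition is shown.

```python
from typing import Dict, Sequence


def apply_bias(record: Dict[str, float], bias_pct: Dict[str, float]) -> Dict[str, float]:
    """Return a copy of `record` with a proportional bias (in %) applied per analyte."""
    return {k: v * (1.0 + bias_pct.get(k, 0.0) / 100.0) for k, v in record.items()}


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def toy_score(record: Dict[str, float]) -> float:
    """Hypothetical fixed linear score; a stand-in for a trained model, not the study's SVM."""
    return 0.3 * record["WBC"] - 0.01 * record["PLT"] + 0.05 * record["ESR"]


# Synthetic records for illustration only (WBC and PLT in 10^9/L, ESR in mm/h).
records = [
    {"WBC": 6.0, "PLT": 250.0, "ESR": 10.0},
    {"WBC": 7.5, "PLT": 300.0, "ESR": 15.0},
    {"WBC": 9.0, "PLT": 220.0, "ESR": 30.0},
    {"WBC": 14.0, "PLT": 150.0, "ESR": 60.0},
    {"WBC": 11.0, "PLT": 180.0, "ESR": 45.0},
    {"WBC": 8.0, "PLT": 280.0, "ESR": 25.0},
]
labels = [0, 0, 0, 1, 1, 1]  # 1 = sepsis group, 0 = control group

# Performance on the unmodified data.
baseline_auc = auc([toy_score(r) for r in records], labels)

# One example condition: WBC biased by -5.1 %, ESR by +31.6 %, PLT unchanged.
condition = {"WBC": -5.1, "PLT": 0.0, "ESR": 31.6}
biased_records = [apply_bias(r, condition) for r in records]
biased_auc = auc([toy_score(r) for r in biased_records], labels)

print(f"baseline AUC = {baseline_auc:.3f}, biased AUC = {biased_auc:.3f}")
```

The key design point is that the scoring function stays fixed while only the input values are shifted, which mimics deploying a trained model in a laboratory whose measurements differ systematically from those of the laboratory where the model was developed.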

Results

On the original dataset, the SVM achieved an AUC of 90.6 % [95 % CI: 80.6–98.7 %]. The minimum, desirable and optimum acceptable biases were 7.7, 5.1 and 2.6 % for WBC, 6.7, 4.5 and 2.2 % for PLT, and 31.6, 21.1 and 10.5 % for ESR, respectively. With bias applied to individual parameters, the reported AUCs included 89.8 % [95 % CI: 79.0–97.7 %] (PLT bias −6.7 %), 89.5 % [95 % CI: 79.1–98.0 %] (ESR bias +31.6 %) and 90.4 % [95 % CI: 79.3–98.4 %] (WBC bias −5.1 %). With combined biases, the lowest AUC was 87.8 % [95 % CI: 75.9–96.6 %]. No statistically significant differences in AUC were observed (p>0.05).
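For orientation, the three bias tiers can be related to each other through the conventional biological-variation-based specification for bias. The abstract does not state which formula was used, so the expressions below are an assumption; $CV_I$ and $CV_G$ denote the within-subject and between-subject biological variation, and $B$ the acceptable bias.

```latex
\begin{aligned}
B_{\text{optimum}}   &< 0.125\,\sqrt{CV_I^{2} + CV_G^{2}} \\
B_{\text{desirable}} &< 0.250\,\sqrt{CV_I^{2} + CV_G^{2}} \\
B_{\text{minimum}}   &< 0.375\,\sqrt{CV_I^{2} + CV_G^{2}}
\end{aligned}
```

Under this convention the minimum and optimum limits are 1.5 and 0.5 times the desirable limit. The reported values agree with that ratio to within rounding: for WBC, $1.5 \times 5.1 = 7.65$ and $0.5 \times 5.1 = 2.55$, against the reported 7.7 % and 2.6 %.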

Conclusions

Bias can influence model performance depending on the parameters and their combinations. Developing new validation strategies to assess the impact of analytical bias on laboratory data in ML models could improve their reliability.

Keywords: analytical bias; artificial intelligence; machine learning; model performance; sepsis

Corresponding author: Meryem Rumeysa Yesil, Department of Medical Biochemistry, University of Health Sciences, Bursa Yuksek Ihtisas Training and Research Hospital, Mimarsinan Mah. Emniyet Cad. 16310 Yıldırım, Bursa, Türkiye, E-mail: yesillmeryem@gmail.com

Acknowledgments

Dr. Meryem Rumeysa Yeşil gratefully acknowledges the valuable support provided by the IFCC Professional Scientific Exchange Programme (PSEP).

Research ethics: The study was conducted in accordance with the principles of the Declaration of Helsinki. There was no direct intervention on patients, and all data were completely anonymous. Data were handled so that patient identity remained unknown even to the person performing the analysis, thus ensuring the patients’ security and privacy.
Informed consent: The need for informed consent was waived because the study was retrospective and used only residual samples from routine testing at the University-Hospital of Padova, Italy (AOPD).
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Use of Large Language Models, AI and Machine Learning Tools: None declared.
Conflict of interest: The authors state no conflict of interest.
Research funding: None declared.
Data availability: The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/cclm-2025-0491).

Received: 2025-04-23

Accepted: 2025-05-08

Published Online: 2025-05-28

Published in Print: 2025-09-25
