
Impact of analytical bias on machine learning models for sepsis prediction using laboratory data

Meryem Rumeysa Yesil, Ilaria Talli, Michela Pelloso, Chiara Cosma, Elisa Pangrazzi, Mario Plebani, Yasemin Ustundag and Andrea Padoan
Published/Copyright: May 28, 2025

Abstract

Objectives

Machine learning (ML) models, using laboratory data, support early sepsis prediction. However, analytical bias in laboratory measurements can compromise their performance and validity in real-world settings. We aimed to evaluate how analytically acceptable bias may affect the validity and generalizability of ML models trained on laboratory data.

Methods

A support vector machine (SVM) model for sepsis prediction was developed using complete blood count and erythrocyte sedimentation rate (ESR) data from outpatients (CS, n=104) and patients from acute inflammatory status wards (SS, n=107). Twenty-six bias conditions were derived by applying biases based on analytical performance specifications (APS) to white blood cells (WBC), platelets (PLT), and ESR, alone and in combination. The diagnostic performance under each of the 26 conditions was compared with that obtained on the original dataset.
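The abstract does not state how the 26 conditions were enumerated, so the following is only a minimal sketch of one enumeration that happens to yield 26: 18 single-analyte conditions (3 analytes × 3 APS levels × 2 directions) plus 8 combined conditions (every +/− pattern across the three analytes at the minimum level). That enumeration, the proportional form of the bias, the function names and the sample values are all assumptions made for illustration; only the bias percentages are taken from the Results.

```python
from itertools import product

# Acceptable bias (%) per analyte at each APS level, as reported in the Results.
APS_BIAS_PCT = {
    "WBC": {"minimum": 7.7, "desirable": 5.1, "optimum": 2.6},
    "PLT": {"minimum": 6.7, "desirable": 4.5, "optimum": 2.2},
    "ESR": {"minimum": 31.6, "desirable": 21.1, "optimum": 10.5},
}


def enumerate_conditions():
    """Return a list of {analyte: signed bias %} dicts (assumed enumeration)."""
    conditions = []
    # 18 single-analyte conditions: 3 analytes x 3 levels x 2 directions.
    for analyte, levels in APS_BIAS_PCT.items():
        for pct in levels.values():
            for sign in (+1, -1):
                conditions.append({analyte: sign * pct})
    # 8 combined conditions: every +/- pattern at the minimum level (assumption).
    analytes = list(APS_BIAS_PCT)
    for signs in product((+1, -1), repeat=len(analytes)):
        conditions.append(
            {a: s * APS_BIAS_PCT[a]["minimum"] for a, s in zip(analytes, signs)}
        )
    return conditions


def apply_bias(sample, condition):
    """Shift each listed analyte proportionally: x * (1 + bias / 100)."""
    biased = dict(sample)  # copy so the original measurement is untouched
    for analyte, pct in condition.items():
        biased[analyte] = sample[analyte] * (1 + pct / 100)
    return biased


conditions = enumerate_conditions()
sample = {"WBC": 10.0, "PLT": 200.0, "ESR": 20.0}  # made-up values
biased = apply_bias(sample, {"WBC": -5.1})
```

Each biased copy of the dataset would then be scored by the already trained model and its AUC compared with the unbiased result.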

Results

On the original dataset, the SVM achieved an AUC of 90.6 % [95 % CI: 80.6–98.7 %]. The minimum, desirable and optimum acceptable biases were 7.7, 5.1 and 2.6 % for WBC, 6.7, 4.5 and 2.2 % for PLT, and 31.6, 21.1 and 10.5 % for ESR, respectively. For bias applied to individual parameters, AUC values included 89.8 % [95 % CI: 79.0–97.7 %] (PLT bias −6.7 %), 89.5 % [95 % CI: 79.1–98.0 %] (ESR bias +31.6 %) and 90.4 % [95 % CI: 79.3–98.4 %] (WBC bias −5.1 %). With combined biases, the lowest AUC was 87.8 % [95 % CI: 75.9–96.6 %]. No statistically significant differences in AUC were observed (p>0.05).
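The three bias levels reported here are consistent with the widely used biological-variation model for acceptable bias. The abstract does not state the formula, so the expressions below are an assumption about how the limits were obtained; the symbols CV_I (within-subject variation) and CV_G (between-subject variation) do not appear in the original text.

```latex
\begin{aligned}
B_{\text{optimum}}   &= 0.125\,\sqrt{CV_I^{2} + CV_G^{2}} \\
B_{\text{desirable}} &= 0.250\,\sqrt{CV_I^{2} + CV_G^{2}} \\
B_{\text{minimum}}   &= 0.375\,\sqrt{CV_I^{2} + CV_G^{2}}
\end{aligned}
```

Under this model the minimum limit is 1.5 times the desirable limit and the optimum limit is 0.5 times it. For WBC, 1.5 × 5.1 % = 7.65 % and 0.5 × 5.1 % = 2.55 %, which match the reported 7.7 % and 2.6 % to within rounding.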

Conclusions

Bias can influence model performance depending on the parameters and their combinations. Developing new validation strategies to assess the impact of analytical bias on laboratory data in ML models could improve their reliability.


Corresponding author: Meryem Rumeysa Yesil, Department of Medical Biochemistry, University of Health Sciences, Bursa Yuksek Ihtisas Training and Research Hospital, Mimarsinan Mah. Emniyet Cad. 16310 Yıldırım, Bursa, Türkiye, E-mail:

Acknowledgments

Dr. Meryem Rumeysa Yeşil gratefully acknowledges the valuable support provided by the IFCC Professional Scientific Exchange Programme (PSEP).

  1. Research ethics: The study was conducted in accordance with the principles of the Declaration of Helsinki. It involved no direct patient intervention, and all data were fully anonymized. Data were handled so that patient identity was unknown even to the person performing the analysis, ensuring the patients’ security and privacy.

  2. Informed consent: The need for informed consent was waived because the study was retrospective and used only residual samples from routine testing at the University-Hospital of Padova, Italy (AOPD).

  3. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Use of Large Language Models, AI and Machine Learning Tools: None declared.

  5. Conflict of interest: The authors state no conflict of interest.

  6. Research funding: None declared.

  7. Data availability: The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/cclm-2025-0491).


Received: 2025-04-23
Accepted: 2025-05-08
Published Online: 2025-05-28
Published in Print: 2025-09-25

© 2025 Walter de Gruyter GmbH, Berlin/Boston
