Impact of analytical bias on machine learning models for sepsis prediction using laboratory data
Meryem Rumeysa Yesil, Ilaria Talli, Michela Pelloso, Chiara Cosma, Elisa Pangrazzi, Mario Plebani, Yasemin Ustundag and Andrea Padoan
Abstract
Objectives
Machine learning (ML) models built on laboratory data can support early sepsis prediction. However, analytical bias in laboratory measurements can compromise their performance and validity in real-world settings. We aimed to evaluate how analytically acceptable bias may affect the validity and generalizability of ML models trained on laboratory data.
Methods
A support vector machine (SVM) model for sepsis prediction was developed using complete blood count and erythrocyte sedimentation rate (ESR) data from outpatients (CS, n=104) and patients from acute inflammatory status wards (SS, n=107). Twenty-six bias conditions were derived by applying, alone and in combination, the biases permitted by analytical performance specifications (APS) to white blood cell (WBC), platelet (PLT), and ESR results. The diagnostic performance under each of the 26 conditions was compared with that obtained on the original dataset.
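One way to construct such bias conditions can be sketched as follows. This is an illustrative assumption, not the authors' stated procedure: each analyte receives a negative, zero or positive proportional bias at a single APS level (here the minimum limits reported in the Results), and the unbiased case is excluded. With three analytes this enumeration happens to give 26 conditions, but the abstract does not say how the 26 conditions were actually built. The function names and the sample record are hypothetical.

```python
from itertools import product

# Minimum acceptable bias (%) per analyte, copied from the Results.
MIN_BIAS_PCT = {"WBC": 7.7, "PLT": 6.7, "ESR": 31.6}


def sign_combinations(analytes):
    """Every assignment of -1, 0 or +1 to the analytes, except the all-zero case."""
    combos = []
    for signs in product((-1, 0, 1), repeat=len(analytes)):
        if any(signs):  # skip the unbiased (original) condition
            combos.append(dict(zip(analytes, signs)))
    return combos


def apply_bias(record, signs, bias_pct):
    """Return a copy of one record with a proportional bias applied per analyte."""
    biased = dict(record)
    for analyte, sign in signs.items():
        biased[analyte] = record[analyte] * (1 + sign * bias_pct[analyte] / 100)
    return biased


combos = sign_combinations(list(MIN_BIAS_PCT))

# Hypothetical record, for illustration only (not study data).
sample = {"WBC": 10.0, "PLT": 200.0, "ESR": 20.0}
biased_sample = apply_bias(sample, {"WBC": 1, "PLT": -1, "ESR": 0}, MIN_BIAS_PCT)
```

Applying `apply_bias` to every record for each entry in `combos` would give one perturbed copy of the dataset per condition, each of which can then be scored by the trained model.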
Results
On the original dataset, the SVM achieved an AUC of 90.6 % [95 % CI: 80.6–98.7 %]. The minimum, desirable and optimum acceptable biases were 7.7, 5.1 and 2.6 % for WBC, 6.7, 4.5 and 2.2 % for PLT, and 31.6, 21.1 and 10.5 % for ESR, respectively. Across the conditions, reported AUCs were 89.8 % [95 % CI: 79.0–97.7 %] for PLT bias −6.7 %, 89.5 % [95 % CI: 79.1–98.0 %] for ESR bias +31.6 % and 90.4 % [95 % CI: 79.3–98.4 %] for WBC bias −5.1 %. With combined biases, the lowest AUC was 87.8 % [95 % CI: 75.9–96.6 %]. No statistically significant differences in AUC were observed (p>0.05).
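The three bias tiers reported above are consistent with the conventional biological-variation model, in which the desirable bias limit is 0.25 times the square root of the summed squared within-subject and between-subject coefficients of variation, and the minimum and optimum limits are 1.5 and 0.5 times the desirable one. The sketch below assumes that model (the abstract does not state the formula) and checks that the reported values follow these ratios within rounding. The function names and the tolerance are illustrative choices.

```python
import math


def aps_bias(cv_i, cv_g):
    """Bias limits (%) from within-subject (cv_i) and between-subject (cv_g) CVs (%)."""
    base = math.sqrt(cv_i ** 2 + cv_g ** 2)
    return {
        "optimum": 0.125 * base,
        "desirable": 0.250 * base,
        "minimum": 0.375 * base,
    }


# Bias limits (%) as reported in the Results.
REPORTED = {
    "WBC": {"minimum": 7.7, "desirable": 5.1, "optimum": 2.6},
    "PLT": {"minimum": 6.7, "desirable": 4.5, "optimum": 2.2},
    "ESR": {"minimum": 31.6, "desirable": 21.1, "optimum": 10.5},
}


def tiers_consistent(limits, tol=0.1):
    """True if minimum ~ 1.5 x desirable and optimum ~ 0.5 x desirable, within tol."""
    desirable = limits["desirable"]
    return (
        abs(limits["minimum"] - 1.5 * desirable) <= tol
        and abs(limits["optimum"] - 0.5 * desirable) <= tol
    )


consistency = {name: tiers_consistent(limits) for name, limits in REPORTED.items()}
```

The tolerance of 0.1 percentage points allows for the reported limits having been rounded to one decimal from unrounded coefficients of variation.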
Conclusions
Bias can influence model performance depending on the parameters and their combinations. Developing new validation strategies to assess the impact of analytical bias on laboratory data in ML models could improve their reliability.
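A bias stress test of the kind suggested here could look like the following minimal sketch. It is not the study's pipeline: a fixed, hypothetical scoring rule stands in for the trained SVM, the data are synthetic, and the percentile bootstrap is one assumed way to obtain a confidence interval (the abstract does not say how its intervals were computed). The idea is to score the same cases with and without a proportional bias on one analyte and compare the resulting AUCs.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: chance that a positive case outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    idx = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # a resample must contain both classes
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


def toy_score(wbc, esr):
    """Hypothetical fixed scoring rule standing in for a trained model."""
    return 0.1 * wbc + 0.02 * esr


# Synthetic cases (wbc, esr, label); illustrative only, not study data.
rng = random.Random(42)
cases = [(rng.gauss(7, 2), rng.gauss(15, 8), 0) for _ in range(50)]
cases += [(rng.gauss(13, 4), rng.gauss(45, 20), 1) for _ in range(50)]
labels = [y for _, _, y in cases]

original_scores = [toy_score(w, e) for w, e, _ in cases]
biased_scores = [toy_score(w, e * 1.316) for w, e, _ in cases]  # ESR bias +31.6 %

auc_original = auc(original_scores, labels)
auc_biased = auc(biased_scores, labels)
ci_original = bootstrap_ci(original_scores, labels)
ci_biased = bootstrap_ci(biased_scores, labels)
```

Because both AUCs are computed on the same cases, a formal comparison would need a paired method for correlated ROC curves rather than a check of whether the two intervals overlap.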
Acknowledgments
Dr. Meryem Rumeysa Yeşil gratefully acknowledges the valuable support provided by the IFCC Professional Scientific Exchange Programme (PSEP).
- Research ethics: The study was conducted in accordance with the principles of the Declaration of Helsinki. There was no direct intervention on patients, and the data were completely anonymous. All data were handled so that patient identity was unknown, even to the person performing the analysis, thus ensuring the patients’ security and privacy.
- Informed consent: The need for informed consent was waived because the study was retrospective and used only residual samples from routine testing at University-Hospital of Padova, Italy (AOPD).
- Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Use of Large Language Models, AI and Machine Learning Tools: None declared.
- Conflict of interest: The authors state no conflict of interest.
- Research funding: None declared.
- Data availability: The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/cclm-2025-0491).
© 2025 Walter de Gruyter GmbH, Berlin/Boston