Development and validation of a machine learning model for accurate detection of wrong blood in tube errors in hospitalized patients

Jordi Tortosa-Carreres; Ana Vañó-Bellver; Andreu Martínez-Cerezuela; Óscar Fuster-Lluch; Lucía García-Ruiz; Elena Rodríguez-Romero; Antonio Sierra-Rivera; Ana Comes-Raga; Carlos Cátedra; Virginia Tadeo-Garisto; Alexandra Igumnova-Dubovikova; Rafael Gisbert-Criado; Laura Sahuquillo-Frias; Begoña Laiz-Marro

doi:10.1515/cclm-2025-0564

Published/Copyright: October 29, 2025

From the journal Clinical Chemistry and Laboratory Medicine (CCLM)

Abstract

Objectives

To develop and validate a machine-learning model based on routinely available biochemical and hematological parameters for detecting wrong blood in tube (WBIT) errors in hospitalized patients.

Methods

A retrospective multicenter study including one internal cohort (IC) and two external validation cohorts (EVC, EVC2). The IC was balanced (50 % correct, 50 % WBIT; 25 % real, 25 % simulated), while EVC (n=800) and EVC2 (n=460) represented more realistic scenarios (95 % correct, 5 % WBIT; equally distributed between real and simulated). Parameters present in ≥ 95 % of requests were selected, and their normalized variation from the immediately preceding result was calculated. The IC was divided into a training set (IC-TS, n=324) and an internal validation set (IC-VS, n=108). Feature selection was refined with Elastic Net before training an XGBoost model. Performance was assessed in IC-VS, EVC, and EVC2. For benchmarking, the model’s discriminative ability was also compared with a multivariate Mahalanobis-based approach and with univariate delta checks within IC-TS/IC-VS.
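The feature construction described above can be illustrated with a minimal sketch. The abstract does not give the exact normalization formula, so the relative change (current minus previous, divided by the absolute previous value) is an assumption here, as are the analyte names, the numeric values, and the helper names `normalized_delta` and `delta_features`. A simulated WBIT is represented, again as an assumption, by pairing one patient's preceding result with another patient's current result.

```python
from typing import Dict, Optional


def normalized_delta(current: float, previous: float) -> Optional[float]:
    """Relative change from the immediately preceding result.

    Assumed formula (not stated in the abstract):
    (current - previous) / |previous|. Returns None when undefined.
    """
    if previous == 0:
        return None
    return (current - previous) / abs(previous)


def delta_features(
    current: Dict[str, float], previous: Dict[str, float]
) -> Dict[str, Optional[float]]:
    """Compute one normalized delta per analyte present in both requests."""
    return {
        name: normalized_delta(current[name], previous[name])
        for name in current
        if name in previous
    }


# Hypothetical values for illustration only; not data from the study.
patient_a_prev = {"creatinine": 1.00, "hemoglobin": 12.0, "mcv": 90.0}
patient_a_curr = {"creatinine": 1.10, "hemoglobin": 11.4, "mcv": 90.9}
patient_b_curr = {"creatinine": 2.50, "hemoglobin": 15.0, "mcv": 81.0}

# Correctly labelled sample: current and previous results share a patient.
correct_pair = delta_features(patient_a_curr, patient_a_prev)

# Simulated WBIT (assumed procedure): patient B's current results are
# compared against patient A's preceding results.
simulated_wbit = delta_features(patient_b_curr, patient_a_prev)

print(correct_pair)
print(simulated_wbit)
```

In this toy example the correctly labelled pair shows changes of at most about 10 %, whereas the swapped pair shows shifts of 10 % to 150 % across several analytes at once. A multivariate classifier can exploit that joint pattern, while a univariate delta check sees only one analyte at a time.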

Results

Sixteen of 25 candidate variables were retained. The model achieved ROC-AUC values of 0.98–0.99 and PR-AUC values of 0.93–0.99 across all validation cohorts. Recalibration improved positive predictive value and net benefit by reducing false positives, with a slight decrease in sensitivity, although all values remained ≥90 %. Specificities ranged from 98 to 99 %. The model consistently outperformed both the multivariate Mahalanobis approach and univariate delta checks within the internal cohort.
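To see why reducing false positives matters at a 5 % WBIT prevalence, the following worked sketch computes the positive predictive value and net benefit for a cohort the size of EVC (n=800). The inputs are the bounds quoted in the abstract (sensitivity 90 %, specificity 98 % or 99 %), not the cohort-specific results, so the outputs are illustrative rather than reported values. The decision threshold of 0.5 and the function names are assumptions made for this example.

```python
from typing import Tuple


def expected_counts(
    n: int, prevalence: float, sensitivity: float, specificity: float
) -> Tuple[float, float]:
    """Expected true positives and false positives in a cohort of size n."""
    positives = n * prevalence
    negatives = n - positives
    true_pos = sensitivity * positives
    false_pos = (1.0 - specificity) * negatives
    return true_pos, false_pos


def positive_predictive_value(true_pos: float, false_pos: float) -> float:
    """PPV = TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


def net_benefit(true_pos: float, false_pos: float, n: int, threshold: float) -> float:
    """Decision-curve net benefit: TP/n - FP/n * threshold / (1 - threshold)."""
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


N = 800            # size of EVC, from the abstract
PREVALENCE = 0.05  # 5 % WBIT, from the abstract
SENSITIVITY = 0.90 # lower bound quoted in the abstract
THRESHOLD = 0.5    # hypothetical decision threshold, for illustration

tp98, fp98 = expected_counts(N, PREVALENCE, SENSITIVITY, 0.98)
tp99, fp99 = expected_counts(N, PREVALENCE, SENSITIVITY, 0.99)

ppv_spec98 = positive_predictive_value(tp98, fp98)  # about 0.70
ppv_spec99 = positive_predictive_value(tp99, fp99)  # about 0.83
nb_spec98 = net_benefit(tp98, fp98, N, THRESHOLD)
nb_spec99 = net_benefit(tp99, fp99, N, THRESHOLD)

print(f"PPV at 98 % specificity: {ppv_spec98:.3f}")
print(f"PPV at 99 % specificity: {ppv_spec99:.3f}")
print(f"Net benefit: {nb_spec98:.4f} vs {nb_spec99:.4f}")
```

Under these assumptions, one percentage point of specificity halves the expected false positives (from about 15 to about 8 per 800 samples) and raises the PPV from roughly 70 % to roughly 83 %. This is consistent with the reported trade-off in which recalibration improves PPV and net benefit at a small cost in sensitivity.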

Conclusions

This machine-learning model, leveraging widely available routine laboratory parameters, shows strong potential for integration into clinical workflows, enhancing WBIT detection and improving patient safety.

Keywords: preanalytical errors; wrong blood in tube; machine learning; delta check

Corresponding authors: Jordi Tortosa-Carreres and Andreu Martínez-Cerezuela, Laboratory Department, Hospital Universitari i Politècnic la Fe, Av. Fernando Abril Martorell, 106, 46026, València, Spain, E-mail: jorditc95@outlook.com (J. Tortosa-Carreres), andreumartinezcerezuela@outlook.com (A. Martínez-Cerezuela)

Acknowledgments

We sincerely thank Jesús Álvarez-Sáez for his invaluable technical support in the development of the CDS system and his assistance in building the Python library, as well as for his constant availability and commitment throughout the process. We also gratefully acknowledge Laura Martínez-Racaj for her continued technical support and dedication in the implementation of the CDS infrastructure.

Research ethics: The study was approved by the Institutional Ethics Committee (reference number 2024-1073-1) and conducted in accordance with the principles of the Declaration of Helsinki, ensuring data confidentiality. All data used in this study were pseudonymised prior to analysis by assigning a unique patient identifier. The original identifying information remained securely stored on hospital computers protected by password access and was not transferred or accessible during the analysis process. The study was conducted in accordance with the General Data Protection Regulation (EU) 2016/679 (GDPR) and the Spanish Organic Law 3/2018 on the Protection of Personal Data and Guarantee of Digital Rights (LOPDGDD).
Informed consent: This retrospective study was conducted using clinical data originally collected for healthcare purposes. In accordance with applicable regulations and following review by the institutional ethics committee, the study was deemed exempt from the requirement for individual informed consent, as it involved no intervention and posed no risk to the patients.
Author contributions: Jordi Tortosa-Carreres: Conceptualization; Methodology; Software; Data curation; Formal analysis; Investigation; Visualization; Writing – Original Draft; Ana Vañó-Bellver: Investigation; Data curation; Andreu Martínez-Cerezuela: Investigation; Data curation; Óscar Fuster-Lluch: Writing – Review & Editing; Supervision; Lucía García-Ruiz: Investigation; Data curation; Elena Rodríguez-Romero: Investigation; Data curation; Antonio Sierra-Rivera: Investigation; Data curation; Writing – Review & Editing; Ana Comes-Raga: Investigation; Data curation; Writing – Review & Editing; Carlos Cátedra: Investigation; Data curation; Virginia Tadeo-Garisto: Investigation; Data curation; Alexandra Igumnova-Dubovikova: Investigation; Data curation; Rafael Gisbert-Criado: Investigation; Data curation; Laura Sahuquillo-Frías: Supervision; Resources; Begoña Laiz-Marro: Supervision; Resources. All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Use of Large Language Models, AI and Machine Learning Tools: We declare the use of ChatGPT (OpenAI) to support English grammar and style revision during the manuscript preparation.
Conflict of interest: The authors state no conflict of interest.
Research funding: None declared.
Data availability: The training dataset, which consists of fully anonymized data, is available in the GitHub repository (link provided in the Discussion section). The remaining data are available from the corresponding author (JTC) upon reasonable request.

Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/cclm-2025-0564).

Received: 2025-03-29

Accepted: 2025-10-12

Published Online: 2025-10-29
