
Development and validation of a machine learning model for accurate detection of wrong blood in tube errors in hospitalized patients

Published/Copyright: October 29, 2025

Abstract

Objectives

To develop and validate a machine-learning model based on routinely available biochemical and hematological parameters for detecting wrong blood in tube (WBIT) errors in hospitalized patients.

Methods

This retrospective multicenter study included one internal cohort (IC) and two external validation cohorts (EVC, EVC2). The IC was balanced (50 % correct, 50 % WBIT, the latter split evenly between real and simulated errors at 25 % each), while EVC (n=800) and EVC2 (n=460) represented more realistic scenarios (95 % correct, 5 % WBIT, equally distributed between real and simulated errors). Parameters present in ≥ 95 % of requests were selected, and their normalized variation from the immediately preceding result was calculated. The IC was divided into a training set (IC-TS, n=324) and an internal validation set (IC-VS, n=108). Feature selection was refined with Elastic Net before training an XGBoost model. Performance was assessed in IC-VS, EVC, and EVC2. For benchmarking, the model’s discriminative ability was also compared with a multivariate Mahalanobis-based approach and with univariate delta checks within IC-TS/IC-VS.
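
As an illustration of the feature construction described above, the following Python sketch computes a normalized variation for each analyte relative to the immediately preceding result. It also shows how a simulated WBIT case could be built by pairing one patient's previous results with another patient's current results, and applies a simple univariate delta check. The abstract does not state the exact normalization formula or the simulation procedure, so the relative-change definition, the function names, the analyte values, and the 0.5 delta-check limit are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Optional


def normalized_delta(previous: float, current: float) -> Optional[float]:
    """Relative change (current - previous) / |previous|.

    Assumed definition of "normalized variation"; returns None when the
    previous value is zero and the ratio is undefined.
    """
    if previous == 0:
        return None
    return (current - previous) / abs(previous)


def delta_features(previous: Dict[str, float],
                   current: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Normalized deltas for every analyte present in both requests."""
    return {name: normalized_delta(previous[name], current[name])
            for name in previous if name in current}


def delta_check(deltas: Dict[str, Optional[float]],
                limit: float = 0.5) -> List[str]:
    """Univariate delta check: analytes whose |delta| exceeds a limit.

    The single shared limit of 0.5 is hypothetical; real delta-check
    limits are analyte-specific.
    """
    return [name for name, d in deltas.items()
            if d is not None and abs(d) > limit]


# Hypothetical results (arbitrary units), not study data.
patient_a_prev = {"creatinine": 1.0, "hemoglobin": 12.0, "platelets": 200.0}
patient_a_curr = {"creatinine": 1.1, "hemoglobin": 11.4, "platelets": 210.0}
patient_b_curr = {"creatinine": 2.5, "hemoglobin": 15.0, "platelets": 120.0}

# Correctly labelled sample: previous and current belong to the same patient.
correct = delta_features(patient_a_prev, patient_a_curr)

# Simulated WBIT: patient B's current results filed under patient A.
simulated_wbit = delta_features(patient_a_prev, patient_b_curr)

print("correct:", correct, "flagged:", delta_check(correct))
print("simulated WBIT:", simulated_wbit, "flagged:", delta_check(simulated_wbit))
```

In this toy case the univariate check flags only creatinine, even though hemoglobin and platelets also shift noticeably. A multivariate model trained on vectors like these can use the joint pattern of changes rather than one analyte at a time.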

Results

Sixteen of 25 candidate variables were retained. The model achieved ROC-AUC values of 0.98–0.99 and PR-AUC values of 0.93–0.99 across all validation cohorts. Recalibration improved positive predictive value and net benefit by reducing false positives, with a slight decrease in sensitivity, although all values remained ≥90 %. Specificities ranged from 98 to 99 %. The model consistently outperformed both the multivariate Mahalanobis approach and univariate delta checks within the internal cohort.
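
The link between fewer false positives and a higher positive predictive value can be made concrete with a short calculation. The sketch below applies Bayes' rule and the standard decision-curve definition of net benefit at the 5 % WBIT prevalence of the external cohorts. The sensitivity of 0.90 and specificities of 0.98 and 0.99 are the boundary figures quoted above, used here only as illustrative inputs; the 0.5 threshold probability and the function names are assumptions, and the outputs are not the PPV or net benefit values reported in the study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def net_benefit(sensitivity: float, specificity: float,
                prevalence: float, threshold: float) -> float:
    """Net benefit per sample: TP rate minus FP rate weighted by the
    odds of the threshold probability (decision-curve analysis)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos - false_pos * threshold / (1.0 - threshold)


PREVALENCE = 0.05   # WBIT share in the external validation cohorts
SENSITIVITY = 0.90  # lower bound quoted in the abstract
THRESHOLD = 0.5     # hypothetical threshold probability

for spec in (0.98, 0.99):
    print(f"specificity={spec:.2f}  "
          f"PPV={ppv(SENSITIVITY, spec, PREVALENCE):.3f}  "
          f"net benefit={net_benefit(SENSITIVITY, spec, PREVALENCE, THRESHOLD):.4f}")
```

With these inputs, moving specificity from 0.98 to 0.99 raises the PPV from roughly 0.70 to roughly 0.83, which shows why a small reduction in false positives matters so much when true WBIT errors are rare.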

Conclusions

This machine-learning model, leveraging widely available routine laboratory parameters, shows strong potential for integration into clinical workflows, enhancing WBIT detection and improving patient safety.


Corresponding authors: Jordi Tortosa-Carreres and Andreu Martínez-Cerezuela, Laboratory Department, Hospital Universitari i Politècnic la Fe, Av. Fernando Abril Martorell, 106, 46026, València, Spain

Acknowledgments

We sincerely thank Jesús Álvarez-Sáez for his invaluable technical support in the development of the CDS system and his assistance in building the Python library, as well as for his constant availability and commitment throughout the process. We also gratefully acknowledge Laura Martínez-Racaj for her continued technical support and dedication in the implementation of the CDS infrastructure.

  1. Research ethics: The study was approved by the Institutional Ethics Committee (reference number 2024-1073-1) and conducted in accordance with the principles of the Declaration of Helsinki, ensuring data confidentiality. All data used in this study were pseudonymised prior to analysis by assigning a unique patient identifier. The original identifying information remained securely stored on hospital computers protected by password access and was not transferred or accessible during the analysis process. The study was conducted in accordance with the General Data Protection Regulation (EU) 2016/679 (GDPR) and the Spanish Organic Law 3/2018 on the Protection of Personal Data and Guarantee of Digital Rights (LOPDGDD).

  2. Informed consent: This retrospective study was conducted using clinical data originally collected for healthcare purposes. In accordance with applicable regulations and following review by the Institutional Ethics Committee, the study was deemed exempt from the requirement for individual informed consent, as it involved no intervention and posed no risk to the patients.

  3. Author contributions: Jordi Tortosa-Carreres: Conceptualization; Methodology; Software; Data curation; Formal analysis; Investigation; Visualization; Writing – Original Draft; Ana Vañó-Bellver: Investigation; Data curation; Andreu Martínez-Cerezuela: Investigation; Data curation; Óscar Fuster-Lluch: Writing – Review & Editing; Supervision; Lucía García-Ruiz: Investigation; Data curation; Elena Rodríguez-Romero: Investigation; Data curation; Antonio Sierra-Rivera: Investigation; Data curation; Writing – Review & Editing; Ana Comes-Raga: Investigation; Data curation; Writing – Review & Editing; Carlos Cátedra: Investigation; Data curation; Virginia Tadeo-Garisto: Investigation; Data curation; Alexandra Igumnova: Investigation; Data curation; Rafael Gisbert-Criado: Investigation; Data curation; Laura Sahuquillo-Frías: Supervision; Resources; Begoña Laiz-Marro: Supervision; Resources. All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Use of Large Language Models, AI and Machine Learning Tools: We declare the use of ChatGPT (OpenAI) to support English grammar and style revision during the manuscript preparation.

  5. Conflict of interest: The authors state no conflict of interest.

  6. Research funding: None declared.

  7. Data availability: The training dataset, which consists of fully anonymized data, is available in the GitHub repository (link provided in the Discussion section). The remaining data are available from the corresponding author (JTC) upon reasonable request.



Supplementary Material

This article contains supplementary material (https://doi.org/10.1515/cclm-2025-0564).


Received: 2025-05-10
Accepted: 2025-10-12
Published Online: 2025-10-29
Published in Print: 2026-02-24

© 2025 Walter de Gruyter GmbH, Berlin/Boston
