Article, Open Access

The role of AI in pre-analytical phase – use cases

  • Hikmet Can Çubukçu
Published/Copyright: October 16, 2025

Abstract

The pre-analytical phase of laboratory testing, encompassing processes from test ordering to sample analysis, represents the most error-prone component of laboratory medicine, accounting for 68–98 % of laboratory mistakes. These errors compromise patient safety, increase healthcare costs, and disrupt operational efficiency. Artificial intelligence (AI) and machine learning (ML) technologies have emerged as promising solutions to address these challenges across multiple pre-analytical applications. This narrative review examines current AI research applications and commercial implementations across seven key pre-analytical domains: clot detection, wrong blood in tube (WBIT) error detection, sample dilution management, chemical manipulation detection in urine samples, serum quality assessment based on hemolysis/icterus/lipemia (HIL), test utilization optimization, and automated tube handling. Research studies demonstrate impressive performance, with neural networks achieving accuracies exceeding 95 % for clot detection, XGBoost models reaching 98 % accuracy for WBIT detection, and deep learning systems attaining AUCs above 0.94 for test recommendation systems. However, a significant translation gap persists between research prototypes and commercial deployment. Academic models excel at pattern recognition using curated datasets but face limitations including single-center validation, retrospective designs, and integration challenges. Commercial solutions prioritize deterministic controls, barcoding, and sensor-based approaches that ensure reliability and scalability, with limited explicit AI implementation. Successful clinical laboratory translation requires multicenter prospective validation, robust laboratory information system integration, regulatory compliance frameworks, and evaluation metrics focused on operational outcomes rather than solely statistical performance. As infrastructure and standards mature, strategic AI adoption in pre-analytical tasks offers measurable improvements in safety, efficiency, and cost-effectiveness.

Introduction

The pre-analytical phase represents the foundational stage of the total testing process that occurs prior to the analysis of the samples [1]. In the pre-analytical phase of laboratory testing, several critical processes ensure the integrity and accuracy of a patient’s results. This phase begins with the clinician’s intent to order specific tests and the appropriateness of those requests [2]. Following this, patient preparation is essential, which includes ensuring proper fasting status, considering physical activity, and accounting for any therapeutic drug intake [3]. Specimen collection is a vital process that involves correctly identifying the patient, labeling tubes accurately [4], using proper venipuncture techniques [5], and adhering to the correct order of draw [6]. Once collected, sample handling and processing steps such as mixing and centrifugation are crucial for separating components like serum or plasma. From there, sample transportation and storage must be carefully monitored to maintain specimen integrity by controlling factors like time, temperature, and agitation [7, 8]. Finally, automated assessment of sample quality checks for issues such as hemolysis, icterus, and lipemia (HIL) [9], clots, and incorrect fill volume [10].

The pre-analytical phase is a critical, yet vulnerable, part of the total testing process [11]. It is the most common source of laboratory errors, accounting for as much as 68–98 % of mistakes [12, 13]. These errors often stem from manual, labor-intensive tasks performed outside the direct supervision of laboratory staff [14]. Such errors can severely compromise patient safety by leading to misdiagnosis and delayed treatment, with patient misidentification being a particularly dangerous example [15]. This vulnerability also creates a significant financial burden on healthcare systems [12]. Furthermore, pre-analytical errors can disrupt operational efficiency, increase service costs, and undermine the integrity of data that clinicians rely on for critical medical decisions. The resulting lack of trust in laboratory results can erode the reputation of the entire healthcare process. To combat these challenges, significant efforts are underway to improve this phase through the adoption of advanced technologies such as automation [16], artificial intelligence (AI), and robotics for tasks such as automated labeling and vein detection [17].

AI, with its key subset machine learning (ML), is transforming laboratory medicine. This technology uses algorithms to analyze the vast, complex datasets generated by clinical laboratories, identifying patterns and making predictions far beyond what humans can conventionally analyze [18]. By leveraging this data, AI can optimize workflows and revolutionize the entire field.

This narrative review outlines current and emerging applications of AI in the pre-analytical phase of laboratory medicine. The focus is on AI use cases described in academic literature, while also highlighting comparable commercial solutions. The goal is to provide a comprehensive overview of the current market and the technologies being used to address these pre-analytical challenges. A domain-guided literature search was performed in the PubMed database utilizing combinations of keywords, including “pre-analytical”, “laboratory medicine”, “artificial intelligence”, “machine learning”, “clot detection”, “sample dilution”, “wrong blood in tube”, “hemolysis icterus lipemia”, “chemical manipulation in urine samples”, “test utilization”, and “tube handling”. The search was supplemented by reviewing studies recommended by PubMed. We prioritized the inclusion of peer-reviewed research that reported on data features, model performance metrics, and the specific algorithms employed.

While Lippi et al. (2024) recently provided a comprehensive overview of potential AI applications in the pre-analytical phase, their review focused primarily on theoretical opportunities and future possibilities [17]. The present narrative review advances this foundation by synthesizing 22 studies from the literature – reporting the data used, ML methods, best model performance, explainability, limitations, and outcomes/merits – pairing academic AI/ML work with 30 commercial implementations, and explicitly mapping the translational gap between research advances and market deployment.

AI use cases in the pre-analytical phase: research vs. commercial implementation

The pre-analytical phase is rich with opportunities for AI to safeguard specimen integrity and streamline workflow – from detecting clots and wrong-blood-in-tube (WBIT) errors to managing dilutions, spotting chemically altered urine, grading serum quality (HIL), optimizing test utilization, and automating tube handling and labeling. Across research settings, ML and deep-learning models routinely report high accuracy and sensitivity, yet many remain constrained by single-center data, retrospective designs, and explainability or integration hurdles. In contrast, commercially deployed systems largely rely on sensors, barcoding, middleware rules, and deterministic checks; only a few explicitly incorporate AI despite achieving strong operational reliability at scale. This section pairs state-of-the-art research with current market solutions for the use cases in the pre-analytical phase.

Clot detection

Research applications

The detection of clotted specimens represents a critical quality control measure in laboratory medicine, as clotted samples can lead to inaccurate coagulation test results and potential analytical errors. Two distinct studies demonstrate the potential of AI in clot detection (Table 1). Fang et al. employed a neural network to identify clotted specimens in coagulation testing based on laboratory parameters such as thrombin time, fibrinogen, partial thromboplastin time, prothrombin time, and D-dimer [19]. The model showed excellent performance (AUC 0.97, accuracy 95.3 %, specificity 96.7 %, sensitivity 94.0 %), with logistic regression coefficients provided for partial interpretability, though the authors noted limitations regarding age disparities in the data [19]. Complementarily, Hou et al. developed an improved UNeXt segmentation network to detect clots and fibrins in serum images, addressing the operational risk of sampling needle blockage [20]. Trained on 9,500 serum sample images and validated on 13,230 real-world cases, the model demonstrated sensitivity of 95.74 %, specificity of 98.11 %, and accuracy of 97.93 % [20].
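
To make the tabular approach concrete, the following Python sketch trains a small neural network on the five coagulation inputs used by Fang et al. [19]. The data are synthetic, and the architecture, scaling, and split are illustrative assumptions rather than the authors’ published pipeline.

    # Illustrative clot-presence classifier on coagulation parameters;
    # synthetic data stand in for TT, Fbg, PTT, PT, and D-dimer values.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    n = 3000
    X = rng.normal(size=(n, 5))            # five coagulation inputs per specimen
    y = rng.binomial(1, 0.06, size=n)      # ~6 % clotted, mirroring class imbalance

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    clf = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0))
    clf.fit(X_tr, y_tr)
    print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))

In practice, reported metrics should come from a held-out test set rather than the overall dataset, a distinction several studies in Table 1 did not make.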

Table 1:

Use cases of AI in the pre-analytical phase reported in the literature.

Study Aim Data ML methods Best model’s performance Explainability Limitations Merits/outcome
Clot detection

Fang et al. [19] To identify clotted specimens using coagulation test results Total: 3,081 specimens (192 clotted, 2,889 no-clot); input variables: TT, Fbg, PTT, PT, D-dimer results, and labels about the presence of clot; training/test split not specified Neural network Predicted feature: Clot presence (binary: Clot/no-clot); No performance data on test dataset – reported performance based on overall dataset: AUC: 0.97, accuracy: 95.3 %, specificity: 96.7 %, sensitivity: 94.0 % Logistic regression coefficients were provided Disparity in ages in the data Potential for identifying clotted samples using coagulation test results
Hou et al. [20] To propose and evaluate a serum image blood clot and fibrin segmentation method based on an improved UNeXt network Training: 9,500 samples (7:2:1 ratio=∼6,650 training, ∼1,900 validation, ∼950 internal test); external validation: 13,230 clinical samples; input variables: Real serum sample images (512 × 1,024 pixels) collected from AutoLas X-1 series with labeled masks for clots and fibrins Improved UNeXt segmentation network with PyTorch Predicted feature: Segmentation masks for blood clots and fibrin locations in serum images; external validation dataset (13,230 clinical samples): Sensitivity 95.74 %, specificity 98.11 %, accuracy 97.93 %; internal test dataset: Average Dice coefficient 0.8707 (87.07 %) Not explicitly discussed Not explicitly stated Improved UNeXt model offers better segmentation performance with minimal sacrifice in computational efficiency

Specimen mix-up detection

Farrell et al. [21] To develop a machine learning model for routine detection of “wrong blood in complete blood count tube” errors Training: 135,128 CBC results (50 % assigned simulated WBIT errors); external prospective validation: 38,187 routine CBC results over 22 weeks; input variables: 14 CBC parameters, patient age, sex, and medical record number Extreme gradient boosting algorithm Predicted feature: WBIT error probability (binary: WBIT error/no error); external prospective dataset (38,187 samples): Detected 12 WBIT errors, PPV 10.9 %; internal test dataset: Accuracy 98.2 %, sensitivity 96.8 %, specificity 99.6 % Not explicitly discussed Requires recent previous CBC results within 6 days; model does not provide WBIT detection for transfusion patients Model suitable for routine use; identified WBIT errors missed by laboratory procedures
Graham et al. [22] To assess performance of multianalyte ML models in detecting WBIT errors in pediatric setting Total: 123,654 patients; training/test split not explicitly specified (50 % assigned simulated WBIT errors); input variables: CBC with differential and CBC without differential white cell count tests, previous CBC within 7 days Extreme gradient boosting (XGBoost) Predicted feature: WBIT error probability (binary: WBIT error/no error); internal test dataset: CBC with diff – accuracy 0.9715, AUROC 0.9962; CBC without diff – accuracy 0.9647, AUROC 0.9944 Model-based feature importance showed most top 10 features were red cell indices Inability to verify prospective WBITs; simulation may not match reality ML models outperformed single-analyte delta checks; provides reproducible model development pipeline
Farrell and Giannoutsos [23] To develop and evaluate multiple ML models for detecting WBIT errors in CBC results Total: 112,321 samples; split into training, development, and test sets (proportions not specified); input variables: Current and previous CBC results for specific analytes, patient age, sex, and collection location Eight ML models: ANN, extreme gradient boosting, SVM, random forest, logistic regression, decision trees, k-nearest neighbors Predicted feature: WBIT error probability (binary: WBIT error/no error); internal test dataset: Artificial neural network – accuracy 99.1 %, sensitivity 99.2 %, specificity 98.9 % Simple decision tree developed with interpretable rules using four CBC parameters Low real-world prevalence affects PPV; models might flag post-transfusion samples All ML models exceeded manual review performance; provides preliminary evidence for ML value in WBIT detection
Farrell [24] To identify mislabelled samples Total: 127,256 sets of consecutive results; training/test split not specified; input variables: Patient age, sex, current and previous results for sodium, potassium, chloride, bicarbonate, urea and creatinine, date and time of collection Eight ML models including ANN, extreme gradient boosting, SVM, random forest, logistic regression, k-nearest neighbors, decision trees Predicted feature: Sample mislabeling probability (binary: Mislabeled/correctly labeled); No performance data on test dataset – reported performance based on overall analysis: Artificial neural network - 92.1 % accuracy, AUC 0.977 None Randomly introduced labelling errors, omitting non-random errors ML algorithms exceeded human-level performance for identifying mislabelled samples
Farrell [25] To detect specimen mix-ups Total: 141,396 EUC results; training: 80 %, development: 10 %, internal test: 10 %; input variables: Age, sex, current and previous EUC results, absolute and percentage delta values for each analyte ANN Predicted feature: Specimen mix-up probability (binary: Mix-up/no mix-up); internal test dataset (10 % of data): Accuracy 92.5 %, sensitivity 90.6 %, specificity 94.5 %, AUC 0.980 Partially Randomly introduced labelling errors, omitting non-random errors Human interaction with AI models reduced their performance
Mitani et al. [26] To detect specimen mix-ups Total: 2,159,354 records; training/test split not specified; input variables: Complete blood cell counts and biochemical tests, differences between consecutive results Gradient-boosting-decision-tree (GBDT) Predicted feature: Specimen mix-up probability (binary: Mix-up/no mix-up); No performance data on test dataset – reported performance based on overall dataset: AUC 0.998 Reported as SHAP values Simulation of mix-up, single center study, no external validation ML model performs efficient specimen mix-up detection
Rosenbaum & Baron [27] To detect wrong blood in tube (WBIT) errors Total: 20,638 patient results; training/test split not specified; input variables: 11 clinical chemistry analytes, absolute changes, velocity Logistic regression, support vector machine Predicted feature: WBIT error probability (binary: WBIT error/no error); No performance data on test dataset – reported performance based on overall analysis: Support vector machine AUC 0.97 Partially (as univariate delta check predictive power) No external validation ML-based multianalyte delta check algorithm superior to conventional single-analyte delta checks
Zhou et al. [28] To detect sample mix-up using delta check method based deep learning Total: 423,290 hematology test results; training/test split not specified; input variables: Hematology test results with delta check calculations Deep Belief network (DBN), random forest, SVM, logistic regression, KNN, naive Bayesian classifier Predicted feature: Sample mix-up probability (binary: Mix-up/no mix-up); No performance data on test dataset – reported performance based on overall analysis: DC method based on DBN – AUC 0.977, accuracy 93.1 %, TPR 92.9 %, TNR 93.3 % None Lack of explainability DC method based on DBN outperformed RCV and empirical delta check for specimen mix-up detection

Sample dilution management

Ialongo et al. [29] To manage sample dilution of serum free light chain (sFLC) testing Total: 6,099 database entries; training/test split not specified; input variables: sFLC results, dilution status, patients’ hospital status Artificial neural network based on multi-layer perceptron (MLP-ANN) Predicted feature: Dilution requirement (binary: Needs dilution/no dilution needed); No performance data on test dataset – reported outcome based on overall analysis: Reduced κ-FLC wasted tests by 69.4 % and λ-FLC wasted tests by 70.8 % Feature importance was assessed ANN model unable to recognize some particular cases, no external validation MLP-ANN reduced the number of wasted sFLC tests related to dilution

Chemical manipulation detection

Streun et al. [30] To design ANN to detect chemically adulterated urine samples using LC-HRMS data Total: 702 samples; training: 500 samples, internal test: 202 samples; input variables: 33,448 features (compound ions) from LC-HRMS data after retention time alignment and peak picking Fully connected artificial neural network Predicted feature: Chemical adulteration status (binary: Adulterated/non-adulterated); internal test dataset (n=202): Accuracy 95.4 %; 10-fold cross-validation on training: Accuracy 90.4 %, sensitivity 88.9 %, specificity 92.0 % LIME analysis extracted 14 important features as potential biomarkers Sample adulteration at room temperature vs. authentic ∼37 °C; lack of different adulterant concentrations Reliable, high-performance, and interpretable ANN model for detecting chemical urine manipulation

Serum quality assessment

Yang et al. [31] To develop deep-learning based model to evaluate serum quality using sample images Total: 16,427 centrifuged blood images; training/test split not specified; input variables: Centrifuged blood images with known serum indices values (hemolytic index, icteric index, and lipemic index) Convolutional neural networks (Inception-ResNet-V2 network) Predicted feature: HIL classification (multiclass: Hemolysis, icterus, lipemia subclassification); No performance data on test dataset – reported performance based on overall analysis: For subclassification of hemolysis, icterus, and lipemia – AUCs 0.989, 0.996, and 0.993 None No external validation Deep learning model for automated assessment of serum quality
Man et al. [32] To develop model for robust recognition of hemolysis, icterus, and lipemia (HIL) in serum samples Training: 20,000 samples (10,000 normal, 10,000 HIL); validation: 36,262 routine data; external prospective test: 112,121 real-world data over 4 months; input variables: Serum images resized to 32 × 32 pixels with normalization, labeled based on established HIL cutoff values Binary classification model with ResNet18 backbone Predicted feature: HIL status (binary: HIL present/normal); external prospective test dataset (112,121 samples): Overall FNR of 2.56 % with pass rate of 86.48 %; validation dataset (36,262 samples): FNR of 1.39 % Not discussed Lack of standardized cutoff values for HIL; single-site data limits generalizability Streamlined model with enhanced preprocessing reliably identifies serum quality
Benirschke and Gniadek [33] To create logistic regression model to predict falsely elevated POC potassium results due to hemolysis Total: 3,489 unique encounters; training/test split not explicitly specified; external validation: 496 POC BMPs from December 2019; input variables: POC BMP results (Na, K, Cl, CO2, BUN, creat, iCa), age, sex, specimen collection date/time Logistic regression Predicted feature: False K elevation probability (binary: Falsely elevated K/true elevation); internal test dataset: AUC 0.995; rule-in cutoff – PPV 90.9 %, NPV 99.0 %; rule-out cutoff – NPV 99.9 %, PPV 61.5 %; external validation (496 POC BMPs): Rule-in - accuracy 99.0 %, sensitivity 82.4 %, specificity 99.5 % Patterns biologically explainable: Elevated creatinine, low CO2, increasing iCa associated with true K elevations No clinical confirmation of discrepancy causes; single center study Novel algorithm to detect pseudohyperkalemia at POC with high accuracy

Test utilization/ordering

Zhang et al. [34] To improve PBFC test utilization Total: 784 PBFC samples; training/test split not specified; input variables: History of hematological malignancy, CBC/differential parameters Decision tree, logistic regression model Predicted feature: PBFC test indication (binary: Indicated/not indicated); No performance data on test dataset – reported performance based on overall analysis: Decision tree – 98 % sensitivity, 65 % specificity (AUC=0.906); logistic regression – 100 % sensitivity, 54 % specificity (AUC=0.919) Decision tree and differences between groups were reported Small sample size, no external validation ML models decreased unnecessary PBFC utilization by 35–40 %
Islam et al. [35] To develop deep learning-based automated system to recommend appropriate laboratory tests Total: 1,463,837 prescriptions from 530,050 patients; training: 70 %, internal validation: 20 % of training, internal test: 30 %; input variables: Patients’ demographics, medications, diseases (ICD-9-CM codes), and laboratory information for 315 different laboratory tests Deep neural network (DNN) with three hidden layers Predicted feature: Laboratory test recommendations (multi-label: 315 different laboratory tests); internal test dataset (30 % of data): AUROC 0.98 (micro-average) and 0.94 (macro-average); high recall (0.96) at low cut-off DL model lacks interpretability (“black box”) Did not consider temporal dimension; no external validation Successfully developed DNN model for recommending personalized laboratory tests
Islam et al. [36] To develop AI-based automated model for laboratory test recommendation based on EHR variables Total: Taiwanese NHIRD cardiology data; training: 80 %, internal test: 20 % (with 25 % of training for internal validation); input variables: Gender, age, disease (first 3 digits of ICD-9-CM), and drug information (first 5 characters of ATC code) for 35 laboratory tests Deep learning (DL) model, DNN with three hidden layers Predicted feature: Laboratory test recommendations (multi-label: 35 laboratory tests); internal test dataset (20 % of data): AUROCmacro 0.76, AUROCmicro 0.87; 99 % sensitivity at low cutoff Not extensively discussed Data from only cardiology department; only 35 laboratory tests included First study to evaluate DL model for lab test recommendation using EHR variables
McDermott et al. [37] To propose ML-based approach to “smart” reflex testing using ferritin as example Total: 288,427 unique “CBC-events” from 137,451 patients; cross-validation splits not explicitly specified; input variables: CBC results, other laboratory test results, age, and gender, historical lab results up to 2–3 years prior Logistic regression and random forest algorithms Predicted feature: Ferritin test ordering probability (binary: Ferritin test indicated/not indicated); No specific test dataset performance reported - cross-validation results: Random forest – overall AUC 0.731, AUPRC 0.349, Brier loss 0.094 Feature importance showed clinically relevant predictors for iron deficiency Subjectivity of chart review; single center study Proof-of-concept for ML-based “smart” reflex protocol with potential improved performance over rule-based approaches

Tube handling

Demirci [38] To assess operational performance of NESLI, AI-based tube-labeling robot Total: 330 synthetic orders based on real-world patterns; simulation-based testing (no traditional training/test split); input variables: Images from overhead camera for tube position determination, tube cap color classification Two convolutional deep learning artificial neural networks Predicted feature: Tube position coordinates and tube type classification (multiclass: Tube position+cap color identification); simulation-based testing: 99.2 % success rate in labeling parameters; 100 % success in seven other labeling quality parameters; median labeling time 8.96 s per tube Not explicitly discussed Limited tube store capacity (204 tubes); temporary network interruptions AI-based robotic systems significantly increase process efficiency and reduce errors in preanalytical phase
Şişman et al. [39] To test performance of KANKA, AI-based robot for sorting blood tubes and preanalytical quality control Total: 1,000 blood tubes × 5 trials=5,000 trials; experimental testing (no traditional training/test split described); input variables: Tube position, tube type and blood volume assessment, filling level determination, barcode scanning Three convolutional deep learning models plus barcode scanning model Predicted feature: Multiple outputs – tube position coordinates, tube type classification, filling level assessment, preanalytical error detection (multiclass: Various preanalytical error types); experimental test dataset (5,000 trials): Overall accuracy 99.98 %; 100 % success rate in detecting preanalytical errors; 311 tubes processed per hour AI results can be confirmed through stored sample images Preliminary study; hemolysis not included; pediatric tubes not included High accuracy in detecting preanalytical errors; potential labor savings
  1. ANN, artificial neural network; AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve; AUC, area under the curve; BMP, basic metabolic panel; CBC, complete blood count; CNN, convolutional neural network; DBN, deep belief network; Diff, differential white cell count; DL, deep learning; DNN, deep neural network; EHR, electronic health record; EUC, electrolytes, urea and creatinine; Fbg, fibrinogen; FNR, false negative rate; GBDT, gradient-boosting decision tree; HIL, hemolysis, icterus, and lipemia; iCa, ionized calcium; K, potassium; KNN, k-nearest neighbors; κ-FLC, serum kappa free light chain; λ-FLC, serum lambda free light chain; LC-HRMS, liquid chromatography coupled to high-resolution mass spectrometry; LIME, local interpretable model-agnostic explanations; MGH, Massachusetts General Hospital; ML, machine learning; MLP, multi-layer perceptron; NHIRD, National Health Insurance Research and Development (database); NPV, negative predictive value; PBFC, peripheral blood flow cytometry; POC, point-of-care; PPV, positive predictive value; PT, prothrombin time; PTT, partial thromboplastin time; RCV, reference change value; sFLC, serum free light chain; SHAP, Shapley additive explanations; SVM, support vector machine; TNR, true negative rate; TPR, true positive rate; TT, thrombin time; WBIT, wrong blood in tube.

Commercial solutions

Several commercial products, such as the Abbott ARCHITECT i2000SR, the integrated ARCHITECT ci8200, the Roche cobas 8100, and the Tecan Freedom EVO with PMP, incorporate clot detection technology to ensure sample integrity (Table 2). These systems use different methods to detect clots: the Abbott models use probe-based detection with pressure monitoring, the Roche cobas 8100 employs module-based multi-parameter checks, and the Tecan PMP utilizes pressure-profile monitoring. However, none of these manufacturers explicitly states that their systems use AI or ML for clot detection.

Table 2:

Comparable commercial solutions for preanalytical phase.

Product name & manufacturer Key features
Clot detection

ARCHITECT i2000SR (Abbott) Clot/gel/bubble detection at the sample probe, reagent pressure monitoring, and probe-based detection with pressure monitoring technology
ARCHITECT ci8200 (Abbott) Integrated system (c8000+i2000SR) with automated clot/bubble detection during sampling, reagent pressure monitoring, and probe-level detection capabilities
cobas 8100 series (Roche) Multi-parameter sample integrity checks including tube-type matching, sample check module for liquid-level/volume and serum indices (HIL), and aliquot module with LLD and clot detection
Freedom EVO with PMP (Tecan) Pressure monitored pipetting (PMP) with real-time pressure-profile monitoring to flag aspiration/dispense errors including clots, particles, and air bubbles

Specimen mix-up detection

MobiLab (Iatric Systems) Bedside barcode specimen collection system with patient wristband scanning, on-spot label printing, and barcode verification for positive patient identification
BloodTrack Tx (Haemonetics) Bedside transfusion management and sample collection software with 2D barcode patient ID enforcement and verification at collection and transfusion
Zebra barcode solution (Zebra Technologies) Hardware/software solution with barcode printers, scanners, mobile computers for specimen tracking, and automated barcode data capture systems
LigoLab LIS Tracking (LigoLab LLC) Advanced laboratory information system with end-to-end specimen tracking, barcode verification, and unique sample IDs with barcode scans at each handling step

Sample dilution management

Alinity Automation System (Abbott) Middleware-driven intelligent sample handling rules with automated frontline dilution capabilities, proactive dilution rules that automatically trigger preset dilutions before analysis when tests are likely to exceed analytical range
Atellica®+Aptio Automation with DataLink™ V2.0 (Siemens Healthineers) DataLink™ middleware with AI-assisted intelligent dilution logic, rule-based automated sample dilutions with preset dilution factor programs that leverage patient data to automatically assign optimal dilution factors before analysis
DxA 5000 Total Lab Automation (Beckman Coulter) Remisol advance middleware with comprehensive volume monitoring and intelligent routing, tube inspection unit with volume measurement checks, automatic routing for reruns/dilutions, and volume verification to prevent quantity not sufficient errors

Chemical manipulation detection (urine)

Intect 7 Test Strips (Branan Medical Corp/TriTech Forensics) 7-Parameter urine adulteration detection strips using colorimetric analysis with chemical reaction-based color change detection for identifying adulterants through visual comparison with color charts after 60-s reaction time
AdultaCheck 6 (Chimera Research and Chemical Inc.) Multi-parameter adulteration detection strips using colorimetric methodology with chemical indicator-based detection system that changes color in presence of specific adulterants through biochemical reactions
UrineCheck 7 (AlcoPro/TransMed) 7-Parameter dip-and-read test strips with rapid detection capability using chemical reaction-based colorimetric detection system providing results through visual color comparison within 1 min
MASK ultra screen (Kacey Medical) Multi-parameter adulterant detection device using colorimetric analysis with chemical indicator-based detection system for identifying adulterants through color change reactions

Serum quality assessment

cobas chemistry analyzers (Roche Diagnostics) Automated spectral hemolysis/icterus/lipemia index measurement built into analyzer for pre-analytical checks, flags samples with HIL interference before and during testing
Alinity & ARCHITECT (Abbott) Integrated HIL index assays on chemistry analyzers for automatic interference detection, with Alinity c capable of quantifying hemolysis through free hemoglobin index measurement
AU/DxC series (Beckman Coulter) Automatic HIL indices generation on general chemistry analyzers (AU5800, DxC 700) with clot detection capabilities to catch clots in serum samples
Dimension/Atellica (Siemens Healthineers) Spectrophotometric serum index checks on analyzers (Dimension RxL, Vista, Atellica CH series) for HIL interference detection, fully automated in routine sample processing
Vitros Systems (Ortho Clinical Diagnostics) Interference index detection on Vitros dry-chemistry analyzers with automated identification of hemolyzed, turbid, or icteric samples

Test utilization & ordering

CareSelect Lab (Optum) EHR-integrated clinical decision support that silently audits and scores every lab order in real-time against Mayo Clinic evidence-based guidelines, uses automated scoring algorithms to evaluate test appropriateness without workflow disruption, provides analytics to identify misutilization patterns and offers point-of-order intervention tools
Laboratory Decision System (LDS) (Medical Database, Inc.) Stand-alone/integrable decision support system with evidence-based recommendations for 2,300+ tests across 850+ diseases, uses proprietary algorithms to provide relevancy scores (0–10) and real-time medical necessity checking, ensures correct ICD-10/CPT coding and suggests alternatives to meet payer criteria
Viewics Dx optimization (Roche) Analytics module for retrospective utilization review using lab data to identify overutilization patterns, employs data analytics algorithms to spot redundant tests and low-value testing, provides dashboards for policy changes and provider education based on testing patterns
hc1 PrecisionDx Advisor/Quest® Lab Stewardship™ Cloud/browser-based lab stewardship platform that aggregates data across multiple lab and IT systems, uses machine learning to flag misutilization and care variation, provides real-time dashboards and guideline-based signals to identify duplicate or unnecessary testing

Specimen handling

Cobas p612 (Roche) Automates tube registration, sorting, centrifugation, decapping, aliquoting, and recapping with optical camera module for sample integrity checks (detects hemolyzed, lipemic, icteric samples) and laser-based liquid level detection to measure blood volume in each tube
DxA 5000 (Beckman Coulter) Provides end-to-end automation with intelligent sample routing and inspection, performs automatic pre-analytical checks for tube type correctness, fill levels (short draws), barcode/label verification, integrates decapping, sorting, and connection to analyzers for seamless workflow
Atellica® Sample Handler (Siemens) Robotic sample handler with multi-camera 360° barcode scanning (reads tube barcodes from any orientation), automates decapping/capping, sample sorting and routing, refrigerated archiving, and performs automatic volume checks on tubes to detect under-filled samples
TCAutomation (Thermo Scientific) Modular system that automates pre- and post-analytical steps including tube loading, centrifugation, decapping, aliquoting into secondary tubes, labeling of aliquots, sorting to analyzers, and controlled storage/retrieval with open connectivity and scalable configuration
Sci-Print VX2 (Computype/Scinomix) Print-and-apply labeling system for tubes/vials that automatically wraps labels on tubes of 0.5–50 mL size with precise placement, supports integrated barcode scanning and optional capping/uncapping for benchtop labeling of batches
ProTube™ Station (Inpeco) Guided blood collection station that automates patient ID verification, tube selection, and on-demand tube labeling at the draw site with cap color and size recognition, provides prompts for order-of-draw and tube inversion with full traceability recording
  1. EHR, electronic health record; HIL, hemolysis, icterus, lipemia; ICD-10, International Classification of Diseases, 10th Revision; CPT, current procedural terminology; LIS, laboratory information system; LLD, liquid level detection; PMP, pressure monitored pipetting; AI, artificial intelligence; ID, identification; 2D, two-dimensional. This table describes product capabilities based solely on available manufacturer documentation.

Wrong blood in tube error detection

Research applications

Wrong blood in tube (WBIT) errors represent one of the most critical preanalytical challenges in laboratory medicine, with potentially severe clinical consequences for patient safety. Numerous studies have investigated ML models for detecting WBIT errors (Table 1).

Decision tree-based ML algorithms showed promise in detecting WBIT errors. Farrell et al. developed an XGBoost model using CBC data [21]. When applied to retrospective test data, the model demonstrated accuracy of 98.2 %, sensitivity of 96.8 %, and specificity of 99.6 %. In prospective evaluation of 38,187 routine CBC results over 22 weeks, the model detected 12 WBIT errors missed by routine checks [21]. Likewise, Mitani et al. developed a gradient-boosting decision tree approach using an extensive dataset of 2,159,354 records encompassing complete blood cell counts and biochemical tests, achieving an AUC of 0.998 [26]. However, the study limitations included reliance on simulated mix-up scenarios and single-center validation without external confirmation [26]. Graham et al. focused on a pediatric setting and compared multiple ML models using CBC data from 123,654 patients, with XGBoost again performing best (accuracy of 97.15 %, AUROC of 0.996, sensitivity of 96.90 %, and specificity of 97.39 % for CBC with differentials) [22]. Feature importance analysis revealed that red cell indices consistently ranked among the top 10 predictive features [22].
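
The simulation strategy behind these studies can be sketched in a few lines of Python: half of the records keep their own current/previous result pair, while the other half receive the “current” result of a different patient before an XGBoost classifier is trained on the combined features. Everything below is synthetic and illustrative; the feature counts and hyperparameters are assumptions.

    # Sketch of simulated-WBIT training as described in [21, 22];
    # label 1 marks records whose current CBC came from another patient.
    import numpy as np
    from xgboost import XGBClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)
    n, p = 20000, 14                           # 14 CBC parameters per draw
    patient_mean = rng.normal(size=(n, p))
    prev = patient_mean + rng.normal(scale=0.3, size=(n, p))   # previous CBC
    curr = patient_mean + rng.normal(scale=0.3, size=(n, p))   # current CBC

    swap = rng.permutation(n)
    wbit = rng.random(n) < 0.5                 # 50 % simulated WBIT errors
    curr[wbit] = curr[swap][wbit]              # current result taken from another patient

    X = np.hstack([curr, prev, curr - prev])   # raw values plus delta features
    y = wbit.astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
    model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
    model.fit(X_tr, y_tr)
    print("AUROC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))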

Comparative analyses of multiple ML architectures have established artificial neural networks as particularly effective for WBIT detection tasks. Farrell and Giannoutsos systematically evaluated eight different ML models using 112,321 samples from a public hospital, including comprehensive CBC parameters, patient demographics, and collection location data [23]. Their artificial neural network achieved superior performance with accuracy of 99.1 %, sensitivity of 99.2 %, and specificity of 98.9 %, consistently outperforming all alternative approaches and exceeding manual review capabilities by laboratory staff. The study’s development of an interpretable simple decision tree using only four CBC parameters – red cell distribution width percentage delta change, mean cell volume percentage delta change, platelet distribution width percentage delta change, and mean cell hemoglobin percentage delta change – provided an implementable solution for existing laboratory information systems [23]. The broader applications of neural networks for specimen identification have been demonstrated across multiple laboratory parameters. Farrell’s investigation of mislabeled sample detection using 127,256 sets of consecutive electrolyte results achieved 92.1 % accuracy with an AUC of 0.977 [24], while a separate study by Farrell [25] utilizing 141,396 sets of electrolyte, urea, and creatinine results demonstrated that artificial neural networks could achieve accuracy of 92.5 %, sensitivity of 90.6 %, and specificity of 94.5 %. Notably, this latter investigation revealed that autonomous AI operation significantly outperformed human-supervised decision support modes, with the autonomous model achieving an AUC of 0.980 compared to reduced performance when human interaction was incorporated. Similarly, deep learning architectures have established new performance benchmarks for specimen mix-up detection through advanced neural network designs. Zhou et al. implemented Deep Belief Network-based delta check methods using 423,290 hematology test results, achieving an AUC of 0.977, accuracy of 93.1 %, true positive rate of 92.9 %, and true negative rate of 93.3 % [28]. Their approach significantly outperformed both reference change value methods and empirical delta checks, though the lack of model explainability was noted as a limitation for clinical implementation [28].
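
The interpretable fallback reported by Farrell and Giannoutsos – a shallow decision tree over four percentage delta-check features – can be reproduced in outline as below. The data generation is a toy assumption (same-patient pairs show small deltas, mix-ups show large ones); the printed rules illustrate why such a tree is straightforward to encode in a laboratory information system.

    # Toy sketch of a four-feature delta-check decision tree as in [23];
    # RDW, MCV, PDW, and MCH percentage deltas are simulated.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(2)
    n = 5000
    X = rng.normal(scale=5.0, size=(n, 4))     # small deltas: same patient
    y = rng.binomial(1, 0.5, size=n)
    X[y == 1] = rng.normal(scale=25.0, size=X[y == 1].shape)   # large deltas: mix-up

    tree = DecisionTreeClassifier(max_depth=3, random_state=2).fit(X, y)
    print(export_text(tree, feature_names=["RDW%delta", "MCV%delta", "PDW%delta", "MCH%delta"]))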

Commercial solutions

Several commercial solutions are available to prevent or detect wrong blood in tube (WBIT) errors and specimen mix-ups during the pre-analytical phase (Table 2). These tools primarily use barcoding, automation, and informatics to ensure that a patient’s sample is correctly labeled and handled. However, none of these solutions explicitly employs AI or ML technologies.

Sample dilution management

Research applications

AI applications in sample dilution management have demonstrated significant potential for optimizing laboratory workflows while reducing resource wastage and improving operational efficiency (Table 1). Ialongo et al. developed a multi-layer perceptron artificial neural network (MLP-ANN) specifically designed to manage sample dilution for serum free light chain testing, addressing the complex decision-making processes required for optimal analytical protocols [29]. Using a comprehensive database of 6,099 entries that included serum free light chain results, dilution status, and patients’ hospital status, their MLP-ANN achieved substantial reductions in wasted tests, decreasing unnecessary serum kappa free light chain testing by 69.4 % and serum lambda free light chain testing by 70.8 % [29].
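
A minimal sketch of such a dilution gatekeeper, assuming synthetic inputs (previous kappa/lambda results and inpatient status) and a toy labeling rule, might look as follows; it is not the authors’ implementation.

    # Hypothetical MLP-ANN dilution gate in the spirit of Ialongo et al. [29]:
    # predict whether an sFLC sample should be pre-diluted on first pass.
    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(3)
    n = 6000
    X = np.column_stack([rng.lognormal(2.5, 1.0, n),   # last kappa result
                         rng.lognormal(2.5, 1.0, n),   # last lambda result
                         rng.binomial(1, 0.4, n)])     # inpatient status
    y = (X[:, 0] > 40).astype(int)                     # toy labeling rule

    model = make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(8,), max_iter=400, random_state=3))
    model.fit(X, y)

    def needs_first_pass_dilution(last_kappa, last_lambda, inpatient):
        """Return True if the sample should be pre-diluted before analysis."""
        return bool(model.predict([[last_kappa, last_lambda, inpatient]])[0])

    print(needs_first_pass_dilution(120.0, 15.0, 1))

Routing samples predicted to exceed the measuring range straight to a diluted first run is how such a model avoids wasted repeat tests.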

Commercial solutions

Leading commercial laboratory automation systems from Abbott, Siemens, and Beckman Coulter offer distinct approaches to managing sample dilution, each aiming to improve pre-analytical efficiency and accuracy (Table 2). For example, Abbott’s Alinity system proactively dilutes samples predicted to exceed the analytical range through its middleware, which helps prevent repeat tests and reduces turnaround times. Siemens’ DataLink V2.0 system takes a different approach by incorporating patient-specific clinical data, such as gestational age, to make context-aware dilution decisions. Notably, this system explicitly claims to include AI-assisted intelligent dilution logic that leverages patient data to automate these decisions. However, the system mainly uses rule-based logic rather than a sophisticated AI system [40]. In contrast, Beckman Coulter’s DxA 5000 system employs a comprehensive pre-analytical strategy, prioritizing volume monitoring and quality checks to ensure a sample is adequate before any dilution is attempted.

Detecting chemical manipulation in urine samples

Research applications

The detection of chemical adulteration in urine samples represents a critical application domain where ML has shown exceptional promise for forensic toxicology and workplace drug testing programs (Table 1). Streun et al. created an ML framework using liquid chromatography coupled to high-resolution mass spectrometry data from 702 authentic human urine samples treated with five different chemical adulterants: pyridinium chlorochromate, potassium nitrite, hydrogen peroxide, iodine, and sodium hypochlorite, with water serving as the control condition [30]. Their artificial neural network achieved accuracy of 90.4 %, sensitivity of 88.9 %, specificity of 92.0 %, positive predictive value of 91.9 %, negative predictive value of 89.2 %, and area under the curve of 0.99. When validated on an independent test set of 202 samples comprising both untreated samples from their study and treated samples from previous research at varying adulterant concentrations, the model achieved 95.4 % accuracy, demonstrating robust performance across different experimental conditions [30].
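
In outline, the classifier-plus-LIME pattern of that study looks like the sketch below; the LC-HRMS feature matrix here is random stand-in data (the real study used 33,448 ion features), and the lime package is assumed to be installed separately.

    # Sketch of an adulteration classifier with LIME explanations,
    # following the pattern of Streun et al. [30]; data are synthetic.
    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from lime.lime_tabular import LimeTabularExplainer

    rng = np.random.default_rng(4)
    n, p = 500, 200
    X = rng.normal(size=(n, p))                # stand-in for aligned ion features
    y = rng.binomial(1, 0.5, size=n)           # 1 = adulterated

    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=4).fit(X, y)

    explainer = LimeTabularExplainer(X,
                                     feature_names=[f"ion_{i}" for i in range(p)],
                                     class_names=["authentic", "adulterated"])
    exp = explainer.explain_instance(X[0], clf.predict_proba, num_features=14)
    print(exp.as_list())                       # 14 candidate marker ions, echoing [30]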

Commercial solutions

Commercial products designed to detect chemical manipulation in urine samples rely on colorimetric detection technology (Table 2). The underlying principle involves using chemical indicator reactions that cause a visible color change when specific adulterants are present in the specimen. Currently, none of these products explicitly employ AI or ML technologies.

Assessing serum quality based on hemolysis, icterus, and lipemia

Research applications

The automated assessment of serum quality through AI represents a fundamental advancement in laboratory quality control, addressing the critical need for rapid and accurate detection of hemolysis, icterus, and lipemia interference that can significantly compromise analytical accuracy.

Deep learning architectures have demonstrated exceptional capability for serum quality assessment through advanced image analysis techniques (Table 1). Yang et al. developed a deep learning system using convolutional neural networks based on the Inception-Resnet-V2 architecture to evaluate serum quality from sample images [31]. Their extensive dataset encompassed 16,427 centrifuged blood images with established serum indices values for hemolytic, icteric, and lipemic indices. The model achieved outstanding performance for subclassification tasks, with area under the curve values of 0.989 for hemolysis detection, 0.996 for icterus identification, and 0.993 for lipemia assessment, establishing new benchmarks for automated serum quality evaluation [31]. Similarly, Man et al. developed a deep learning model using ResNet18 architecture for classification of serum samples to detect hemolysis, icterus, and lipemia (HIL) interference, utilizing over 168,000 serum images [32]. The model achieved high reliability with a false negative rate of 2.56 % and significantly improved efficiency with an 86.48 % pass rate (no need for serum index testing) during four months of prospective real-world testing. This approach provides a practical solution for addressing pre-analytical errors in clinical laboratories by reliably identifying HIL samples while maintaining high throughput and reducing reagent costs [32].
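
A minimal PyTorch sketch of the binary screening setup described by Man et al. – a ResNet18 backbone over 32 × 32 serum images with a two-class head – is given below; the tensors are random placeholders and the training loop is reduced to a single step.

    # Hedged sketch of a binary HIL screen with a ResNet18 backbone [32];
    # synthetic tensors stand in for resized serum sample images.
    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    model = resnet18(weights=None)                   # train from scratch on serum images
    model.fc = nn.Linear(model.fc.in_features, 2)    # two classes: HIL present / normal

    images = torch.randn(8, 3, 32, 32)               # batch of resized serum images
    labels = torch.randint(0, 2, (8,))

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    print("training loss:", float(loss))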

Predictive analytics for specific analytical interferences have shown remarkable success in point-of-care testing environments. Benirschke and Gniadek developed a logistic regression model specifically designed to predict falsely elevated point-of-care whole-blood potassium results caused by hemolysis, using data from 3,489 unique patients [33]. Their model achieved an exceptional area under the curve of 0.995, with optimization strategies employing both rule-in and rule-out cutoffs to maximize clinical utility. The rule-in cutoff yielded a positive predictive value of 90.9 % and negative predictive value of 99.0 %, with overall accuracy of 98.9 %, sensitivity of 58.8 %, and specificity of 99.9 %. Conversely, the rule-out cutoff achieved a negative predictive value of 99.9 % and positive predictive value of 61.5 %, with accuracy of 98.5 %, sensitivity of 94.1 %, and specificity of 98.6 %. The model’s biological explainability represented a significant advantage, with elevated creatinine, low carbon dioxide, and increasing ionized calcium consistently associated with true potassium elevations, patterns consistent with renal disease pathophysiology. External validation using 496 point-of-care basic metabolic panels confirmed robust performance, with rule-in cutoff achieving accuracy of 99.0 %, sensitivity of 82.4 %, and specificity of 99.5 % [33].
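
The dual-cutoff logic of that model translates directly into code: one probability threshold is tuned for rule-in (high positive predictive value) and another for rule-out (high negative predictive value), with an indeterminate zone in between. The sketch below uses synthetic data and hypothetical thresholds.

    # Illustrative rule-in/rule-out triage for suspected pseudohyperkalemia,
    # following the two-threshold pattern of Benirschke and Gniadek [33].
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(5)
    n = 3000
    X = rng.normal(size=(n, 4))                # stand-ins for K, creatinine, CO2, iCa
    y = (X[:, 0] - 0.6 * X[:, 1] + 0.8 * X[:, 3]
         + rng.normal(scale=0.5, size=n) > 1).astype(int)   # 1 = falsely elevated K

    model = LogisticRegression(max_iter=1000).fit(X, y)

    RULE_IN, RULE_OUT = 0.90, 0.10             # hypothetical operating points

    def triage(features):
        prob = model.predict_proba([features])[0, 1]
        if prob >= RULE_IN:
            return "likely false elevation - recollect before acting"
        if prob <= RULE_OUT:
            return "likely true elevation - treat result as actionable"
        return "indeterminate - confirm with a central-laboratory potassium"

    print(triage(X[0]))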

Commercial solutions

In modern clinical laboratories, automated assessment of hemolysis, icterus, and lipemia (HIL) indices is a key tool for ensuring sample quality before analysis (Table 2). These HIL checks provide a rapid and fully automated way to flag compromised samples before they are tested; however, none of these well-known systems utilizes AI.

Improving test utilization, test ordering

Research applications

ML applications in laboratory test utilization and ordering optimization represent a transformative approach to reducing healthcare costs while improving diagnostic efficiency and clinical decision-making (Table 1).

Specialized test utilization optimization has shown remarkable success in reducing unnecessary procedures while maintaining diagnostic sensitivity. Zhang et al. developed ML models to improve peripheral blood flow cytometry (PBFC) test utilization using data from 784 PBFC samples that included patient history of hematological malignancy and complete blood count differential parameters [34]. Their comparative analysis of decision tree and logistic regression models revealed complementary strengths, with the decision tree achieving 98 % sensitivity and 65 % specificity (AUC=0.906), while the logistic regression model demonstrated 100 % sensitivity and 54 % specificity (AUC=0.919). The decision tree approach provided enhanced explainability through interpretable branching logic. Implementation of these models resulted in substantial reductions in unnecessary PBFC utilization, decreasing inappropriate testing by 35–40 %, though limitations included relatively small sample size and absence of external validation [34].

Large-scale automated laboratory test recommendation systems have demonstrated exceptional performance using deep learning architectures trained on comprehensive healthcare databases. Islam et al. developed a deep neural network system using retrospective data from the Taiwanese National Health Insurance Research and Development database, encompassing 1,463,837 prescriptions from 530,050 unique patients [35]. Their three-hidden-layer deep neural network achieved outstanding performance with a micro-average area under the receiver operating characteristic curve (AUROC) of 0.98, while individual laboratory tests demonstrated AUROC values ranging from 0.76 to 1. However, the deep learning approach was characterized by limited interpretability, often described as a “black box” system, with additional limitations including absence of external validation [35]. Another study from Islam et al. focused on cardiology department patients [36]. They developed deep learning models trained on 35 laboratory tests that represented more than 90 % of total tests ordered in this clinical setting. Their deep neural network achieved a micro-average AUROC of 0.87, with 99 % sensitivity at a low cutoff and AUROC values ranging from 0.63 to 0.90 across different laboratory tests. While this specialized approach enabled more targeted clinical application, limitations included restriction to a single clinical department and relatively limited test menu coverage, highlighting the challenges of developing generalizable algorithms across diverse clinical contexts [36].
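
Architecturally, such recommenders are multi-label networks: encoded demographics, diagnoses, and drugs go in, and one sigmoid output per orderable test comes out. The PyTorch sketch below, with synthetic shapes and an assumed low decision cutoff to favor recall, illustrates the pattern of Islam et al. without reproducing their exact model.

    # Hedged sketch of a multi-label test-recommendation network [35, 36];
    # input encoding, layer sizes, and cutoff are illustrative assumptions.
    import torch
    import torch.nn as nn

    n_features, n_tests = 512, 315             # encoded ICD/ATC codes in; 315 tests as in [35]
    model = nn.Sequential(
        nn.Linear(n_features, 256), nn.ReLU(),
        nn.Linear(256, 128), nn.ReLU(),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, n_tests),                # raw logits, one per laboratory test
    )

    x = torch.rand(32, n_features)
    y = torch.randint(0, 2, (32, n_tests)).float()   # multi-hot labels of ordered tests

    loss = nn.BCEWithLogitsLoss()(model(x), y)
    loss.backward()

    recommended = torch.sigmoid(model(x[:1])) > 0.2  # low cutoff favors recall, as in [35]
    print(int(recommended.sum()), "tests recommended")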

Intelligent reflex testing represents an innovative application of ML to automate complex clinical decision-making traditionally requiring expert knowledge. McDermott et al. developed “smart” reflex testing protocols using ferritin testing as a representative example, analyzing 288,427 unique complete blood count events across 137,451 unique patients [37]. Their random forest model achieved an AUC of 0.731. Feature importance analysis revealed clinically relevant predictors for iron deficiency, with previous ferritin testing within two years serving as a highly informative predictor. Model predictions aligned more closely with ideal clinical practice than actual physician ordering patterns, though limitations included subjectivity of clinical validation and single-center study design [37].
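
Operationally, a smart reflex protocol is a scoring gate: the model scores each completed CBC event, and the add-on test is appended only when the predicted probability clears a threshold. The sketch below is a toy version with synthetic data and a hypothetical operating point.

    # Illustrative "smart reflex" gate in the spirit of McDermott et al. [37]:
    # a random forest decides whether to append a ferritin order to a CBC.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(6)
    X = rng.normal(size=(5000, 20))            # CBC values, age, sex, prior results
    y = rng.binomial(1, 0.2, size=5000)        # 1 = ferritin testing indicated

    rf = RandomForestClassifier(n_estimators=200, random_state=6).fit(X, y)

    REFLEX_THRESHOLD = 0.5                     # hypothetical operating point

    def reflex_orders(cbc_event):
        prob = rf.predict_proba([cbc_event])[0, 1]
        return ["ferritin"] if prob >= REFLEX_THRESHOLD else []

    print(reflex_orders(X[0]))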

Commercial solutions

Commercial tools have emerged to help laboratories optimize test utilization and improve ordering practices (Table 2). These solutions leverage data analytics, decision support, and automation to reduce unnecessary tests, ensure appropriate ordering, and enhance cost efficiency. While many of these products use sophisticated algorithms to analyze testing patterns and generate insights, the hc1 platform is the most explicit in its use of AI. It employs ML to proactively flag and address test misutilization.

Tube handling (labeling, etc.)

Research applications

AI applications in automated tube handling and labeling represent sophisticated integrations of computer vision, robotics, and ML technologies designed to enhance preanalytical efficiency while reducing human error rates (Table 1).

Demirci evaluated the operational performance of NESLI, an AI-based tube-labeling robot incorporating two convolutional neural networks [38]. The first network processed overhead camera images to determine tube positions precisely, while the second classified tubes by cap color. Testing used simulator-generated data based on 12 months of real-world operational patterns from hospital blood collection units, comprising 330 synthetic orders with varying combinations of standard-sized and nonstandard-sized tubes. The system achieved a 99.2 % success rate for the labeling parameter concerning visibility of the manufacturer’s maximum blood level mark, and 100 % success across seven additional labeling quality parameters, including consistency between selected tubes and orders, quality of label application, consistency in external label printing, consistency between label and barcode data with orders, readability of printed barcodes, and recovery of barcode data by standard scanners. Other operational parameters, including tube handling, critical stock warnings, and physical tube location and gripping, also achieved 100 % success rates, with a median labeling time of 8.96 s per tube. System limitations included tube storage capacity restricted to 204 tubes and occasional temporary pauses due to hospital network interruptions [38]. Similarly, Şişman et al. evaluated KANKA, an AI-driven robot for blood tube sorting and preanalytical quality control that utilizes three specialized convolutional deep learning models complemented by a barcode scanning model [39]. The first model determined exact tube positioning, the second identified tube type and assessed blood volume adequacy, and the third identified filling levels, while the barcode model handled label reading. In an experimental validation in which 1,000 blood tubes from different patients were loaded five times to generate 5,000 total trials, KANKA achieved an accuracy of 99.98 % in quality control and sorting operations. The system demonstrated a 100 % success rate in detecting all categories of preanalytical errors, including tubes without barcode labels, incorrect barcodes, wrong tube types, empty tubes, tubes intended for other laboratories, tubes without appropriate colored caps, and tubes with inadequate blood volumes. Processing efficiency reached 311 blood tubes per hour, and stored sample images allowed retrospective confirmation of AI results. Limitations included the preliminary experimental nature of the validation, exclusion of hemolysis detection, and absence of pediatric tube testing [39].
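
To illustrate one stage of such a pipeline, the sketch below shows a small convolutional network that labels a cropped tube image by cap color, the kind of classification the second NESLI network performs; the architecture and color classes are assumptions, not the vendors’ models.

    # Hypothetical cap-color classifier for a tube-handling robot [38, 39];
    # a separate localization model would supply the crop in a full pipeline.
    import torch
    import torch.nn as nn

    CAP_COLORS = ["red", "lavender", "light-blue", "green", "gray"]

    cap_net = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 16 * 16, len(CAP_COLORS)),
    )

    crop = torch.rand(1, 3, 64, 64)            # tube crop from the overhead camera
    pred = cap_net(crop).argmax(dim=1).item()
    print("predicted cap color:", CAP_COLORS[pred])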

Commercial solutions

The aforementioned AI-powered tube handling systems automate tasks that laboratory technicians otherwise perform manually. However, several automated pre-analytical systems have been developed to enhance tube handling and labeling, ensuring that samples are correctly collected, labeled, and processed with minimal errors without requiring AI or ML algorithms (Table 2).

Discussion

Across pre-analytical use cases, a persistent translation gap separates research prototypes from market-ready systems. Academic models excel at pattern recognition on curated laboratory or image data and frequently report strong discrimination, yet many depend on single-center, retrospective datasets and lack clear paths to integration or explainability. Commercial offerings, by contrast, prioritize deterministic prevention, traceability, and uptime – using barcodes, sensors, spectrophotometry, pressure/level checks, and middleware rules that fit cleanly into existing analyzers, automation tracks, and LIS workflows. Where AI is present commercially, it is typically narrow and embedded inside rule-based frameworks (e.g., AI-assisted dilution logic; ML-based stewardship), rather than replacing the broader control stack.

This divide reflects a pragmatic difference in problem framing: research tends to infer latent states from multivariate signals (e.g., mix-up risk from CBC patterns or HIL from images), whereas deployed systems focus on preventing or directly measuring the relevant physical/biochemical conditions (e.g., bedside positive patient identification, probe/pressure monitoring, spectrophotometric indices). The former promises greater flexibility but faces hurdles in generalizability, governance, and integration; the latter delivers predictable performance at scale with clear failure modes and maintenance routines.

The regulatory landscape for AI-enabled medical device software in laboratory medicine remains complex (IVDR, ISO 15189:2022, EU AI Act), with a growing subset of higher-risk AI devices and corresponding demands for clear intended use, clinical evidence, bias mitigation, and post-market surveillance [41]. Successful clinical laboratory deployment of AI necessitates addressing significant validation, regulatory, and implementation challenges through structured frameworks [42]. Amid the complex IVDR–ISO 15189:2022–EU AI Act framework, Table 3 presents a hypothetical AI-enabled HIL detection system, illustrating how to navigate the regulatory requirements in practice.

Table 3:

A hypothetical example: AI-driven HIL detection system.

Category: Requirements & details
Intended use & IVDR classification (qualification first): The software qualifies as medical device software (MDSW) when its intended purpose is to provide information derived from in vitro specimens to support acceptance/rejection decisions or sample triage for downstream testing. Under IVDR annex VIII, software that drives or influences a device inherits the device's classification; standalone software is classified based on its own risk profile. Rule 3(g) upgrades classification to class C when outputs are used for disease staging, where an erroneous result could lead to life-threatening management decisions. A HIL triage tool focused on sample integrity (rather than disease staging) will typically be justified as class B; however, the final classification depends on the precise intended use claims (e.g., automatic hold/release of results, auto-cancellation), foreseeable misuse, and risk controls, and must be substantiated in the technical documentation (and assessed by a notified body for class B/C).
Conformity assessment & contents of the technical documentation: For class B/C devices, notified body assessment is mandatory. Performance evaluation and PMS/PMPF are required under IVDR for all classes (with depth proportional to risk); for class B/C, these are assessed by the notified body.

The technical documentation must detail design, manufacturing, and performance (including software verification/validation and the performance evaluation report).

Software verification and validation evidence must be provided per IVDR annex II Section 6.4, scoped to the intended environment (supported configurations) and proportionate to risk level.

Performance evidence expected for an AI HIL detector (implementation guidelines)

Scientific validity: Establish the clinical rationale demonstrating that visual/optical or spectrophotometric features correlate with hemolysis, icterus, and lipemia interference thresholds relevant to affected analytes (e.g., K+, ALT, bilirubin panels).

Summarize supporting literature, analyzer specifications, and any internal bench data that validates this correlation (to be included in the performance evaluation report section of the technical file).
Analytical performance (device/software level): Accuracy against a reference method (e.g., spectrophotometric serum indices) using pre-specified cut-points for H/I/L; report sensitivity/specificity with confidence intervals for each class, and correlation between the tool and the reference method across different matrices (serum/plasma), tube types, and storage conditions.

Robustness/generalizability across instruments, lighting/camera settings (for image-based systems), centrifugation profiles, and pre-analytical workflows; stress testing and failure modes must be characterized.

Repeatability/reproducibility, imprecision assessment, and inter-site agreement studies; predefined acceptance criteria must be established.

Human factors evaluation (when operator interaction exists): Error-proofing of user interface and alert systems to minimize inappropriate overrides/false blocks.

Measurement uncertainty (MU) impact assessment: Where quantitative inputs inform qualitative decision thresholds, estimate how measurement uncertainty in inputs propagates to the final positive/negative decision (per ISO 15189 Section 7.3.4f). Participate in external quality assessment when interpretation is performed (ISO 15189 Section 7.3.7.3a).
Clinical performance (intended purpose level): Study design: Prospective, preferably multicenter studies enrolling routine specimens.

Primary endpoints: Ability to correctly classify compromised samples and approve acceptable ones relative to the laboratory’s established policy; report overall and per-analyte downstream impact.

Clinical utility demonstration: Show measurable reduction in analytical interferences (e.g., decreased hemolysis-related false hyperkalemia flags), reduced recollection rates, and turnaround time effects; prespecify non-inferiority/effectiveness margins.

Risk management & cybersecurity: Conduct hazard analysis for false pass/result release scenarios, LIS/middleware failures, and data integrity issues; specify mitigation strategies (operator review queues, audit trails).

Post-market surveillance: Develop PMS/PMPF plan, including live performance drift monitoring, error trend analysis, and field corrective action criteria.

For software systems: Identify all reference databases/data sources used in the performance evaluation (IVDR annex XIII).

For high-risk AI systems: Include the AI Act post-market monitoring plan in the AI technical documentation (article 72(3)) and, per AI Act annex IV, technical documentation of the training/validation/testing datasets (provenance, curation, representativeness, and data splits).
EU AI Act & IVDR alignment: High-risk status determination: If the AI-enabled HIL tool is an IVDR device subject to third-party conformity assessment (e.g., class B/C/D), it qualifies as a high-risk AI system under AI Act article 6(1).

AI Act registration (provider obligations): Register the high-risk AI system in the EU database for high-risk AI systems (article 49).

IVDR registrations (manufacturer obligations): Complete EUDAMED registration for devices/UDI (article 26); use the electronic system for registration of economic operators (article 27); obtain the single registration number (SRN) via article 28 registration of the manufacturer/authorized representative/importer.

Instructions for users: Provide instructions for use that support correct use and interpretation, including device characteristics, capabilities, limitations, expected accuracy, and conditions affecting accuracy/robustness/cybersecurity (AI Act article 13).
ISO 15189:2022 obligations for the deploying laboratory: Before implementation, the laboratory must: (i) verify/validate the LIS or middleware integration and any software that impacts examination results; (ii) maintain procedures for nonconforming work and corrective action; (iii) participate in external quality assessment (if available) when software contributes to interpretation; and (iv) assess measurement uncertainty of inputs and its effect on qualitative thresholds used by the software. These represent laboratory-side responsibilities and complement the manufacturer's CE-marking obligations.
Practical acceptance criteria (example): Note: The thresholds below are examples only. Actual acceptance criteria must be predefined in the performance evaluation plan and justified by risk assessment and intended use. Testing must employ predefined metrics and probabilistic thresholds appropriate to the intended purpose (AI Act article 9(8)); a minimal computational sketch follows this table.

Primary performance criteria: H/I/L sensitivity ≥95 % for samples exceeding laboratory-defined serum index cut-offs; specificity ≥97 % for acceptable samples, both demonstrated across sites.

Safety criteria: ≤0.5 % false pass rate for severe hemolysis class; automatic hold plus human review for borderline classes.

Clinical utility criteria: ≥20 % reduction in downstream interference flags; no clinically meaningful turnaround time penalty.
AI, artificial intelligence; AI Act, European Union Artificial Intelligence Act; ALT, alanine aminotransferase; CE, Conformité Européenne marking; EUDAMED, European Database on Medical Devices; HIL, hemolysis, icterus, and lipemia; IVDR, In Vitro Diagnostic Medical Devices Regulation (EU) 2017/746; ISO, International Organization for Standardization; K+, potassium ion; LIS, Laboratory Information System; MDSW, medical device software; MU, measurement uncertainty; PMS, post-market surveillance; PMPF, post-market performance follow-up; SRN, Single Registration Number; UDI, Unique Device Identification.
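As a minimal computational sketch of the example acceptance criteria above, the following code computes sensitivity and specificity with 95 % Wilson confidence intervals from a confusion matrix and applies a conservative check against the lower confidence bound. The counts and the pass/fail logic are illustrative assumptions; an actual performance evaluation plan would predefine the estimator, the interval method, and the decision rule.

```python
# Sketch: sensitivity/specificity with 95 % Wilson CIs against example criteria.
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95 % Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)

# Hypothetical counts for one HIL class (e.g., hemolysis), fabricated for illustration.
tp, fn, tn, fp = 492, 8, 990, 10

sens, sens_ci = tp / (tp + fn), wilson_ci(tp, tp + fn)
spec, spec_ci = tn / (tn + fp), wilson_ci(tn, tn + fp)

# Conservative pass/fail: require the lower CI bound to clear the example threshold.
print(f"Sensitivity {sens:.3f} (95 % CI {sens_ci[0]:.3f}-{sens_ci[1]:.3f}); "
      f"meets >=0.95: {sens_ci[0] >= 0.95}")
print(f"Specificity {spec:.3f} (95 % CI {spec_ci[0]:.3f}-{spec_ci[1]:.3f}); "
      f"meets >=0.97: {spec_ci[0] >= 0.97}")
```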

From an informatics standpoint, current LIS/middleware often lack the structures to store and exchange the rich context AI systems require, underscoring the need for upgrades that embrace Findable, Accessible, Interoperable, and Reusable (FAIR) data principles and robust connectivity [43]. Cost-effectiveness is achievable, but it is contingent on careful technology selection and alignment with local infrastructure and staffing [44, 45].
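To illustrate what FAIR-aligned peridata might look like in practice, the sketch below shows a hypothetical structured record that a LIS or middleware could store and exchange alongside a result. The field names and schema are invented for illustration and do not correspond to any published standard.

```python
# Hypothetical sample peridata record; field names are assumptions, not a standard.
import json

sample_record = {
    "sample_id": "S-2025-001234",                   # findable: persistent identifier
    "tube_type": "serum_gel",
    "collection_time": "2025-09-17T08:42:00+03:00",
    "centrifugation": {"rcf_g": 2000, "minutes": 10},
    "transport": {"max_temp_c": 22.5, "duration_min": 35},
    "serum_indices": {"H": 18, "I": 2, "L": 40},    # contextual inputs an AI model may need
    "instrument_id": "ANALYZER-07",
}
print(json.dumps(sample_record, indent=2))          # interoperable: plain JSON exchange
```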

Bridging the gap will require four critical components: (i) multicenter, prospective evaluations that demonstrate robustness across populations, instruments, and workflows; (ii) human-factors and change-management plans that minimize alert fatigue and rework; (iii) tight LIS/middleware interoperability and data pipelines capable of handling the metadata/peridata AI needs; and (iv) evaluation frameworks that move beyond conventional performance metrics toward operational and clinical outcomes (turnaround time, recollection rates, cost-per-reportable result, and patient impact). Complementing these system-level requirements, the recent EFLM checklist provides a rigorous, standardized, and transparent framework for developing AI/ML models in clinical laboratories [46].
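To make the outcome-focused metrics in item (iv) concrete, the brief sketch below computes two of the named operational measures, recollection rate and cost-per-reportable result, before and after a hypothetical AI deployment. All figures are fabricated placeholders for illustration only.

```python
# Sketch of outcome-focused evaluation metrics; all numbers are fabricated examples.
baseline = {"reportables": 120_000, "recollections": 1_440, "total_cost": 96_000.0}
with_ai = {"reportables": 121_500, "recollections": 980, "total_cost": 99_200.0}

def metrics(d: dict) -> tuple[float, float]:
    """Return (recollection rate in %, cost per reportable result)."""
    return 100 * d["recollections"] / d["reportables"], d["total_cost"] / d["reportables"]

for label, d in (("baseline", baseline), ("with AI", with_ai)):
    rate, cpr = metrics(d)
    print(f"{label}: recollection rate {rate:.2f} %, cost per reportable {cpr:.3f}")
```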

Looking ahead, Total Laboratory Automation (TLA) is seen as a transformative solution, integrating AI, ML, robotics, and Internet of Things technologies to enable predictive analytics and automated data management across all phases of laboratory operations [16].

Prior papers have charted where AI and robotics might assist pre-analytics, emphasizing opportunity mapping and technology trends [17, 47]. Building on that landscape, the present review contributes (i) quantitative, study-level performance summaries for representative use cases, (ii) an explicit research-versus-commercial juxtaposition that distinguishes deterministic controls from AI claims in the market, and (iii) a translation-gap framework that identifies requirements for deployment – external validation, human-factors and change management, LIS/middleware interoperability, and outcomes-focused evaluation.

Conclusions

Overall, the translation gap persists alongside the limitations noted in the body of research: single-center or simulated designs, retrospective validation, and variable explainability and integration pathways. Commercial tools prioritize deterministic controls and end-to-end traceability, which provide reliable, scalable performance today. Notably, use cases that complement deterministic controls – rather than compete with them – are most likely to see near-term adoption (e.g., within the next 5 years), such as AI that schedules or scopes dilutions before analysis, or that flags ordering misutilization upstream. As standards, empirical evidence, and infrastructure mature, AI is expected to be adopted strategically for pre-analytical tasks that are difficult to encode using deterministic rules and that offer measurable improvements in safety, efficiency, and cost.


Corresponding author: Hikmet Can Çubukçu, MD, MSc, PhD, EuSpLM, Rare Diseases Department, General Directorate of Health Services, Turkish Ministry of Health, Bilkent Yerleskesi, 6001, Ankara, Türkiye; Department of Medical Biochemistry, Sincan Training and Research Hospital, Ankara, Türkiye; and Cadde, Universiteler Mahallesi, 06800, Çankaya, Ankara, Türkiye, E-mail:

  1. Research ethics: Not applicable.

  2. Informed consent: Not applicable.

  3. Author contributions: The author has accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Use of Large Language Models, AI and Machine Learning Tools: None declared.

  5. Conflict of interest: The author states no conflict of interest.

  6. Research funding: None declared.

  7. Data availability: None declared.

References

1. Lima-Oliveira, G, Volanski, W, Lippi, G, Picheth, G, Guidi, GC. Pre-analytical phase management: a review of the procedures from patient preparation to laboratory analysis. Scand J Clin Lab Invest 2017;77:153–63. https://doi.org/10.1080/00365513.2017.1295317.

2. Plebani, M. Exploring the iceberg of errors in laboratory medicine. Clin Chim Acta 2009;404:16–23. https://doi.org/10.1016/j.cca.2009.03.022.

3. Nordin, N, Ab Rahim, SN, Wan Omar, WFA, Zulkarnain, S, Sinha, S, Kumar, S, et al. Preanalytical errors in clinical laboratory testing at a glance: source and control measures. Cureus 2024;16:e57243. https://doi.org/10.7759/cureus.57243.

4. Lippi, G, Mattiuzzi, C, Bovo, C, Favaloro, EJ. Managing the patient identification crisis in healthcare and laboratory medicine. Clin Biochem 2017;50:562–7. https://doi.org/10.1016/j.clinbiochem.2017.02.004.

5. Giavarina, D, Lippi, G. Blood venous sample collection: recommendations overview and a checklist to improve quality. Clin Biochem 2017;50:568–73. https://doi.org/10.1016/j.clinbiochem.2017.02.021.

6. Cornes, M, van Dongen-Lases, E, Grankvist, K, Ibarz, M, Kristensen, G, Lippi, G, et al. Order of blood draw: opinion paper by the European federation for clinical chemistry and laboratory medicine (EFLM) working group for the preanalytical phase (WG-PRE). Clin Chem Lab Med 2017;55:27–31. https://doi.org/10.1515/cclm-2016-0426.

7. Ziobrowska-Bech, A, Hansen, AB, Christensen, PA. Analyte stability in whole blood using experimental and datamining approaches. Scand J Clin Lab Invest 2022;82:115–22. https://doi.org/10.1080/00365513.2022.2031280.

8. Dromigny, JA, Robert, E. Stability of blood potassium: effects of duration, temperature and transport during 10 hours storage of human whole blood in serum and plasma. Ann Biol Clin 2017;75:369–74. https://doi.org/10.1684/abc.2017.1255.

9. Simundic, AM, Nikolac, N, Ivankovic, V, Ferenec-Ruzic, D, Magdic, B, Kvaternik, M, et al. Comparison of visual vs. automated detection of lipemic, icteric and hemolyzed specimens: can we rely on a human eye? Clin Chem Lab Med 2009;47:1361–5. https://doi.org/10.1515/cclm.2009.306.

10. Bruneel, A, Dehoux, M, Barnier, A, Boutten, A. External evaluation of the dimension Vista 1500® intelligent lab system. J Clin Lab Anal 2012;26:384–97. https://doi.org/10.1002/jcla.21539.

11. Simundic, AM, Lippi, G. Preanalytical phase – a continuous challenge for laboratory professionals. Biochem Med (Zagreb) 2012;22:145–9. https://doi.org/10.11613/bm.2012.017.

12. Plebani, M, Carraro, P. Mistakes in a stat laboratory: types and frequency. Clin Chem 1997;43:1348–51. https://doi.org/10.1093/clinchem/43.8.1348.

13. Lin, Y, Spies, NC, Zohner, K, McCoy, D, Zaydman, MA, Farnsworth, CW. Pre-analytical phase errors constitute the vast majority of errors in clinical laboratory testing. Clin Chem Lab Med 2025;63:1709–15. https://doi.org/10.1515/cclm-2025-0190.

14. Bellini, C, Guerranti, R, Cinci, F, Milletti, E, Scapellato, C. Defining and managing the preanalytical phase with FMECA: automation and/or “human” control. Hum Factors 2020;62:20–36. https://doi.org/10.1177/0018720819874906.

15. Lippi, G, Blanckaert, N, Bonini, P, Green, S, Kitchen, S, Palicka, V, et al. Causes, consequences, detection, and prevention of identification errors in laboratory diagnostics. Clin Chem Lab Med 2009;47:143–53. https://doi.org/10.1515/cclm.2009.045.

16. Nam, Y, Park, HD. Revolutionizing laboratory practices: pioneering trends in total laboratory automation. Ann Lab Med 2025;45:472–83. https://doi.org/10.3343/alm.2024.0581.

17. Lippi, G, Mattiuzzi, C, Favaloro, EJ. Artificial intelligence in the pre-analytical phase: state-of-the art and future perspectives. J Med Biochem 2024;43:1–10. https://doi.org/10.5937/jomb0-45936.

18. Çubukçu, HC, Topcu, D, Yenice, S. Machine learning-based clinical decision support using laboratory data. Clin Chem Lab Med 2024;62:793–823. https://doi.org/10.1515/cclm-2023-1037.

19. Fang, K, Dong, Z, Chen, X, Zhu, J, Zhang, B, You, J, et al. Using machine learning to identify clotted specimens in coagulation testing. Clin Chem Lab Med 2021;59:1289–97. https://doi.org/10.1515/cclm-2021-0081.

20. Hou, J, Ren, W, Zhao, W, Li, H, Liu, M, Wang, H, et al. Blood clot and fibrin recognition method for serum images based on deep learning. Clin Chim Acta 2024;553:117732. https://doi.org/10.1016/j.cca.2023.117732.

21. Farrell, CJ, Makuni, C, Keenan, A, Maeder, E, Davies, G, Giannoutsos, J. A machine learning model for the routine detection of “wrong blood in complete blood count tube” errors. Clin Chem 2023;69:1031–7. https://doi.org/10.1093/clinchem/hvad100.

22. Graham, BV, Master, SR, Obstfeld, AE, Wilson, RB. A multianalyte machine learning model to detect wrong blood in complete blood count tube errors in a pediatric setting. Clin Chem 2025;71:418–27. https://doi.org/10.1093/clinchem/hvae210.

23. Farrell, CL, Giannoutsos, J. Machine learning models outperform manual result review for the identification of wrong blood in tube errors in complete blood count results. Int J Lab Hematol 2022;44:497–503. https://doi.org/10.1111/ijlh.13820.

24. Farrell, CJ. Identifying mislabelled samples: machine learning models exceed human performance. Ann Clin Biochem 2021;58:650–2. https://doi.org/10.1177/00045632211032991.

25. Farrell, CL. Decision support or autonomous artificial intelligence? The case of wrong blood in tube errors. Clin Chem Lab Med 2022;60:1993–7. https://doi.org/10.1515/cclm-2021-0873.

26. Mitani, T, Doi, S, Yokota, S, Imai, T, Ohe, K. Highly accurate and explainable detection of specimen mix-up using a machine learning model. Clin Chem Lab Med 2020;58:375–83. https://doi.org/10.1515/cclm-2019-0534.

27. Rosenbaum, MW, Baron, JM. Using machine learning-based multianalyte delta checks to detect wrong blood in tube errors. Am J Clin Pathol 2018;150:555–66. https://doi.org/10.1093/ajcp/aqy085.

28. Zhou, R, Liang, YF, Cheng, HL, Wang, W, Huang, DW, Wang, Z, et al. A highly accurate Delta check method using deep learning for detection of sample mix-up in the clinical laboratory. Clin Chem Lab Med 2021;60:1984–92. https://doi.org/10.1515/cclm-2021-1171.

29. Ialongo, C, Pieri, M, Bernardini, S. Smart management of sample dilution using an artificial neural network to achieve streamlined processes and saving resources: the automated nephelometric testing of serum free light chain as case study. Clin Chem Lab Med 2017;55:231–6. https://doi.org/10.1515/cclm-2016-0263.

30. Streun, GL, Steuer, AE, Ebert, LC, Dobay, A, Kraemer, T. Interpretable machine learning model to detect chemically adulterated urine samples analyzed by high resolution mass spectrometry. Clin Chem Lab Med 2021;59:1392–9. https://doi.org/10.1515/cclm-2021-0010.

31. Yang, C, Li, D, Sun, D, Zhang, S, Zhang, P, Xiong, Y, et al. A deep learning-based system for assessment of serum quality using sample images. Clin Chim Acta 2022;531:254–60. https://doi.org/10.1016/j.cca.2022.04.010.

32. Man, D, Yang, X, Du, W, Ye, H, Shi, Y, Guan, Y, et al. Research on the development of image-based deep learning (DL) model for serum quality recognition. Clin Chem Lab Med 2025;63:e179–83. https://doi.org/10.1515/cclm-2024-1219.

33. Benirschke, RC, Gniadek, TJ. Detection of falsely elevated point-of-care potassium results due to hemolysis using predictive analytics. Am J Clin Pathol 2020;154:242–7. https://doi.org/10.1093/ajcp/aqaa039.

34. Zhang, ML, Guo, AX, Kadauke, S, Dighe, AS, Baron, JM, Sohani, AR. Machine learning models improve the diagnostic yield of peripheral blood flow cytometry. Am J Clin Pathol 2020;153:235–42. https://doi.org/10.1093/ajcp/aqz150.

35. Islam, MM, Poly, TN, Yang, HC, Li, YJ. Deep into laboratory: an artificial intelligence approach to recommend laboratory tests. Diagnostics 2021;11:990. https://doi.org/10.3390/diagnostics11060990.

36. Islam, MM, Yang, HC, Poly, TN, Li, YJ. Development of an artificial intelligence-based automated recommendation system for clinical laboratory tests: retrospective analysis of the national health insurance database. JMIR Med Inform 2020;8:e24163. https://doi.org/10.2196/24163.

37. McDermott, M, Dighe, A, Szolovits, P, Luo, Y, Baron, J. Using machine learning to develop smart reflex testing protocols. J Am Med Inf Assoc 2024;31:416–25. https://doi.org/10.1093/jamia/ocad187.

38. Demirci, F. Measuring the operational performance of an artificial intelligence-based blood tube-labeling robot, NESLI. Am J Clin Pathol 2025;163:178–86. https://doi.org/10.1093/ajcp/aqae108.

39. Şişman, AR, Başok, B, Karakoyun, İ, Çolak, A, Bilge, U, Demirci, F, et al. Measuring the performance of an artificial intelligence-based robot that classifies blood tubes and performs quality control in terms of preanalytical errors: a preliminary study. Am J Clin Pathol 2024;161:553–60. https://doi.org/10.1093/ajcp/aqad179.

40. Chen, S, Li, Q, Liao, H, Zhao, M, Chen, J, Chen, H, et al. Establishment and application of a preset dilution factor strategy for human chorionic gonadotropin testing in clinical laboratory. Front Mol Biosci 2025;12:1648421. https://doi.org/10.3389/fmolb.2025.1648421.

41. Çubukçu, HC, Boursier, G, Linko, S, Bernabeu-Andreu, FA, Meško Brguljan, P, Tosheska-Trajkovska, K, et al. Regulating the future of laboratory medicine: European regulatory landscape of AI-driven medical device software in laboratory medicine. Clin Chem Lab Med 2025;63:1891–914. https://doi.org/10.1515/cclm-2025-0482.

42. Garcia, CA, Reed, KA, Lantz, E, Day, P, Zarella, MD, Hart, SN, et al. Establishing a comprehensive artificial intelligence lifecycle framework for laboratory medicine and pathology: a series introduction. Am J Clin Pathol 2025;164:424–37. https://doi.org/10.1093/ajcp/aqaf069.

43. Padoan, A, Cadamuro, J, Frans, G, Cabitza, F, Tolios, A, De Bruyne, S, et al. Data flow in clinical laboratories: could metadata and peridata bridge the gap to new AI-based applications? Clin Chem Lab Med 2025;63:684–91. https://doi.org/10.1515/cclm-2024-0971.

44. Hanna, MG, Reuter, VE, Samboy, J, England, C, Corsale, L, Fine, SW, et al. Implementation of digital pathology offers clinical and operational increase in efficiency and cost savings. Arch Pathol Lab Med 2019;143:1545–55. https://doi.org/10.5858/arpa.2018-0514-oa.

45. Naugler, C, Church, DL. Automation and artificial intelligence in the clinical laboratory. Crit Rev Clin Lab Sci 2019;56:98–110. https://doi.org/10.1080/10408363.2018.1561640.

46. Carobene, A, Cadamuro, J, Frans, G, Goldshmidt, H, Debeljak, Z, De Bruyne, S, et al. EFLM checklist for the assessment of AI/ML studies in laboratory medicine: enhancing general medical AI frameworks for laboratory-specific applications. Clin Chem Lab Med 2025. https://doi.org/10.1515/cclm-2025-0841 [Epub ahead of print].

47. Plebani, M, Scott, S, Simundic, AM, Cornes, M, Padoan, A, Cadamuro, J, et al. New insights in preanalytical quality. Clin Chem Lab Med 2025;63:1682–92. https://doi.org/10.1515/cclm-2025-0478.

Received: 2025-09-17
Accepted: 2025-10-04
Published Online: 2025-10-16

© 2025 Walter de Gruyter GmbH, Berlin/Boston
