Expert-level detection of M-proteins in serum protein electrophoresis using machine learning
Eike Elfert, Wolfgang E. Kaminski, Christian Matek, Gregor Hoermann, Eyvind W. Axelsen, Carsten Marr and Armin P. Piehler
Abstract
Objectives
Serum protein electrophoresis (SPE) in combination with immunotyping (IMT) is the diagnostic standard for detecting monoclonal proteins (M-proteins). However, interpretation of SPE and IMT is poorly standardized, time-consuming and investigator-dependent. Here, we present five machine learning (ML) approaches for automated detection of M-proteins on SPE, trained on an unprecedentedly large and well-curated data set, and compare their performance with that of laboratory experts.
Methods
SPE and IMT were performed on serum samples from 69,722 individuals from Norway. IMT results were used to label the samples as M-protein present (positive, n=4,273) or absent (negative, n=65,449). Four feature-based ML algorithms and one convolutional neural network (CNN) were trained on 68,722 randomly selected SPE patterns to detect M-proteins. Algorithm performance was compared with that of an expert group of clinical pathologists and laboratory technicians (n=10) on a test set of 1,000 samples.
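The split described above can be illustrated with a minimal sketch. It assumes that the 1,000 test samples are simply the samples left over after 68,722 are drawn at random, which the sample counts suggest but the abstract does not state explicitly. The labels are placeholders built from the reported class sizes, and the names `split_train_test`, `N_TOTAL`, `N_POSITIVE` and `N_TEST` are hypothetical, not the authors' code.

```python
import random

# Class sizes reported in the abstract.
N_TOTAL = 69_722
N_POSITIVE = 4_273
N_TEST = 1_000


def split_train_test(n_samples, n_test, seed=0):
    """Randomly hold out n_test sample indices; the rest form the training set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[n_test:], indices[:n_test]


# Placeholder labels: 1 = M-protein present (IMT positive), 0 = absent.
labels = [1] * N_POSITIVE + [0] * (N_TOTAL - N_POSITIVE)

train_idx, test_idx = split_train_test(len(labels), N_TEST)

prevalence = N_POSITIVE / N_TOTAL
test_positives = sum(labels[i] for i in test_idx)

print(f"training samples: {len(train_idx)}")
print(f"test samples:     {len(test_idx)}")
print(f"overall prevalence of M-proteins: {prevalence * 100:.1f} %")
print(f"positives in this illustrative test split: {test_positives}")
```

With a prevalence of roughly 6 %, a random test set of 1,000 samples contains only a few dozen positives, so a classifier (for example a random forest) would be fitted on the training indices only and evaluated once on the held-out indices.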
Results
The random forest classifier (RFC) showed the best performance (F1-Score 93.2 %, accuracy 99.1 %, sensitivity 89.9 %, specificity 99.8 %, positive predictive value 96.9 %, negative predictive value 99.3 %) and outperformed the experts (F1-Score 61.2 ± 16.0 %, accuracy 89.2 ± 10.2 %, sensitivity 94.3 ± 2.8 %, specificity 88.9 ± 10.9 %, positive predictive value 47.3 ± 16.2 %, negative predictive value 99.5 ± 0.2 %) on the test set. Interestingly, while the performance of the RFC saturated, that of the CNN increased steadily with the number of training samples within our training set (n=68,722).
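For readers who want to relate these figures to each other, the sketch below computes all six metrics from a 2×2 confusion matrix. The counts used (62 true positives, 2 false positives, 929 true negatives, 7 false negatives) are an assumption: the abstract does not report the confusion matrix, and these values were chosen only because they add up to 1,000 samples and are consistent with the percentages reported for the random forest classifier. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics (as fractions) from counts."""
    sensitivity = tp / (tp + fn)  # recall: detected fraction of true positives
    specificity = tn / (tn + fp)  # correctly rejected fraction of true negatives
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of PPV and sensitivity.
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "f1": f1,
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
    }


# ASSUMED counts for illustration only (not taken from the article).
metrics = binary_metrics(tp=62, fp=2, tn=929, fn=7)

for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f} %")
```

Because M-proteins are rare in the data set, accuracy and negative predictive value are high for any reasonably specific classifier, which is why the F1-Score and positive predictive value separate the classifier from the experts more clearly than accuracy does.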
Conclusions
Feature-based ML systems are capable of automated detection of M-proteins on SPE beyond expert level and show potential for use in the clinical laboratory.
Funding source: H2020 European Research Council
Award Identifier / Grant number: 866411
Award Identifier / Grant number: 101113551
Acknowledgments
We would like to thank the laboratory technicians at Fürst Medical Laboratory for participating in the study and interpreting protein electrophoresis curves, which allowed us to compare the performance of the machine learning models with that of human experts.
Research ethics: The study was approved by the Ethics Committee of the University of Heidelberg (2018-548N-MA), Germany, and the Regional Ethics Committee REC South-East, Norway (231395).
Informed consent: The data used was completely anonymized. Tracing back to single individuals was not possible.
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission. Conceptualization: Wolfgang E. Kaminski (lead), Armin P. Piehler, Carsten Marr, Christian Matek and Eike Elfert. Data curation: Eyvind W. Axelsen, Eike Elfert, Wolfgang E. Kaminski. Formal Analysis: Eike Elfert (lead), Armin P. Piehler, Carsten Marr, Christian Matek. Funding acquisition: Carsten Marr. Investigation: Eike Elfert (lead), Armin P. Piehler, Carsten Marr, Christian Matek. Methodology: Armin P. Piehler, Carsten Marr, Christian Matek, Wolfgang E. Kaminski, Eike Elfert. Project administration: Armin P. Piehler, Carsten Marr, Christian Matek, Wolfgang E. Kaminski, Eike Elfert. Resources: Eyvind W. Axelsen. Software: Eike Elfert, Eyvind W. Axelsen. Supervision: Wolfgang E. Kaminski, Armin P. Piehler, Carsten Marr, Christian Matek. Validation: Eike Elfert, Armin P. Piehler, Carsten Marr, Christian Matek. Visualization: Eike Elfert. Writing – original draft: Eike Elfert, Wolfgang E. Kaminski, Christian Matek, Gregor Hoermann, Eyvind W. Axelsen, Carsten Marr, Armin P. Piehler. Writing – review & editing: Eike Elfert, Christian Matek, Carsten Marr, Armin P. Piehler.
Competing interests: The authors state no conflict of interest.
Research funding: This work was funded by the European Research Council (ERC), grant agreements no. 866411 and 101113551.
Data availability: The raw data can be obtained on request from the corresponding author.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/cclm-2024-0222).
© 2024 Walter de Gruyter GmbH, Berlin/Boston