Abstract
Objectives
Clinicians can rapidly and accurately diagnose disease, learn from experience, and explain their reasoning. Computational Bayesian medical decision-making might replicate this expertise. This paper assesses a computer system for diagnosing cardiac chest pain in the emergency department (ED) that decides whether to admit or discharge a patient.
Methods
The system can learn likelihood functions by counting data frequency. The computer compares patient and disease data profiles using likelihood. It calculates a Bayesian probabilistic diagnosis and explains its reasoning. A utility function applies the probabilistic diagnosis to produce a numerical BAYES score for making a medical decision.
Results
We conducted a pilot study to assess BAYES efficacy in ED chest pain patient disposition. Binary BAYES decisions eliminated patient observation. We compared BAYES to the HEART score. On 100 patients, BAYES reduced HEART’s false positive rate 18-fold from 58.7 to 3.3 %, and improved ROC AUC accuracy from 0.928 to 1.0.
Conclusions
The pilot study results were encouraging. The data-driven BAYES score approach could learn from frequency counting, make fast and accurate decisions, and explain its reasoning. The computer replicated these aspects of diagnostic expertise. More research is needed to reproduce and extend these findings to larger, more diverse patient populations.
The ideal physician should be able to rapidly and accurately diagnose disease [1], learn from clinical experience [2], and explain their diagnostic reasoning [3]. We have developed a computational Bayesian medical decision-making approach with these features [4].
Our architecture can automatically learn from electronic health record (EHR) databases. For a given diagnosis, one can count how often a clinical data feature occurs. These frequency counts develop the likelihood functions needed for Bayesian inference [5]. Next, having learned diagnostic likelihood, the computer can then compare a patient data profile with the profiles of all relevant diseases to instantly calculate a probabilistic diagnosis that is easily explained [6]. Finally, a utility function uses the probabilistic diagnosis to weight possible actions, producing a numerical score for making a medical decision [7].
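As an illustrative sketch of the first step, the fragment below estimates a per-variable likelihood function by frequency counting. The record format, variable names, and values are hypothetical, invented for this example rather than drawn from any EHR database.

```python
from collections import Counter

# Hypothetical EHR extract: (diagnosis, variable, value) triples.
records = [
    ("NACS", "pain_type", "sharp"), ("NACS", "pain_type", "sharp"),
    ("NACS", "pain_type", "pressure"),
    ("CAMI", "pain_type", "pressure"), ("CAMI", "pain_type", "pressure"),
    ("CAMI", "pain_type", "sharp"),
]

def learn_likelihood(records, diagnosis, variable):
    """Estimate the likelihood P(value | diagnosis) for one clinical variable
    by counting how often each value occurs under that diagnosis."""
    counts = Counter(v for d, f, v in records if d == diagnosis and f == variable)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

lik_nacs = learn_likelihood(records, "NACS", "pain_type")  # {'sharp': 2/3, 'pressure': 1/3}
```

In practice each diagnosis-variable pair would get such a table, giving the full set of likelihood functions needed for Bayesian inference.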
We conducted a pilot study to assess the efficacy of this Bayesian architecture [8]. The test domain was cardiac chest pain in the emergency department (ED). The medical decision was whether to admit a patient to the hospital or send them home [9]. Weighing the dire consequences of not diagnosing an acute myocardial infarction (AMI) against the very high false positive rate (over 50 %) of unnecessary chest pain observation [10], this is an important real-world problem [11].
We identified a progression of six cardiac diagnoses, ranging from no disease to heart attack (Figure 1, columns). The diagnoses are not-acute cardiac syndrome (NACS), stable angina (SA), unstable angina (UA), non-ST elevation myocardial infarction (NSTEMI), developing acute myocardial infarction (DAMI), and classic acute myocardial infarction (CAMI).

Figure 1: Visually explaining differential diagnosis likelihood support with a heatmap for a patient who does not have AMI. (A) The heatmap visual explanation interface uses color to show an evidence array of likelihood support for patient data variables (25 rows) under heart-related diagnosis hypotheses (6 columns), with (B) optional numeric values. The logarithmic color values show absence (blue) or presence (red) of data support for a hypothesis. The figures were generated by the free Vaizian™ ED chest pain browser-based webapp (https://www.vaizian.com). The 25 patient variables addressed demography, past medical history, history of the present illness, physical examination, electrocardiogram, and laboratory tests (Supplemental Table 1). The low BAYES score of 1.119 would discharge this non-cardiac patient, while the medium HEART score of 5 would unnecessarily observe them.
Based on clinical experience and retrospective studies, we estimated diagnosis prevalence for NACS at 52.6 %, SA and UA each at 10.5 %, NSTEMI at 15.8 %, and DAMI and CAMI each at 5.3 % [12]. Prevalence can be adjusted to a patient population.
We developed twenty-five clinical variables that can help diagnose cardiac chest pain (Figure 1, rows; Supplemental Table 1). The measures are largely based on patient history, but also included electrocardiogram observations and high-sensitivity troponin levels [13].
While our medical expert (DA) was Chief Resident in an urban Chicago, Illinois teaching ED, he determined the variables’ discrete values and ranges (Supplemental Table 2). To simulate EHR database counts, we elicited from him data feature frequencies as the relative occurrences of each variable’s clinical values, conditioned on diagnosis. We used these numbers to construct the requisite 150 likelihood functions, one for each diagnosis-variable pair.
A patient’s 25 clinical data features can be assessed for the 6 cardiac diagnosis hypotheses, producing likelihood ratios (LR). A heatmap presentation of these 150 log(LR) values helps explain the computer’s reasoning (Figure 1A). Viewing the map vertically down the rows shows how a diagnosis explains many clinical data points. Horizontal viewing across columns compares multiple diagnoses at one variable. A red color indicates support for a diagnosis, while blue does not. Numerical values can be superimposed (Figure 1B). The heatmap shown is for a non-AMI patient, expressing greater likelihood for benign conditions.
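One plausible construction of these evidence values, sketched below, compares each diagnosis's likelihood against the prevalence-weighted mixture of the alternative diagnoses. The per-variable likelihood numbers are invented for illustration, and the base-10 logarithm and mixture denominator are assumptions, since the text does not specify the LR's exact form.

```python
import math

# Illustrative likelihoods P(observed value | diagnosis) for a single patient
# variable across the six diagnoses (values invented for this sketch).
likelihood = {"NACS": 0.60, "SA": 0.40, "UA": 0.20,
              "NSTEMI": 0.10, "DAMI": 0.05, "CAMI": 0.05}
prevalence = {"NACS": 0.526, "SA": 0.105, "UA": 0.105,
              "NSTEMI": 0.158, "DAMI": 0.053, "CAMI": 0.053}

def log_lr(diagnosis):
    """log10 likelihood ratio of one diagnosis against the prevalence-weighted
    mixture of the alternatives: positive (red) supports, negative (blue) opposes."""
    alt = {d: p for d, p in prevalence.items() if d != diagnosis}
    z = sum(alt.values())
    p_alt = sum(likelihood[d] * p / z for d, p in alt.items())
    return math.log10(likelihood[diagnosis] / p_alt)
```

With these illustrative numbers, log_lr("NACS") is positive (red cell) and log_lr("CAMI") is negative (blue cell), matching the heatmap's color convention.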
Bayes theorem [14] combines these data-centered likelihood values with the prevalence of each diagnosis to yield the probability of every diagnosis, forming a “probabilistic diagnosis”. Clinicians can use this probabilistic diagnosis to assess their patient along the spectrum of cardiac disease severity. A utility function weights patient disposition by diagnosis probability, summing across diagnoses to form a numerical score. This Bayesian Assessment of Your Emergency Symptoms (BAYES) score can help make a patient disposition decision.
The Discharge disposition decision sends a patient home, Observe keeps them for under two nights, and Admit is for longer hospitalization. We assigned Discharge a utility value of 1, Observe a utility of 2, and Admit utility 3. An ED physician would Discharge a patient with an NACS or SA diagnosis, Observe them for UA or NSTEMI, or Admit them for DAMI or CAMI.
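The scoring step can be sketched as follows. The prevalences and disposition utilities come from the text; the joint likelihood values P(patient data | diagnosis) are hypothetical stand-ins for the product over a patient's 25 variables.

```python
# Prevalence and disposition utilities are taken from the text; the joint
# likelihood values P(patient data | diagnosis) are hypothetical.
prevalence = {"NACS": 0.526, "SA": 0.105, "UA": 0.105,
              "NSTEMI": 0.158, "DAMI": 0.053, "CAMI": 0.053}
utility = {"NACS": 1, "SA": 1, "UA": 2, "NSTEMI": 2, "DAMI": 3, "CAMI": 3}
likelihood = {"NACS": 3e-4, "SA": 1e-4, "UA": 2e-5,
              "NSTEMI": 1e-5, "DAMI": 2e-6, "CAMI": 1e-6}

def bayes_score(likelihood, prevalence, utility):
    """Bayes' theorem gives the probabilistic diagnosis; the BAYES score is
    its expected utility across the six diagnoses (range 1 to 3)."""
    joint = {d: likelihood[d] * prevalence[d] for d in prevalence}
    evidence = sum(joint.values())
    posterior = {d: j / evidence for d, j in joint.items()}
    return sum(posterior[d] * utility[d] for d in posterior)

score = bayes_score(likelihood, prevalence, utility)  # low score -> Discharge
```

For this benign-looking likelihood profile the posterior concentrates on NACS and SA, so the expected utility falls near 1, in the Discharge range.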
We collected clinical data on 100 sequential patients presenting with chest pain in a community-based Atlanta, Georgia ED. Coauthor DA and 32 other attending ED physicians assessed these patients over 20 shifts between March and June of 2022. DA entered a patient’s clinical and demographic data, along with their HEART score [15], into our Vaizian™ webapp’s [16] custom clinical template.
On this patient data, our Bayesian software inferred a probabilistic diagnosis, from which it calculated a BAYES decision score. We recorded the ED’s disposition decision. We also recorded the final diagnosis (AMI or not), based on a stress test or cardiac catheterization. Patients with Takotsubo cardiomyopathy or missing data were excluded from the study.
Of the 100 patients studied, eight were ultimately diagnosed with AMI disease, while 92 had a non-AMI condition. The mean BAYES score for the non-AMI group was 1.186, with a standard deviation of 0.313. The AMI mean was 2.631, with standard deviation 0.297. There was no overlap between the BAYES scores of the two groups.
We conducted a logistic regression on the BAYES decision score to predict the binary probability of AMI diagnosis. The logistic model had a BAYES score and a constant term, with outcomes following a Bernoulli distribution.
The regression curve sharply differentiated between non-AMI and AMI patients. The steep slope showed the BAYES score to be a good predictor of AMI. Relative to a constant model, the logistic model had a Chi-square statistic of 55.8 (p=8.21 × 10⁻¹⁴). The small p-value suggests statistical significance. The midpoint of the regression curve, representing an AMI final diagnosis probability of 50 %, gave a midpoint decision cutoff score of 2.136.
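A minimal sketch of the midpoint calculation appears below, using a hand-rolled batch gradient ascent on toy overlapping data rather than the statistics package used in the study. The 50 % point of the fitted curve is x = −b0/b1.

```python
import math

def fit_logistic(x, y, lr=0.5, iters=20000):
    """Fit p(y=1 | x) = 1/(1 + exp(-(b0 + b1*x))) by batch gradient ascent.
    A didactic sketch, not the study's fitting procedure."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(iters):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Toy overlapping scores (hypothetical, not the study data), roughly
# symmetric about 2.0; the 50 % midpoint cutoff is x = -b0/b1.
x = [1.2, 1.4, 1.6, 2.2, 1.8, 2.4, 2.6, 2.8]
y = [0, 0, 0, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
cutoff = -b0 / b1  # near 2.0 for this symmetric toy data
```

A steep fitted slope (large b1) corresponds to the sharp differentiation described above; a gentle slope, as with the HEART score later, spreads the transition and makes the midpoint a less useful cutoff.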
Applying a cutoff to a continuous score produces a binary result that can be used as a decision test – admit or not. Observation was not an option. We cross-tabulated these score-based test results (T) with AMI disease status (D) using a cutoff of 2, the mid-range Observe utility value. Contingency Table 1a shows no false negatives (FN=0), and a low false positive rate of 3 in 92 (FP/D−). These false positives would have unnecessarily admitted 3.3 % of the non-AMI patients.
Table 1: Action × AMI contingency tables for BAYES and HEART.
Action | AMI | ∼AMI | Subtotal
---|---|---|---
(a) | | |
Admit | 8 (TP) | 3 (FP) | 11 (T+)
∼Admit | 0 (FN) | 89 (TN) | 89 (T−)
Subtotal | 8 (D+) | 92 (D−) | 100
(b) | | |
Observe | 8 (TP) | 54 (FP) | 62 (T+)
∼Observe | 0 (FN) | 38 (TN) | 38 (T−)
Subtotal | 8 (D+) | 92 (D−) | 100
-
(a) BAYES decision score for admission using a cutoff of 2 (1–3 scale). (b) HEART decision score for observation using a cutoff of 3.5 (0–10 scale). A positive BAYES test (T+) would Admit a patient to the hospital, a positive HEART test (T+) would at least Observe them, and a negative test (T−) would Discharge them home. A patient may have AMI disease (D+) or not (D−). Counts are shown for true positive (TP), false negative (FN), false positive (FP) and true negative (TN) outcomes.
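The cutoff-based cross-tabulation described above can be sketched as a direct count over (score, disease) pairs. The scores below are hypothetical toy values, not the study data.

```python
def cross_tab(scores, disease, cutoff):
    """Counts (TP, FP, FN, TN) for the binary test T+ = (score > cutoff)
    against disease status D."""
    tp = sum(1 for s, d in zip(scores, disease) if s > cutoff and d)
    fp = sum(1 for s, d in zip(scores, disease) if s > cutoff and not d)
    fn = sum(1 for s, d in zip(scores, disease) if s <= cutoff and d)
    tn = sum(1 for s, d in zip(scores, disease) if s <= cutoff and not d)
    return tp, fp, fn, tn

# Hypothetical scores: two diseased high scorers, one false positive.
scores = [2.6, 2.4, 2.1, 1.1, 1.3]
disease = [True, True, False, False, False]
tp, fp, fn, tn = cross_tab(scores, disease, cutoff=2)
false_positive_rate = fp / (fp + tn)  # FP / D-, as in the contingency tables
```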
We tested the contingency table for nonrandom association between the BAYES score test and having AMI disease. Some table entries have fewer than 5 counts, so Fisher’s Exact Test was used. The null hypothesis was rejected with a p-value of 8.87 × 10⁻¹⁰, a small probability supporting statistical significance.
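Fisher's exact test can be computed directly from the hypergeometric distribution; a stdlib-only sketch is below. Applied to Table 1a's counts it reproduces a two-sided p-value of about 8.9 × 10⁻¹⁰, consistent with the value reported above.

```python
from math import comb

def fisher_exact_two_sided(tp, fp, fn, tn):
    """Two-sided Fisher's exact test: sum the hypergeometric probabilities of
    every table with the same margins that is no more likely than the observed one."""
    n = tp + fp + fn + tn
    row1 = tp + fp          # test-positive margin (T+)
    col1 = tp + fn          # diseased margin (D+)
    denom = comb(n, row1)
    def pmf(x):             # P(x diseased among the row1 test positives)
        return comb(col1, x) * comb(n - col1, row1 - x) / denom
    p_obs = pmf(tp)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    return sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9))

# Table 1a counts: TP=8, FP=3, FN=0, TN=89.
p = fisher_exact_two_sided(8, 3, 0, 89)  # ~8.9e-10
```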
Receiver operating characteristic (ROC) analysis assesses the diagnostic efficacy of a binary classifier variable. We conducted an ROC analysis of the paired BAYES decision score and AMI disease data. The ROC curve revealed an effective classifier that rose vertically from the origin along the left y-axis, then ran horizontally across the top of the plot.
The ROC’s area under the curve (AUC) measure was 1.0 (maximum), indicating high diagnostic accuracy for the decision score on this data set. An optimal ROC cutoff maximizes the sensitivity and specificity sum. Here the cutoff score was 2.170, which is close to both the expert’s cutoff of 2 (the Observe utility) and the logistic regression midpoint value of 2.136. These ROC and logistic scores may serve as useful training set cutoffs in future studies.
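The AUC and the optimal cutoff can be sketched without a statistics package: AUC has a rank-based (Mann-Whitney) formulation, and the optimal cutoff maximizes the sum of sensitivity and specificity over candidate thresholds. The scores below are toy values chosen to mimic the non-overlapping BAYES groups.

```python
def roc_auc(scores, disease):
    """AUC via the Mann-Whitney formulation: the probability that a randomly
    chosen diseased score exceeds a healthy one, counting ties as half."""
    pos = [s for s, d in zip(scores, disease) if d]
    neg = [s for s, d in zip(scores, disease) if not d]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def youden_cutoff(scores, disease):
    """Threshold maximizing sensitivity + specificity over observed scores."""
    pos = [s for s, d in zip(scores, disease) if d]
    neg = [s for s, d in zip(scores, disease) if not d]
    return max(sorted(set(scores)),
               key=lambda t: (sum(s >= t for s in pos) / len(pos)
                              + sum(s < t for s in neg) / len(neg)))

# Non-overlapping toy groups, as with the observed BAYES scores, give AUC = 1.0.
scores = [1.1, 1.2, 1.3, 2.5, 2.7]
disease = [False, False, False, True, True]
auc = roc_auc(scores, disease)  # -> 1.0
```

With no overlap between groups, every diseased score exceeds every healthy one, so the AUC reaches its maximum of 1.0, as in the study.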
Many EDs use the HEART score to make chest pain disposition decisions [15]. Our hospital’s HEART criteria were a score of 0–3 for discharging the patient home, 4 to 6 for clinical observation, and 7 to 10 for admitting the patient to the hospital for early invasive strategies. A HEART score was calculated for all patients. The average non-AMI HEART score was 3.902 (std dev=2.006) and 7.0 (0.756) for AMI. The HEART distributions for AMI and non-AMI patients overlapped, whereas the BAYES distributions did not (Supplemental Figure 1).
The HEART logistic regression curve had a gentle slope that did not clearly distinguish AMI from non-AMI. The Chi-square statistic relative to a constant model was 19.2 (p=1.15 × 10⁻⁵). The curve midpoint was 7.965. That high cutoff value did not differentiate AMI from non-AMI, and would have sent home six of the eight sick AMI patients while still admitting five healthy non-AMI patients.
In our ED, the HEART score cutoff for observation is 3.5. Contingency Table 1b shows no false negatives (FN=0), but a high false positive rate of 54 in 92 (FP/D−). The false positives would have admitted 58.7 % of the healthy non-AMI patients for unnecessary observation [17].
The HEART score ROC curve was less diagnostically efficient than BAYES. HEART’s AUC was 0.928, exhibiting less accuracy than BAYES’ AUC of 1.0. At the optimal ROC cutoff of 9, the HEART score would have misdiagnosed all eight AMI patients, sending every sick patient home.
The pilot study results are encouraging. The data-driven BAYES score approach was able to learn from counting data frequency, make fast and accurate decisions, and explain its reasoning. These features satisfied our architectural criteria. Relative to the HEART score, our BAYES approach improved diagnostic accuracy from an ROC AUC of 0.928 to 1.0. BAYES essentially eliminated observation status for chest pain in the ED, reducing HEART’s false positive rate 18-fold from 58.7 % to 3.3 %. The pilot showed the method’s potential for improving medical decisions and lowering health care costs, without compromising patient safety.
In the pilot, a physician (DA) entered patient data into a custom web app [16]. Entering a patient’s data took DA under a minute, while chart review took five minutes. However, a patient can enter most of their history information themselves, either at home or in the ED, perhaps with staff assistance. Laboratory test results can be read from an EHR. Going forward, we expect that computers, not doctors, will automatically supply the data.
Our Vaizian approach represents actionable medical knowledge as likelihood functions. Variation in medical expertise will change likelihoods, diagnoses, and decisions. Better experts deliver better outcomes. Aggregating knowledge from multiple physicians supports diagnostic consensus. Gathering patient data from an institutional EHR database spanning many medical experts will form more consistent likelihoods. A BAYES expert system developed at a top-tier medical center can be disseminated to other EDs for improved patient disposition.
The electrocardiogram Q-wave variable did not impact the pilot study, since all hundred patients shared the same “unchanged” value. Pairwise Pearson ρ correlations of the other 24 variables had a mean coefficient of 0.0069 (std dev=0.1724). Of the 276 variable pairings, 3 were correlated with ρ>0.5. A larger validation study could help eliminate variables that strongly covary with others, or that have little impact on the BAYES decision score.
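The pairwise screen described above can be sketched as follows. The variable columns are hypothetical; only the |ρ| > 0.5 flagging rule comes from the text.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two variable columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical variable columns; flag pairs with |rho| > 0.5 as candidates
# for consolidation in a larger study.
columns = {"age": [30, 40, 50, 60], "troponin": [1, 2, 2, 4], "bmi": [28, 22, 31, 25]}
names = list(columns)
flagged = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
           if abs(pearson_r(columns[a], columns[b])) > 0.5]
```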
More research is needed. We are preparing a larger study of five to ten thousand patients, drawn from EDs at different institutions. A larger study size will enable more precise statistical assessment of the efficacy of our approach. A more diverse population will let us stratify by demographic variables to see how well partitioning patient data improves diagnostic accuracy [18]. Our system will directly access EHR databases, moving beyond manual expert instruction, and scale up to fully automated knowledge acquisition and machine learning.
Acknowledgments
The authors would like to thank Ariel Perlin of Vaizian for his assistance, the Wellstar Health System for their administrative support, and an anonymous reviewer for helpful comments that improved the quality of the manuscript.
-
Research ethics: The local Institutional Review Board deemed the study exempt from review.
-
Informed consent: Not applicable.
-
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission. MP designed the diagnostic methods, wrote the Bayesian software, and conducted the statistical analyses. DA provided the ED domain knowledge, developed the patient profiles, and entered them into the Bayesian software. Both authors contributed to writing the manuscript.
-
Use of Large Language Models, AI and Machine Learning Tools: None declared.
-
Conflict of interests: The authors are founders of Vaizian, a small unfunded Illinois startup company developing computer solutions for medical diagnosis.
-
Research funding: None declared.
-
Data availability: The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
1. Maude, J. Differential diagnosis: the key to reducing diagnosis error, measuring diagnosis and a mechanism to reduce healthcare costs. Diagnosis 2014;1:107–9. https://doi.org/10.1515/dx-2013-0009.
2. Doudesis, D, Lee, KK, Boeddinghaus, J, Bularga, A, Ferry, AV, Tuck, C, et al. Machine learning for diagnosis of myocardial infarction using cardiac troponin concentrations. Nat Med 2023;29:1201–10. https://doi.org/10.1038/s41591-023-02325-4.
3. Rao, A, Aalami, O. Towards improving the visual explainability of artificial intelligence in the clinical setting. BMC Digit Health 2023;1. https://doi.org/10.1186/s44247-023-00022-3.
4. Spiegelhalter, DJ, Dawid, AP, Lauritzen, SL, Cowell, RG. Bayesian analysis in expert systems. Stat Sci 1993;8:219–47. https://doi.org/10.1214/ss/1177010888.
5. Gelman, A, Carlin, JB, Stern, HS, Rubin, D. Bayesian data analysis. Boca Raton, FL: Chapman & Hall/CRC; 1995. https://doi.org/10.1201/9780429258411.
6. Perlin, MW, Legler, MM, Spencer, CE, Smith, JL, Allan, WP, Belrose, JL, et al. Validating TrueAllele® DNA mixture interpretation. J Forensic Sci 2011;56:1430–47. https://doi.org/10.1111/j.1556-4029.2011.01859.x.
7. Lindley, DV. Making decisions, 2nd ed. New York: John Wiley & Sons; 1991.
8. Accilien, D. Reimagining medical diagnosis with Bayesian analysis. Emerg Med News 2024;46:18. https://doi.org/10.1097/01.eem.0001006980.34074.f0.
9. Venkatesh, A, Geisler, B, Gibson Chambers, J, Baugh, C, Bohan, J, Schuur, JD. Use of observation care in US emergency departments, 2001–2008. PLOS ONE 2011;6(9):e24326. https://doi.org/10.1371/journal.pone.0024326.
10. Spiegel, R, Sutherland, M, Brown, R, Honasoge, A, Witting, M. Clinically relevant adverse cardiovascular events in intermediate heart score patients admitted to the hospital following a negative emergency department evaluation. Am J Emerg Med 2021;46:469–75. https://doi.org/10.1016/j.ajem.2020.10.065.
11. Public Policy Committee. The hospital observation care problem. Perspectives and Solutions from the Society of Hospital Medicine 2017. https://www.hospitalmedicine.org/globalassets/policy-and-advocacy/advocacy-pdf/shms-observation-white-paper-2017 [Accessed 18 October 2024].
12. Kohn, MA, Kwan, E, Gupta, M, Tabas, JA. Prevalence of acute myocardial infarction and other serious diagnoses in patients presenting to an urban emergency department with chest pain. J Emerg Med 2005;29:383–90. https://doi.org/10.1016/j.jemermed.2005.04.010.
13. Sandoval, Y, Apple, FS, Mahler, SA, Body, R, Collinson, PO, Jaffe, AS, et al. High-sensitivity cardiac troponin and the 2021 AHA/ACC/ASE/CHEST/SAEM/SCCT/SCMR guidelines for the evaluation and diagnosis of acute chest pain. Circulation 2022;146:569–81. https://doi.org/10.1161/circulationaha.122.059678.
14. MacKay, DJ. Information theory, inference and learning algorithms. Cambridge, UK: Cambridge University Press; 2003.
15. Six, A, Backus, B, Kelder, J. Chest pain in the emergency room: value of the HEART score. Neth Heart J 2008;16:191–6. https://doi.org/10.1007/bf03086144.
16. Vaizian. Vaizian™ ER chest pain; 2022. Available from: https://heartbeat.vaizian.com/webapps/home/session.html?app=HeartBeatWebApp.
17. Mahler, SA, Riley, RF, Hiestand, BC, Russell, GB, Hoekstra, JW, Lefebvre, CW, et al. The HEART Pathway randomized trial: identifying emergency department patients with acute chest pain for early discharge. Circ Cardiovasc Qual Outcomes 2015;8:195–203. https://doi.org/10.1161/circoutcomes.114.001384.
18. Barron, R, Mader, TJ, Knee, A, Wilson, D, Wolfe, J, Gemme, SR, et al. Influence of patient and clinician gender on Emergency Department HEART scores: a secondary analysis of a prospective observational trial. Ann Emerg Med 2024;83:123–31. https://doi.org/10.1016/j.annemergmed.2023.03.016.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/dx-2024-0049).
© 2024 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.