Abstract
Objectives
Educators need tools for the assessment of clinical reasoning that reflect the ambiguity of real-world practice and measure learners’ ability to determine diagnostic likelihood. In this study, the authors describe the use of the Brier score to assess and provide feedback on the quality of probabilistic diagnostic reasoning.
Methods
The authors describe a novel format called Diagnostic Forecasting (DxF), in which participants read a brief clinical case and assign a probability to each item on a differential diagnosis, order tests and select a final diagnosis. DxF was piloted in a cohort of senior medical students. DxF evaluated students’ answers with Brier scores, which compare probabilistic forecasts with case outcomes. The validity of Brier scores in DxF was assessed by comparison to subsequent decision-making in the game environment of DxF, as well as external criteria including medical knowledge tests and performance on clinical rotations.
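To make the scoring idea concrete, here is a minimal sketch of a Brier score applied to one probabilistic differential. It uses the multi-category form from Brier's original formulation: for each case, sum (p_i − o_i)² over every diagnosis on the differential, where p_i is the probability the learner assigned and o_i is 1 for the diagnosis confirmed by the case outcome and 0 otherwise. Per-case scores are then averaged over cases. Scores run from 0 (all probability on the correct diagnosis) to 2 (certainty in a wrong diagnosis), so lower is better. The abstract does not specify how DxF normalizes scores, whether probabilities must sum to 1, or how an omitted correct diagnosis is handled. The choices below (probabilities must sum to 1; an omitted correct diagnosis counts as p = 0) are assumptions, and the function names `case_brier_score` and `mean_brier_score` are hypothetical.

```python
from collections.abc import Mapping, Sequence


def case_brier_score(forecast: Mapping[str, float], final_dx: str) -> float:
    """Multi-category Brier score for a single case (0 = perfect, 2 = worst).

    forecast maps each diagnosis on the learner's differential to the
    probability assigned to it; final_dx is the diagnosis established by
    the case outcome. Assumes the probabilities sum to 1.
    """
    total = sum(forecast.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"probabilities must sum to 1, got {total}")
    score = 0.0
    for dx, p in forecast.items():
        outcome = 1.0 if dx == final_dx else 0.0  # o_i: 1 only for the confirmed diagnosis
        score += (p - outcome) ** 2
    if final_dx not in forecast:
        # Assumption: an omitted correct diagnosis is treated as p = 0, adding (0 - 1)^2.
        score += 1.0
    return score


def mean_brier_score(cases: Sequence[tuple[Mapping[str, float], str]]) -> float:
    """Average the per-case scores across all cases a learner completed."""
    if not cases:
        raise ValueError("need at least one case")
    return sum(case_brier_score(f, dx) for f, dx in cases) / len(cases)


# Hypothetical chest-pain case with most probability on the correct diagnosis.
chest_pain = {"pulmonary embolism": 0.6, "pneumonia": 0.3, "pericarditis": 0.1}
print(round(case_brier_score(chest_pain, "pulmonary embolism"), 2))  # 0.26
```

In this example the learner is penalized (0.4² = 0.16) for withholding probability from the correct diagnosis and, to a smaller degree, for the probability placed on the two alternatives (0.09 + 0.01). This is why a Brier score rewards calibrated uncertainty rather than only the choice of a final diagnosis.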
Results
Brier scores were statistically significantly correlated with diagnostic accuracy (95 % CI −4.4 to −0.44) and with mean scores on the National Board of Medical Examiners (NBME) shelf exams (95 % CI −474.6 to −225.1). Brier scores did not correlate with clerkship grades or performance on a structured clinical skills exam. Reliability as measured by within-student correlation was low.
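The abstract does not state how within-student reliability was computed. One common way to quantify how consistently a learner's score carries from case to case is a one-way random-effects intraclass correlation, ICC(1) = (MSB − MSW) / (MSB + (k − 1)·MSW), where MSB is the between-student mean square, MSW is the within-student mean square, and k is the number of cases per student. The sketch below assumes a balanced table (every student completed the same k cases). It is an illustration of the concept, not necessarily the statistic the authors used, and the function name `icc1` is hypothetical.

```python
from collections.abc import Sequence
from statistics import fmean


def icc1(scores: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC(1) for a balanced students x cases table.

    scores[i][j] is student i's Brier score on case j; every student must
    have the same number of cases (k >= 2) and there must be >= 2 students.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a balanced table with >= 2 students and >= 2 cases")
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    # Between-student mean square: spread of each student's average score.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-student mean square: case-to-case scatter around each student's average.
    msw = sum((x - m) ** 2 for row, m in zip(scores, row_means) for x in row) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("ICC is undefined when every score is identical")
    return (msb - msw) / denom
```

For two hypothetical students, `icc1([[0.2, 0.2], [0.8, 0.8]])` is 1.0 because each student scores identically on every case, while `icc1([[0.2, 0.8], [0.8, 0.2]])` is −1.0 because the students differ only from case to case. Values near or below zero, as in the second table, indicate that case-to-case variation swamps stable differences between learners, which is the pattern a low within-student correlation describes.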
Conclusions
Brier scoring showed evidence for validity as a measure of medical knowledge and as a predictor of clinical decision-making. Further work must evaluate the ability of Brier scores to predict clinical and workplace-based outcomes and develop reliable approaches to measuring probabilistic reasoning.
Funding source: University Hospitals Graduate Medical Education
Award Identifier / Grant number: Innovation Award # P0478
Funding source: Zucker Neurology Fund
Acknowledgments
The authors thank Lauren Shurtleff, Paul Shaniuk, and Arsalan Derakhshan for authoring cases utilized in this study. They have been informed that they are being acknowledged for their contributions.
- Research ethics: This study was reviewed and approved by the Case Western Reserve University School of Medicine Institutional Review Board (IRB#20210682).
- Informed consent: Informed consent was obtained from all individuals included in this study.
- Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission. NS: Design, implementation, analysis, writing. ARS: Design, implementation. LG: Design, analysis, editing. JA: Statistical analysis, writing, editing. KQ: Design, implementation, analysis, editing.
- Use of Large Language Models, AI and Machine Learning Tools: None declared.
- Conflict of interest: The authors state no conflict of interest.
- Research funding: This work was supported in part by University Hospitals Graduate Medical Education Innovation Award # P0478, as well as by funding from the private Zucker Neurology Fund, which supported application development.
- Data availability: The raw data can be obtained on request from the corresponding author.
Supplementary Material
This article contains supplementary material (https://doi.org/10.1515/dx-2023-0109).
© 2024 Walter de Gruyter GmbH, Berlin/Boston
Articles in the same Issue
- Frontmatter
- Review
- Systematic review and meta-analysis of observational studies evaluating glial fibrillary acidic protein (GFAP) and ubiquitin C-terminal hydrolase L1 (UCHL1) as blood biomarkers of mild acute traumatic brain injury (mTBI) or sport-related concussion (SRC) in adult subjects
- Opinion Papers
- From stable teamwork to dynamic teaming in the ambulatory care diagnostic process
- Bringing team science to the ambulatory diagnostic process: how do patients and clinicians develop shared mental models?
- Vitamin D assay and supplementation: still debatable issues
- Original Articles
- Developing a framework for understanding diagnostic reconciliation based on evidence review, stakeholder engagement, and practice evaluation
- Validity and reliability of Brier scoring for assessment of probabilistic diagnostic reasoning
- Impact of disclosing a working diagnosis during simulated patient handoff presentation in the emergency department: correctness matters
- Implementation of a bundle to improve diagnosis in hospitalized patients: lessons learned
- Time pressure in diagnosing written clinical cases: an experimental study on time constraints and perceived time pressure
- A decision support system to increase the compliance of diagnostic imaging examinations with imaging guidelines: focused on cerebrovascular diseases
- Bridging the divide: addressing discrepancies between clinical guidelines, policy guidelines, and biomarker utilization
- Unnecessary repetitions of C-reactive protein and leukocyte count at the emergency department observation unit contribute to higher hospital admission rates
- Quality control of ultrasonography markers for Down’s syndrome screening: a retrospective study by the laboratory
- Short Communications
- Unclassified green dots on nucleated red blood cells (nRBC) plot in DxH900 from a patient with hyperviscosity syndrome
- Bayesian intelligence for medical diagnosis: a pilot study on patient disposition for emergency medicine chest pain
- Case Report – Lessons in Clinical Reasoning
- A delayed diagnosis of hyperthyroidism in a patient with persistent vomiting in the presence of Chiari type 1 malformation
- Letters to the Editor
- Mpox (monkeypox) diagnostic kits – September 2024
- Barriers to diagnostic error reduction in Japan
- Superwarfarin poisoning: a challenging diagnosis
- Reviewer Acknowledgment
- Reviewer Acknowledgment