Abstract
Objectives
Collective intelligence, the “wisdom of the crowd,” seeks to improve the quality of judgments by aggregating multiple individual inputs. Here, we evaluate the success of collective intelligence strategies applied to probabilistic diagnostic judgments.
Methods
We compared the performance of individual and collective intelligence judgments on two series of clinical cases requiring probabilistic diagnostic assessments, or “forecasts”. We assessed the quality of forecasts using Brier scores, which compare forecasts to observed outcomes.
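A minimal sketch of the scoring described above, assuming binary outcomes coded as 0 or 1, forecasts expressed as probabilities between 0 and 1, and an unweighted mean as the aggregation rule. The function names and the example numbers are hypothetical illustrations, not the study's data or code.

```python
from statistics import mean


def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect score, and always answering 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return mean((f - o) ** 2 for f, o in zip(forecasts, outcomes))


def average_forecasts(forecast_sets):
    """Combine several forecasters' probabilities case by case (unweighted mean)."""
    return [mean(case) for case in zip(*forecast_sets)]


# Hypothetical example: four cases, two forecasters.
outcomes = [1, 0, 1, 0]
forecaster_a = [0.9, 0.4, 0.5, 0.1]
forecaster_b = [0.5, 0.2, 0.9, 0.5]

collective = average_forecasts([forecaster_a, forecaster_b])

print("Forecaster A:", round(brier_score(forecaster_a, outcomes), 4))
print("Forecaster B:", round(brier_score(forecaster_b, outcomes), 4))
print("Collective:  ", round(brier_score(collective, outcomes), 4))
```

In this made-up example the averaged forecast scores better than either individual because the two forecasters err on different cases. Averaging cannot help in the same way when forecasters share the same errors.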
Results
On both sets of cases, the collective intelligence answers outperformed nearly every individual forecaster or team. The improvement was mediated by both better resolution and better calibration of the probabilistic assessments. In a secondary analysis of how the number of individual inputs affects collective intelligence answers, drawn from two different data sources, the two data sets yielded nearly identical curves: an 11–12% improvement when averaging two independent inputs, a 15% improvement when averaging four independent inputs, and small incremental improvements with further increases in the number of inputs.
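The terms resolution and calibration can be made concrete with the standard Murphy partition of the Brier score, in which the score equals reliability minus resolution plus uncertainty. The sketch below is illustrative only: it assumes binary outcomes, groups cases by identical forecast value (a simplifying assumption under which the partition is exact), and uses hypothetical names and data rather than the study's own analysis.

```python
from collections import defaultdict


def murphy_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes.

    Brier score = reliability - resolution + uncertainty.
    Lower reliability means better calibration; higher resolution means the
    forecasts separate cases with different outcome rates more sharply.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Group observed outcomes by the exact probability that was forecast.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0
    resolution = 0.0
    for f, obs in groups.items():
        observed_rate = sum(obs) / len(obs)
        reliability += len(obs) * (f - observed_rate) ** 2
        resolution += len(obs) * (observed_rate - base_rate) ** 2

    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Hypothetical example: eight cases, forecasts of either 0.8 or 0.2.
forecasts = [0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2]
outcomes = [1, 1, 1, 0, 0, 0, 0, 1]

rel, res, unc = murphy_decomposition(forecasts, outcomes)
print("reliability:", round(rel, 4))
print("resolution: ", round(res, 4))
print("uncertainty:", round(unc, 4))
print("Brier score:", round(rel - res + unc, 4))
```

With continuous forecasts, the usual practice is to bin nearby probabilities before applying the partition, in which case the identity holds only approximately.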
Conclusions
Our results suggest that the application of collective intelligence strategies to probabilistic diagnostic forecasts is a promising approach to improve diagnostic accuracy and reduce diagnostic error.
Funding source: University Hospitals Graduate Medical Education
Award Identifier / Grant number: Innovation Award #P0478
Acknowledgments
The authors thank Lukasz Weiner, MD, Paul Shaniuk, MD, Lauren Sackett, MD, and Collin Swafford, MD for authoring cases used in the study.
- Research funding: This work was supported in part by University Hospitals Graduate Medical Education Innovation Award #P0478.
- Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: Dr. Stehouwer reports receiving a stipend to serve on the editorial board of the New England Journal of Medicine Healer application, which teaches diagnostic reasoning. Dr. Dell and Dr. Torrey report no relevant conflicts of interest.
- Informed consent: Not applicable. No identifiable information was utilized for this study.
- Ethical approval: The local Institutional Review Board deemed the study exempt from review.
© 2023 Walter de Gruyter GmbH, Berlin/Boston
Articles in this issue
- Frontmatter
- Editorials
- An equation for excellence in clinical reasoning
- Quantifying diagnostic excellence
- Review
- A scoping review of distributed cognition in acute care clinical decision-making
- Opinion Papers
- Context matters: toward a multilevel perspective on context in clinical reasoning and error
- Occam’s razor and Hickam’s dictum: a dermatologic perspective
- Original Articles
- Differences in clinical reasoning between female and male medical students
- Introducing second-year medical students to diagnostic reasoning concepts and skills via a virtual curriculum
- Bad things can happen: are medical students aware of patient centered care and safety?
- Impact of diagnostic checklists on the interpretation of normal and abnormal electrocardiograms
- Cerebrospinal fluid lactate as a predictive biomarker for tuberculous meningitis diagnosis
- Empowering quality data – the Gordian knot of bringing real innovation into healthcare system
- Collective intelligence improves probabilistic diagnostic assessments
- Why people fail to participate in annual skin cancer screening: creation of the perceptions of annual skin cancer screening scale (PASCSS)
- Instructions on appropriate fasting prior to phlebotomy; effects on patient awareness, preparation, and biochemical parameters
- Clinician factors associated with delayed diagnosis of appendicitis
- Real-world assessment of the clinical performance of COVID-VIRO ALL IN rapid SARS-CoV-2 antigen test
- Lack of a prompt normalization of immunological parameters is associated with long-term care and poor prognosis in COVID-19 affected patients receiving convalescent plasma: a single center experience
- Letters to the Editor
- Uncontrolled confounding in COVID-19 epidemiology
- VAPES: a new mnemonic for considering paroxysmal disorders
- Congress Abstracts
- SIDM2022 15th Annual International Conference