
Medical language matters: impact of clinical summary composition on a generative artificial intelligence’s diagnostic accuracy

  • Cassandra Skittle, Eliana Bonifacino and Casey N. McQuade
Published/Copyright: December 12, 2024

Abstract

Objectives

Evaluate the impact of problem representation (PR) characteristics on Generative Artificial Intelligence (GAI) diagnostic accuracy.

Methods

Internal medicine attendings and residents from two academic medical centers were given a clinical vignette and instructed to write a PR. Deductive content analysis described the characteristics of each PR. Each PR was entered individually into ChatGPT-4 (OpenAI, September 2023), which was prompted to generate a ranked three-item differential. The ranked differential and the top-ranked diagnosis were each scored on a three-level scale: incorrect, partially correct, or correct. Logistic regression evaluated the impact of each PR characteristic on ChatGPT accuracy.
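
For readers less familiar with logistic regression, the general form of such a model is sketched below. This is the standard textbook relationship, not the authors' exact specification: the abstract does not state whether the models were univariable or multivariable, or how the three-level score was coded as an outcome. The sketch assumes a binary outcome (correct vs. not correct) and a single PR characteristic x, such as a count of physical examination items.

```latex
\[
\ln\frac{P(\text{correct})}{1 - P(\text{correct})} = \beta_0 + \beta_1 x,
\qquad
\mathrm{OR} = e^{\beta_1}
\]
```

Under this form, an OR above 1 means the odds of a correct answer rise as x increases by one unit, and an OR below 1 means they fall.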

Results

For a three-item differential, accuracy was associated with including fewer comorbidities (OR 0.57, p=0.010), fewer past historical items (OR 0.60, p=0.019), and more physical examination items (OR 1.66, p=0.015). For ChatGPT’s ability to rank the true diagnosis as the single-best diagnosis, utilizing temporal semantic qualifiers, more semantic qualifiers overall, and adhering to a typical 3-part PR format all correlated with diagnostic accuracy: OR 3.447, p=0.046; OR 1.300, p=0.005; OR 3.577, p=0.020, respectively.
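
To make the reported odds ratios more concrete, the short Python sketch below converts an odds ratio into a shifted probability. It is illustrative only: the abstract does not report a baseline accuracy, so `ASSUMED_BASELINE` is a hypothetical value, and treating each OR as the effect of one additional item is an assumption. The function name `apply_odds_ratio` is likewise invented for this example.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability obtained by multiplying the baseline odds by odds_ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)  # probability -> odds
    new_odds = baseline_odds * odds_ratio                  # apply the odds ratio
    return new_odds / (1.0 + new_odds)                     # odds -> probability


# Odds ratios reported in the Results for the three-item differential.
REPORTED_ORS = {
    "comorbidities": 0.57,
    "past historical items": 0.60,
    "physical examination items": 1.66,
}

# Hypothetical baseline accuracy; the abstract does not report one.
ASSUMED_BASELINE = 0.50

if __name__ == "__main__":
    for name, or_value in REPORTED_ORS.items():
        shifted = apply_odds_ratio(ASSUMED_BASELINE, or_value)
        print(f"{name}: OR {or_value:.2f} -> probability {shifted:.3f}")
```

Because odds ratios act on odds rather than on probabilities, the same OR produces a different absolute change in accuracy depending on the baseline chosen.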

Conclusions

Several distinct PR factors improved ChatGPT diagnostic accuracy. These factors have previously been associated with expertise in creating PRs. Future studies should explore how clinical input qualities affect GAI diagnostic accuracy prospectively.


Corresponding author: Cassandra Skittle, MD, MBA, University of Colorado Anschutz Medical Campus, 12401 East 17th Avenue, 4th Floor 80045, Aurora, CO, USA

  1. Research ethics: Not applicable.

  2. Informed consent: Not applicable.

  3. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

  4. Use of Large Language Models, AI and Machine Learning Tools: None declared.

  5. Conflict of interest: The authors state no conflict of interest.

  6. Research funding: None declared.

  7. Data availability: Not applicable.

Appendix

Vignette

Chief Complaint: “I’m having trouble breathing.”

History of Present Illness: A 47-year-old Caucasian woman is admitted with new-onset shortness of breath.

The patient states that 3 months ago, she started noticing that she was getting winded going up the stairs in her house. Since then, her shortness of breath has gotten worse and she now gets short of breath with just walking from her bedroom to the bathroom. She has noticed that her pants feel tighter than usual and that her stomach is bloated. Her ankles also feel swollen. Over the last 3 days, she started noticing some sharp, nonradiating chest pains under her breastbone with more strenuous activities. These latest symptoms prompted her presentation to the emergency department.

She overall has been trying to lose weight and does not think she is pregnant. She has occasionally felt lightheaded during exertion but denies feeling short of breath with lying down or awaking from sleep feeling dyspneic. She denies fevers, chills, nausea, vomiting, diarrhea, abdominal pain, blood in her stool. She has a chronic cough rarely productive of scant sputum (what she calls her “smoker’s cough”) which is unchanged from baseline.

Upon presentation to the emergency department, she reports feeling some mild dyspnea at rest but no chest pain.

Past Medical History: [Items relevant to a pulmonary hypertension diagnosis: hypertension, tobacco use]

Anxiety; Hypertension; Opioid use disorder with injection heroin use, in remission; Polycystic ovarian syndrome; Vasovagal syncope.

Past Surgical History: Cholecystectomy 5 years ago.

Family History: Mother: hypertension; Father: prostate cancer, coronary artery disease (no myocardial infarction history); No other history of malignancy.

Social History: Lives in rural Pennsylvania with her mother. Works as a waitress. Currently smokes tobacco cigarettes, 2 packs per day for the last 30 years. She denies vaping, alcohol use. She reports abstinence from recreational drugs for 15 years.

Review of Systems: As mentioned per HPI.

Medications: Losartan; Suboxone

Physical Examination:

Temp: 36.6 °C, BP: 145/86, Pulse: 102, RR: 18, SpO2: 92% on 4 L/min, BMI: 31.6

General: Appears stated age, no acute distress

HEENT: Sclera are anicteric, moist mucosae, oropharynx clear. Facial plethora.

Lymph: No palpable lymphadenopathy.

Pulmonary: Lungs are clear to auscultation, symmetric chest expansion. Use of accessory muscles of respiration noted.

Cardiac: Regular rate and rhythm. No murmurs, rubs, or gallops. JVP is 13 cm at 30°. 2+ pitting edema bilaterally to knees.

Gastrointestinal: Normoactive bowel sounds. Nontender. Distention present with a fluid wave. No palpable organomegaly.

Extremities: Bilateral digital clubbing of the hands.

Neurologic: Alert and Oriented ×3. No asterixis. CN 2–12 are intact. 5/5 strength throughout. Gait normal but evaluation is limited as patient becomes visibly dyspneic.

Other Studies:

Na: 133; Cl: 82; K: 4.2; CO2: 42; BUN: 9; Cr: 0.6; Glu: 126

WBC: 9.2; Hgb: 13.8; Plt: 205; INR: 1.3

ALT: 8; AST: 13; Alk Phos: 135; Alb: 3.4; TP: 6.6; Total Bilirubin: 1.9

Urine pregnancy testing: negative; Troponin: negative

EKG: Sinus tachycardia with poor R-wave progression and inferior T-wave inversions that are new since last EKG.

Chest X-ray, PA and Lateral views: No acute pulmonary disease. Heart is of normal size.

Received: 2024-10-28
Accepted: 2024-11-05
Published Online: 2024-12-12

© 2024 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 27.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/dx-2024-0167/html?lang=en