Abstract
The emergence of generative large language model (LLM) artificial intelligence (AI) represents one of the most profound developments in healthcare in decades, with the potential to create revolutionary and seismic changes in the practice of medicine as we know it. However, significant concerns have arisen over questions of liability for bad outcomes associated with LLM AI-influenced medical decision making. Although the authors could not identify a US medical malpractice case adjudicated in the context of LLM AI to date, sufficient precedent exists to anticipate how courts may apply analogous case law when such cases inevitably come to trial. This commentary discusses areas of potential legal vulnerability for clinicians utilizing LLM AI through a review of past case law pertaining to third-party medical guidance and surveys the patchwork of current regulations relating to medical malpractice liability in AI. Finally, we propose proactive policy recommendations: creating an enforcement duty at the US Food and Drug Administration (FDA) to require algorithmic transparency, requiring reliance on peer-reviewed data and rigorous validation testing when LLMs are utilized in clinical settings, and encouraging tort reform to share liability between physicians and LLM developers.
Ever since Alan Turing proposed his “Imitation Game” in 1950 [1], the technological world has raced to produce artificial intelligence (AI) that equals or exceeds human intelligence. The newest wave of generative large language model (LLM) AI, such as OpenAI’s ChatGPT, represents a significant leap toward the realization of this dream. The integration of generative LLM AI into clinical settings marks a transformative moment in the history of healthcare, with the potential to revolutionize diagnostics and treatment planning [2, 3]. However, as LLM AI systems begin to play a more prominent role in patient care, uncertainty has arisen regarding liability when AI is involved in medical decision making [4]. “Hallucinations” (a phenomenon in which LLMs fabricate false information to answer a user’s prompt), the unknown reliability of the source information utilized to train LLMs, and a potential inability for physicians to independently evaluate the accuracy of an LLM AI’s output all increase the risk of liability for physicians utilizing these algorithms to make diagnostic and treatment decisions [5].
Although the authors were not able to identify a case in the United States that has been adjudicated on medical malpractice in the context of LLM AI [4, 5], there exists possible legal precedent that can guide our understanding. In this commentary, we will discuss how analogous situations in which physicians have either heeded or disregarded third-party guidance in making medical decisions provide a lens through which we can anticipate future legal interpretations when LLM AI is involved, review current regulations regarding LLM AI, and give recommendations for new health policy to address this issue proactively.
Sources of liability in medical decision making influenced by AI based on historical review
The roots of the legal definition of malpractice are encapsulated in English jurist William Blackstone’s Commentaries on the Laws of England [6], where he characterized it as a “misdemeanor and offense at common law, whether it be for curiosity and experiment, or by neglect; because it breaks the trust which the party had placed in his physician, and tends to the patient’s destruction.” Patients retain, and frequently exercise, a cause of action in civil courts for the redress of injuries resulting from medical malpractice. As a general rule, a medical provider engages in malpractice if their conduct deviates from the ordinary standard of care in their jurisdiction, which can differ widely between jurisdictions [7]. Therefore, a physician’s use of LLM AI in treating a patient will likely be analyzed through the lens of the prevailing standard of care [5]. Although the novelty of LLM AI in clinical practice lends uncertainty as to what sort of evidence the courts will treat as probative of its proper use, it seems most likely that LLM AI-generated advice will be treated as third-party medical guidance. Fortunately, a practitioner’s reliance on (or disregard for) third-party medical guidance is not a novel topic for the courts.
It is important to note that publicly available LLMs such as ChatGPT do not exclusively utilize expert-reviewed data to produce medical guidance. To make a more direct comparison to the examples that follow, let us suppose that a future LLM marketed directly to physicians will do so.
Our first analogous situation involves the degree to which package inserts drafted by pharmaceutical manufacturers can be utilized to establish the standard of care concerning the administration of a drug. Drug package inserts are derived from source information and clinical studies reviewed by physicians; however, they may not be authored by them. In Julian v. Barker, the Supreme Court of Idaho held that a trial court erred in barring the admission of an information sheet that provided directions on the proper administration of sodium pentothal, as the manufacturer was presumed to be qualified to give directions concerning the use of its product [8]. In coming to this conclusion, the Julian Court noted that a drug manufacturer’s written directions represented prima facie (“facially sufficient” – a legal term meaning that an argument or position is sound enough at first glance to be presumed valid) evidence of the drug’s proper administration. In Mueller v. Mueller, the Supreme Court of South Dakota upheld a jury instruction that directed the jury to consider evidence that a physician deviated from a drug manufacturer’s instructions on the proper administration of its product as evidence of negligence [9]. The Mueller Court supported its decision by noting that drug manufacturers were increasingly being found liable for defective products, rendering their instructions more reliable. Also, per the Mueller Court, a busy modern physician had no choice but to rely upon a manufacturer’s instructions, as they could not be expected to independently verify the propriety of a drug’s particular application.
Touching upon physicians’ usage of third-party medical guidance even more closely analogous to LLM AI, other cases have probed the extent to which adherence to a point-of-care decision resource can be considered standard of care. In Spensieri v. Lasky, the Court of Appeals of the State of New York rejected an argument that drug information compiled in the Physicians’ Desk Reference was prima facie evidence of the standard of care, noting that a patient’s individual circumstances were vital to that analysis [10]. Ultimately, the Lasky Court held that the Physicians’ Desk Reference could properly be incorporated into an expert’s testimony but was not standalone proof of the standard of care.
As evidenced by the split in authority discussed above, some jurisdictions may deem an AI utilizing expert-reviewed data to give medical guidance as representing the standard of care, whereas others may generally reject its applicability. A hybrid approach is also possible, in which courts permit the admission of a generative AI’s response to an inquiry by a physician but require supplemental testimony from a qualified medical expert. Doubt that the courts will uniformly consider guidance from a validated LLM AI to be the standard of care may produce a future dilemma for the “busy modern physician” deciding whether to heed or reject this guidance.
The courts’ collective approach to the interplay between generative AI and the standard of care will likely continue to evolve, as courts frequently modify or clarify established precedent based upon the unique facts of a particular case [9]. In Lhotka v. Larson, for instance, the Court noted that a jury instruction that deviation from a manufacturer’s directions constituted evidence of negligence would be appropriate only if said directive was clear and unambiguous; because the directive at issue was ambiguous, such an instruction was not warranted [11]. Because medical guidance given by generative AI approximates the conversational tone of a human consultant, in practice it is rarely clear and unambiguous.
Brief review of existing laws and regulations applicable to AI
Legislative and regulatory efforts to address LLM AI have thus far been limited and piecemeal, with the dynamic nature of these systems posing unique challenges for traditional regulatory approaches [12, 13]. Currently, the closest thing to a comprehensive federal regulation addressing liability for LLM AI-influenced medical decisions is a 2022 revision to the Affordable Care Act’s Section 1557, which states that physicians and covered entities are “liable for medical decisions made in reliance with clinical algorithms” [14]. This regulation is problematic because, while the intent of the rule is to address and prevent the discrimination against historically marginalized communities that can result from AI algorithms [15], a reasonable interpretation could apply this statement broadly to any medical decision made utilizing AI, including guidance given by LLMs.
The US Food and Drug Administration (FDA) is beginning to evaluate some AI systems as medical devices [16] and has published nonbinding recommendations for classifying clinical decision support (CDS) software as medical devices or not [17]. These recommendations interpret the scope of section 520(o)(1)(E)(i) of the Federal Food, Drug, and Cosmetic Act (FD&C Act) and, when applied to an LLM, appear to exempt these systems from being classified as ‘devices’ (i.e., a nondevice CDS). This determination rests on the observation that LLMs, when utilized clinically, are designed for decision support and are not (at least currently) involved in the acquisition or processing of diagnostic images [17]. Importantly, this guidance also parallels the Lasky Court in stating that recommendations given by a CDS tool should be independently considered by the physician in view of the individual patient and not utilized as the sole determinant of diagnostic or treatment decisions [17].
In Congress, no significant legislation that specifically regulates LLM AI in healthcare has been proposed. The most potentially consequential bill is the Algorithmic Accountability Act of 2019 [18], which was re-introduced in 2023. This legislation aims to create protections for people negatively affected by the utilization of AI in decisions on housing, credit, education, and other high-impact applications. If passed, it would create an enforcement duty at the Federal Trade Commission to require that automated systems be assessed for biases and to hold bad actors accountable [19]. However, the bill does not mention medical malpractice and proposes no plans to assess clinically utilized LLM AI algorithms for reliability.
Discussion, limitations, and policy recommendations
It should be restated that the most significant limitation of this commentary is that, to our knowledge, there are no current or previous cases litigated in the United States that specifically address malpractice liability for physicians utilizing LLM AI. As a result, the preceding legal review remains inherently speculative, representing our best estimate of what evidence future courts may consider in these cases. Given the paucity of legislation and regulatory efforts identified in the previous section, we suggest that there is a critical need and opportunity to address this issue proactively through policy rather than waiting for resolution through the legal system.
To ensure the reliability of AI systems, protect patients, and promote the fair application of medical malpractice liability, federal policy should mandate rigorous validation and testing of AI tools before their deployment in clinical settings. The FDA is the preferred agency to regulate clinical AI reliability, given its expertise in medical devices and software, and should extend this responsibility to LLM algorithms. This process could require AI developers to make their algorithms available for independent validation when utilized in clinical practice, ensuring that AI systems provide clear explanations for their recommendations supported by verified, peer-reviewed data. Finally, if utilization of high-quality LLM AI increasingly becomes considered the standard of care in most jurisdictions, liability reform will be needed to shift some responsibility for AI-generated medical guidance to algorithm developers, as illustrated by the drug package insert cases discussed previously. Achieving this may require state-level, rather than federal-level, tort reform, and could be an issue resolved later by the courts themselves if policy cannot be enacted first.
Conclusions
Until there is clarity or action in determining the scope of malpractice liability resulting from medical decisions influenced by AI, a significant barrier to the adoption and full application of this technology in medicine will remain. We recommend consideration and adoption of the policy recommendations given in this commentary as a proactive solution to protect patients and reduce the risk of malpractice liability for physicians who choose to take advantage of the potential of generative LLM AI for clinical applications.
- Research ethics: Not applicable.
- Informed consent: Not applicable.
- Author contributions: Both authors provided substantial contributions to conception and design, acquisition of data, or analysis and interpretation of data; both authors drafted the article or revised it critically for important intellectual content; both authors gave final approval of the version of the article to be published; and both authors agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
- Competing interests: None declared.
- Research funding: None declared.
- Data availability: Not applicable.
- Disclaimer: The views expressed in this material are those of the authors, and do not reflect the official policy or position of the U.S. Government, the Department of Defense, or the Department of the Air Force.
References
1. Turing, AM. Computing machinery and intelligence. Mind 1950;59:433–60. https://doi.org/10.1093/mind/LIX.236.433.
2. Cutler, DM. What artificial intelligence means for health care. JAMA Health Forum 2023;4:e232652. https://doi.org/10.1001/jamahealthforum.2023.2652.
3. Ayers, JW, Poliak, A, Dredze, M, Leas, EC, Zhu, Z, Kelley, JB, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med 2023;183:589–96. https://doi.org/10.1001/jamainternmed.2023.1838.
4. Price, WN, Gerke, S, Cohen, IG. Potential liability for physicians using artificial intelligence. JAMA 2019;322:1765–6. https://doi.org/10.1001/jama.2019.15064.
5. Duffourc, M, Gerke, S. Generative AI in health care and liability risks for physicians and safety concerns for patients. JAMA 2023;330:313–14. https://doi.org/10.1001/jama.2023.9630.
6. Blackstone, W. Commentaries on the laws of England. Boston: Beacon Press; 1962.
7. American Medical Association. State medical liability reform. https://www.ama-assn.org/practice-management/sustainability/state-medical-liability-reform [Accessed 21 Sep 2023].
8. Julian v. Barker, 75 Idaho 413, 423; 1954.
9. Mueller v. Mueller, 221 N.W.2d 39, 43 (S.D.); 1974.
10. Spensieri v. Lasky, 94 N.Y.2d 231, 239; 1999.
11. Lhotka v. Larson, 307 Minn. 121, 126; 1976.
12. Minssen, T, Vayena, E, Cohen, IG. The challenges for regulating medical use of ChatGPT and other large language models. JAMA 2023;330:315–16. https://doi.org/10.1001/jama.2023.9651.
13. Clark, P, Kim, J, Aphinyanaphongs, Y. Marketing and US Food and Drug Administration clearance of artificial intelligence and machine learning enabled software in and as medical devices: a systematic review. JAMA Netw Open 2023;6:e2321792. https://doi.org/10.1001/jamanetworkopen.2023.21792.
14. Centers for Medicare and Medicaid Services. Affordable Care Act Section 1557 nondiscrimination in health programs and activities: use of clinical algorithms in decision making (§ 92.210); 2022. https://www.govinfo.gov/content/pkg/FR-2022-08-04/pdf/2022-16217.pdf [Accessed 21 Sep 2023].
15. Parikh, RB, Teeple, S, Navathe, AS. Addressing bias in artificial intelligence in health care. JAMA 2019;322:2377–8. https://doi.org/10.1001/jama.2019.18058.
16. Clark, P, Kim, J, Aphinyanaphongs, Y. Marketing and US Food and Drug Administration clearance of artificial intelligence and machine learning enabled software in and as medical devices: a systematic review. JAMA Netw Open 2023;6:e2321792. https://doi.org/10.1001/jamanetworkopen.2023.21792.
17. U.S. Food and Drug Administration. Clinical Decision Support Software: guidance document, FDA-2017-D-6569; 2022. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/clinical-decision-support-software [Accessed 21 Sep 2023].
18. Algorithmic Accountability Act of 2019, S. 1108, 116th Congress. https://www.congress.gov/bill/116th-congress/senate-bill/1108 [Accessed 22 Sep 2023].
19. Wyden, Booker and Clarke introduce bill to regulate use of artificial intelligence to make critical decisions like housing, employment and education. U.S. Senator Ron Wyden of Oregon. https://www.wyden.senate.gov/news/press-releases/wyden-booker-and-clarke-introduce-bill-to-regulate-use-of-artificial-intelligence-to-make-critical-decisions-like-housing-employment-and-education [Accessed 26 Sep 2023].
© 2024 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.