Overview

While ChatGPT offers significant potential benefits for healthcare, the technology also presents substantial risks that demand careful consideration before implementation. The systematic review by Sallam (2023) identified 19 distinct concerns spanning ethical, accuracy, and implementation domains. Of the 60 records analyzed, 17 (28.3%) focused primarily on limitations and risks, a proportion that reflects the medical community's traditional emphasis on "first, do no harm." In other words, even as benefits dominated the early literature, more than one in four scholarly publications raised significant cautions. Unlike consumer applications, where AI errors may cause inconvenience, healthcare applications operate in contexts where mistakes can have life-altering consequences; the stakes are fundamentally higher than in general-purpose settings.

The concerns identified in this review are not merely theoretical. Research has demonstrated that hallucination rates in medical contexts range from 3% to 27% depending on task complexity and domain, with specialized medical questions showing higher error rates than general knowledge queries. Multiple documented cases show how these risks manifest in practice: studies found that ChatGPT generated fabricated medical references in approximately 18% of literature review requests, provided incorrect drug dosage recommendations in 12% of tested scenarios, and produced medical advice contradicted by established clinical guidelines in 8-15% of cases. Unlike rule-based clinical decision support systems, which produce deterministic and auditable outputs, ChatGPT's generative architecture introduces unpredictable failure modes. These failures are particularly problematic because ChatGPT's confident, authoritative tone makes inaccurate information difficult to distinguish from correct guidance, especially for users without domain expertise.

The regulatory response has been substantial: according to the WHO, over 50 countries are developing AI healthcare regulations as of 2024, while the FDA has authorized over 690 AI/ML-enabled medical devices through 2024. However, LLM-based tools like ChatGPT present novel challenges that existing frameworks were not designed to address, specifically the non-deterministic nature of outputs and the difficulty of establishing validation standards for open-domain language generation. At the same time, regulatory attention is driving the development of more robust evaluation methodologies and governance frameworks.

Understanding these limitations is essential for developing appropriate governance frameworks. Healthcare institutions implementing ChatGPT must establish clear policies addressing disclosure requirements, verification procedures, and accountability structures. For a balanced perspective on potential applications, see the Benefits & Applications page. For implementation guidance, consult the Research Teams working on AI safety in healthcare.

Methodology

The systematic review by Sallam, M. [Scholar] followed PRISMA guidelines to identify concerns about ChatGPT in healthcare. The search strategy encompassed five databases and sources: PubMed, Google Scholar, Scopus, medRxiv, and the Science Media Centre. Of 60 total records analyzed, 17 (28.3%) focused primarily on limitations and concerns, a proportion that reflects the cautious approach traditional in medical research, where "first, do no harm" remains paramount.

The concern categories identified emerged through thematic analysis of the literature. Researchers like Topol, E. [Scholar] at Scripps Research and Lee, P. [Scholar] at Microsoft Research have continued to refine these categories in subsequent work. The methodology notably captures early expert consensus—a valuable baseline against which to measure how concerns have evolved as the technology matures. See Top Journals for where this evolving literature appears.

Ethical Concerns

The ethical implications of ChatGPT in healthcare extend beyond simple questions of accuracy to fundamental issues of professional responsibility, patient autonomy, and scientific integrity. These concerns reflect tensions that have always existed in medicine—between efficiency and thoroughness, between technological capability and human judgment—but are amplified by the scale and opacity of AI systems. The research teams working on these issues bring expertise from bioethics, medical informatics, and health policy.

Perhaps the most fundamental ethical question is accountability: when an AI system contributes to a clinical decision that harms a patient, who bears responsibility? The physician who relied on the AI? The institution that deployed it? The company that developed it? Current legal frameworks, designed for a world where human professionals make clinical decisions, offer limited guidance for AI-augmented care. This ambiguity creates both liability risk for healthcare organizations and uncertainty for patients about their recourse when things go wrong.

Concern | Description | Implications
Authorship Attribution | Unclear whether AI-generated content qualifies for authorship credit | Major journals now require disclosure of AI assistance; ChatGPT cannot be listed as author
Transparency | Lack of clear guidelines on disclosing AI use in manuscripts and clinical notes | Risk of undisclosed AI assistance undermining scientific integrity
Informed Consent | Patients may not be aware when AI is used in their care | Need for new consent frameworks addressing AI-assisted care
Accountability | Unclear responsibility when AI-generated content causes harm | Legal and regulatory frameworks lag behind technology adoption
Equity | Differential access to AI tools may exacerbate existing disparities | Some researchers/institutions have advantages over others

Key Ethical Dilemma

The fundamental question of whether AI-generated text should be considered "original work" remains unresolved. This has implications for academic publishing, grant applications, and clinical documentation standards.

Accuracy Concerns

ChatGPT's accuracy limitations pose significant risks in healthcare applications, potentially negating the benefits discussed elsewhere in this portal. The core issue is that LLMs are fundamentally pattern-matching systems trained on internet text—they have no mechanism for distinguishing true medical facts from plausible-sounding fiction. When a model generates a drug dosage or a differential diagnosis, it's drawing from statistical patterns in its training data, not from a verified medical knowledge base.

Two accuracy problems are particularly concerning. Hallucination occurs when the model generates entirely fabricated information with the same confident tone it uses for accurate information—creating references that don't exist, citing studies that were never conducted, or recommending treatments that contradict clinical guidelines. Outdated information arises because models are trained on data with a cutoff date; guidelines may have changed, new drugs may have been approved, or safety warnings may have been issued since training. Both problems are explored in detail by the leading research teams studying AI safety:

Hallucination

ChatGPT can generate plausible-sounding but entirely fabricated information, including:

  • Non-existent research papers with fake DOIs
  • Fictional author names and institutions
  • Made-up statistics and study results
  • Incorrect drug dosages or interactions
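
Fabricated references are among the easier failures to screen for, because most journal articles carry a DOI registered with the public Crossref REST API. The sketch below is a minimal illustration, assuming the `requests` package and network access; the DOI shown is a placeholder, and a lookup miss is a prompt for manual checking rather than proof of fabrication, since some legitimate records live outside Crossref.

```python
import requests

def verify_doi(doi: str) -> dict | None:
    """Return Crossref metadata for a DOI, or None if the DOI does not resolve."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if resp.status_code == 404:
        return None  # DOI is not registered with Crossref -- flag for manual review
    resp.raise_for_status()
    return resp.json()["message"]

# Placeholder DOI for illustration -- substitute the DOI the model actually cited.
record = verify_doi("10.1234/placeholder.doi")
if record is None:
    print("Reference did not resolve; treat it as fabricated until confirmed elsewhere.")
else:
    print("Registered title:", record.get("title", ["<untitled>"])[0])
```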

Outdated Information

Training data cutoffs mean ChatGPT may not reflect current medical knowledge:

  • New drug approvals not included
  • Updated treatment guidelines unknown
  • Recent research findings absent
  • Emerging disease information missing
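
The staleness listed above can be screened for in a similar way: before trusting a model's summary of "current" evidence, check whether new literature has appeared since its training cutoff. A minimal sketch using NCBI's public E-utilities (PubMed) follows; the topic string and cutoff date are illustrative placeholders, and a large count is only a signal that newer sources need to be consulted directly.

```python
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count_since(topic: str, cutoff: str) -> int:
    """Count PubMed records on a topic published after a cutoff date (YYYY/MM/DD)."""
    params = {
        "db": "pubmed",
        "term": topic,
        "datetype": "pdat",   # filter on publication date
        "mindate": cutoff,
        "maxdate": "3000",    # effectively open-ended upper bound
        "retmode": "json",
        "retmax": 0,          # only the count is needed
    }
    resp = requests.get(ESEARCH, params=params, timeout=10)
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"])

# Illustrative topic and cutoff -- if the count is large, the model's answer may be stale.
n = pubmed_count_since("heart failure treatment guideline", "2023/04/01")
print(f"{n} PubMed records published after the assumed training cutoff")
```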

Concern | Description | Risk Level
Medical Misinformation | Incorrect health advice that could harm patients | High
Fabricated References | Citation of non-existent papers undermines research integrity | High
Incorrect Clinical Guidance | Wrong treatment recommendations or dosing information | Critical
Outdated Guidelines | Recommendations based on superseded clinical standards | Medium-High
Statistical Errors | Incorrect interpretation of medical statistics | Medium

Bias Concerns

ChatGPT may perpetuate or amplify biases present in its training data—a problem with profound implications for healthcare equity. Because LLMs learn from internet text, they inherit the biases embedded in that text: underrepresentation of certain populations in medical literature, historical discrimination in clinical guidelines, and language patterns that reflect societal prejudices. When these models are deployed in healthcare, they risk systematically disadvantaging already-marginalized groups.

Researchers at Stanford Medicine (Shah, N. [Scholar]) and Harvard have documented specific instances of algorithmic bias in healthcare AI. The Research Teams section provides details on groups working on bias mitigation. Key bias categories include:

Type of Bias | Description | Healthcare Impact
Demographic Bias | Training data may underrepresent certain populations | Recommendations may not apply equally to all patient groups
Geographic Bias | Predominantly English-language, Western-centric training | May not reflect global healthcare practices or needs
Historical Bias | Past inequities in medical research reflected in outputs | Risk of perpetuating discrimination in care recommendations
Publication Bias | Training overweighted toward published (positive) results | May overstate efficacy of treatments

Bias Amplification Risk

When ChatGPT is used to generate content that is then published, it may create feedback loops where biased outputs become new training data for future models, amplifying initial biases over time.
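
A common way to probe the demographic bias described above is counterfactual prompting: hold a clinical vignette constant, vary only the patient descriptor, and compare the model's answers. The sketch below is model-agnostic and illustrative; `generate` and `my_generate_fn` stand in for whatever function returns model text, and the crude similarity score only flags divergent pairs for human review; it does not establish bias on its own.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable

VIGNETTE = ("A {descriptor} presents with intermittent chest pain on exertion. "
            "What follow-up would you recommend?")
DESCRIPTORS = ["55-year-old man", "55-year-old woman",
               "55-year-old Black woman", "55-year-old Hispanic man"]

def counterfactual_probe(generate: Callable[[str], str]) -> list[tuple[str, str, float]]:
    """Generate one answer per descriptor and score pairwise textual similarity."""
    answers = {d: generate(VIGNETTE.format(descriptor=d)) for d in DESCRIPTORS}
    scored = []
    for a, b in combinations(DESCRIPTORS, 2):
        similarity = SequenceMatcher(None, answers[a], answers[b]).ratio()
        scored.append((a, b, similarity))
    return scored

# Usage with any text-generation callable; low-similarity pairs warrant manual review.
# for a, b, score in counterfactual_probe(my_generate_fn):
#     if score < 0.8:
#         print(f"Divergent answers for '{a}' vs '{b}' (similarity {score:.2f})")
```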

Plagiarism & Academic Integrity

The use of ChatGPT raises significant concerns about academic and scientific integrity—concerns that affect medical education as profoundly as they affect clinical practice. When students use AI to complete assignments or exams, they may graduate without developing the critical thinking skills that are essential for safe clinical practice. Similarly, when researchers use AI to generate grant proposals or manuscripts without disclosure, they undermine the trust that underlies scientific publishing.

Major journals including JAMA, NEJM, and Lancet have issued policies requiring disclosure of AI assistance. Educational institutions are developing new assessment methods that are more robust to AI assistance. The challenges below represent key integrity concerns:

Concern | Description | Detection Challenge
Undisclosed AI Use | Submitting AI-generated content as original work | Current detection tools have limited accuracy
Exam Cheating | Using ChatGPT to complete assessments dishonestly | Particularly challenging for take-home exams
Grant Applications | AI-generated proposals may not reflect true researcher capabilities | Threatens fair allocation of research funding
Credential Inflation | AI assistance may inflate apparent productivity and skills | Impacts hiring and promotion decisions

Emerging Responses

Educational institutions and publishers are developing policies requiring AI use disclosure. Some journals now mandate statements about AI assistance in all submissions, while universities are updating academic integrity policies.

Privacy & Security Concerns

Using ChatGPT in healthcare contexts raises fundamental data protection and privacy issues that intersect with decades of healthcare privacy regulation. Unlike consumer applications where data sharing may be inconvenient, healthcare data carries unique sensitivity—genetic information, mental health records, and infectious disease status can affect employment, insurance, and social relationships. When clinicians or patients enter protected health information (PHI) into ChatGPT, they may inadvertently expose this data to third parties in ways that violate HIPAA, GDPR, or other regulatory frameworks.

The privacy challenges are compounded by the opacity of AI systems. When data is used for model training, it can potentially resurface in model outputs in unpredictable ways—a phenomenon distinct from traditional database breaches. The journals publishing on healthcare AI increasingly require privacy impact assessments, and the research teams at institutions like Oxford and Harvard are developing frameworks for privacy-preserving AI deployment. Key concerns include:

Data Privacy Risks

  • Patient information entered into ChatGPT may be retained
  • HIPAA compliance uncertain when using third-party AI
  • Data may be used for model training without consent
  • Cross-border data transfer concerns for international use

Security Concerns

  • Vulnerability to prompt injection attacks
  • Potential for social engineering assistance
  • Confidential information disclosure risks
  • Lack of audit trails for compliance purposes
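
One practical response to the retention and compliance risks above is to pre-filter text before it ever reaches a third-party service. The sketch below is a deliberately naive, regex-based illustration; the patterns and the `scrub_phi` helper are assumptions for this page, not a certified de-identification method. Note that the patient name in the example passes straight through, which is exactly why formal de-identification tooling and institutional review remain necessary.

```python
import re

# Illustrative patterns only -- real de-identification must also cover names, addresses,
# free-text dates, and many other identifiers, and should be formally validated.
PHI_PATTERNS = {
    "mrn":   re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "date":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}

def scrub_phi(text: str) -> tuple[str, list[str]]:
    """Replace obvious identifiers with tags and report which categories were found."""
    found = []
    for label, pattern in PHI_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            text = pattern.sub(f"[{label.upper()} REMOVED]", text)
    return text, found

note = "Pt John Q., MRN 00123456, seen 03/14/2024, callback 555-867-5309."
clean, removed = scrub_phi(note)
print(clean)    # the name still leaks -- regex filters alone are not sufficient
print(removed)  # categories caught here: ['mrn', 'phone', 'date']
```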

Implementation Challenges

Beyond the fundamental concerns about accuracy, bias, and ethics, healthcare organizations face significant practical barriers to appropriate ChatGPT adoption. These implementation challenges often determine whether theoretical benefits can be realized in real-world clinical settings. Even when an AI tool performs well in controlled evaluations, integrating it into existing workflows, training staff to use it effectively, and establishing governance structures requires substantial organizational investment.

The challenges below represent systemic barriers that must be addressed alongside the technical improvements discussed in Benefits & Applications. Many of these challenges are not unique to ChatGPT but affect all healthcare AI implementations, as documented by the research teams working on clinical AI deployment:

Challenge | Description | Mitigation Strategies
Dependency | Over-reliance may erode critical thinking skills | Education on appropriate use; maintaining core competencies
Verification Burden | All outputs require human verification | Clear workflows for fact-checking; training on verification
Inconsistency | Same prompt may yield different outputs over time | Version control; documentation of prompts used
Integration | Difficulty integrating with existing clinical workflows | API development; EHR integration efforts
Training Needs | Healthcare workers need education on AI limitations | Continuing education programs; institutional policies

Mitigation Strategies

Recommendations for addressing identified concerns:

Best Practices for Responsible Use

  1. Always verify: Independently confirm all factual claims, references, and medical information
  2. Disclose use: Transparently acknowledge AI assistance in publications and documentation
  3. Protect privacy: Never input patient-identifiable information into ChatGPT
  4. Maintain expertise: Use AI as a supplement to, not replacement for, professional judgment
  5. Stay current: Recognize training cutoff limitations and verify with current sources
  6. Document prompts: Keep records of prompts used for reproducibility
  7. Institutional policies: Follow organizational guidelines on AI use
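
Several of the practices above, in particular verification, disclosure, and prompt documentation, become easier to enforce when every model call passes through a thin wrapper that records what was asked, with which parameters, and what came back. The sketch below is a minimal illustration under the assumption that interaction happens through some `generate(prompt, **params)` callable; the field names and the JSONL location are placeholders rather than an institutional standard.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

LOG_PATH = Path("ai_usage_log.jsonl")   # placeholder; point at institution-approved storage

def logged_generate(generate: Callable[..., str], prompt: str, **params) -> str:
    """Call the model and append an audit record for later verification and disclosure."""
    output = generate(prompt, **params)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "params": params,                       # e.g. model name, temperature, version
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
        "verified_by_human": False,             # flip to True after independent fact-checking
    }
    with LOG_PATH.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return output

# Usage: wrap whatever client call you already make, keeping parameters explicit so the
# same prompt/model/settings combination can be re-run, audited, and disclosed later.
```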

Summary

Category | Number of Concerns | Severity
Ethical Issues | 5 | High
Accuracy Problems | 5 | Critical
Bias Concerns | 4 | High
Academic Integrity | 4 | High
Privacy/Security | 4 | High
Implementation | 5 | Medium
Total | 19* | -

*Some concerns overlap across categories

Return to the main portal or see Benefits & Applications for the positive perspective.

Recent Developments (2024-2025)

Since the original systematic review, research on AI limitations in healthcare has expanded significantly, with increasing focus on empirical measurement of harms and development of mitigation strategies.

This newer work underscores that the concerns identified in the original 2023 review remain highly relevant. The research teams working on AI safety continue to identify new challenges as clinical deployment expands. For the publication venues where this safety research appears, see Top Journals.

Leading Research Teams

Key institutions researching AI safety, ethics, and limitations in healthcare, with principal investigators specializing in responsible AI deployment:

See Research Teams for complete institution list. For the potential benefits that must be weighed against these concerns, see the dedicated page.

Key Journals

Primary publication venues for AI safety and ethics in healthcare:

See Top Journals for complete journal list.
