Overview
While ChatGPT offers significant potential benefits for healthcare, the technology also presents substantial risks that demand careful consideration before implementation. The systematic review by Sallam (2023) identified 19 distinct concerns spanning ethical, accuracy, and implementation domains. Of the 60 records analyzed, 17 (28.3%) focused primarily on limitations and risks, a proportion that reflects the medical community's traditional emphasis on "first, do no harm." In other words, although benefits dominated the early literature, nearly three in ten scholarly publications raised significant cautions. Unlike consumer applications, where AI errors may cause inconvenience, healthcare applications operate in contexts where mistakes can have life-altering consequences; the stakes are fundamentally higher than in general-purpose applications.
The concerns identified in this review are not merely theoretical. Research has demonstrated that hallucination rates in medical contexts range from 3% to 27% depending on task complexity and domain, with specialized medical questions showing higher error rates than general knowledge queries. Multiple documented cases demonstrate the real-world manifestation of these risks: studies found that ChatGPT generated fabricated medical references in approximately 18% of literature review requests, provided incorrect drug dosage recommendations in 12% of tested scenarios, and produced medical advice contradicted by established clinical guidelines in 8-15% of cases. Unlike rule-based clinical decision support systems, which produce deterministic and auditable outputs, ChatGPT is generative and therefore exhibits unpredictable failure modes. These failures are particularly problematic because ChatGPT's confident, authoritative tone can make inaccurate information difficult to distinguish from correct guidance, especially for users without domain expertise.
The regulatory response has been substantial: according to the WHO, over 50 countries are developing AI healthcare regulations as of 2024, while the FDA has authorized over 690 AI/ML-enabled medical devices through 2024. However, LLM-based tools like ChatGPT present novel challenges that existing frameworks were not designed to address, specifically the non-deterministic nature of outputs and the difficulty of establishing validation standards for open-domain language generation. At the same time, regulatory attention is driving the development of more robust evaluation methodologies and governance frameworks.
Understanding these limitations is essential for developing appropriate governance frameworks. Healthcare institutions implementing ChatGPT must establish clear policies addressing disclosure requirements, verification procedures, and accountability structures. For a balanced perspective on potential applications, see the Benefits & Applications page. For implementation guidance, consult the Research Teams working on AI safety in healthcare.
Methodology
The systematic review by Sallam, M. [Scholar] followed PRISMA guidelines to identify concerns about ChatGPT in healthcare. The search strategy encompassed five databases and sources: PubMed, Google Scholar, Scopus, medRxiv, and the Science Media Centre. Of the 60 records analyzed, 17 (28.3%) focused primarily on limitations and concerns, reflecting the cautious approach traditional in medical research.
The concern categories emerged through thematic analysis of the literature. Researchers such as Topol, E. [Scholar] at Scripps Research and Lee, P. [Scholar] at Microsoft Research have continued to refine these categories in subsequent work. The methodology notably captures early expert consensus, a valuable baseline against which to measure how concerns evolve as the technology matures. See Top Journals for where this evolving literature appears.
Ethical Concerns
The ethical implications of ChatGPT in healthcare extend beyond simple questions of accuracy to fundamental issues of professional responsibility, patient autonomy, and scientific integrity. These concerns reflect tensions that have always existed in medicine—between efficiency and thoroughness, between technological capability and human judgment—but are amplified by the scale and opacity of AI systems. The research teams working on these issues bring expertise from bioethics, medical informatics, and health policy.
Perhaps the most fundamental ethical question is accountability: when an AI system contributes to a clinical decision that harms a patient, who bears responsibility? The physician who relied on the AI? The institution that deployed it? The company that developed it? Current legal frameworks, designed for a world where human professionals make clinical decisions, offer limited guidance for AI-augmented care. This ambiguity creates both liability risk for healthcare organizations and uncertainty for patients about their recourse when things go wrong.
| Concern | Description | Implications |
|---|---|---|
| Authorship Attribution | Unclear whether AI-generated content qualifies for authorship credit | Major journals now require disclosure of AI assistance; ChatGPT cannot be listed as author |
| Transparency | Lack of clear guidelines on disclosing AI use in manuscripts and clinical notes | Risk of undisclosed AI assistance undermining scientific integrity |
| Informed Consent | Patients may not be aware when AI is used in their care | Need for new consent frameworks addressing AI-assisted care |
| Accountability | Unclear responsibility when AI-generated content causes harm | Legal and regulatory frameworks lag behind technology adoption |
| Equity | Differential access to AI tools may exacerbate existing disparities | Well-resourced researchers and institutions gain advantages over others, potentially widening existing gaps |
Key Ethical Dilemma
The fundamental question of whether AI-generated text should be considered "original work" remains unresolved. This has implications for academic publishing, grant applications, and clinical documentation standards.
Accuracy Concerns
ChatGPT's accuracy limitations pose significant risks in healthcare applications, potentially negating the benefits discussed elsewhere in this portal. The core issue is that LLMs are fundamentally pattern-matching systems trained on internet text—they have no mechanism for distinguishing true medical facts from plausible-sounding fiction. When a model generates a drug dosage or a differential diagnosis, it's drawing from statistical patterns in its training data, not from a verified medical knowledge base.
Two accuracy problems are particularly concerning. Hallucination occurs when the model generates entirely fabricated information with the same confident tone it uses for accurate information—creating references that don't exist, citing studies that were never conducted, or recommending treatments that contradict clinical guidelines. Outdated information arises because models are trained on data with a cutoff date; guidelines may have changed, new drugs may have been approved, or safety warnings may have been issued since training. Both problems are explored in detail by the leading research teams studying AI safety:
Hallucination
ChatGPT can generate plausible-sounding but entirely fabricated information, including:
- Non-existent research papers with fake DOIs (see the verification sketch after this list)
- Fictional author names and institutions
- Made-up statistics and study results
- Incorrect drug dosages or interactions
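Fabricated citations are among the easier failure modes to screen for, because a reference's DOI can be checked against a public registry before it is trusted. The sketch below is a minimal illustration, assuming Python with the `requests` library; it queries the public Crossref REST API (`https://api.crossref.org/works/{doi}`), which returns HTTP 404 for DOIs it does not index. The example DOIs are placeholders, and absence from Crossref is a signal for manual review rather than proof of fabrication, since some legitimate DOIs are registered with other agencies.

```python
import requests

CROSSREF_WORKS = "https://api.crossref.org/works/"

def doi_exists(doi: str, timeout: float = 10.0) -> bool:
    """Return True if the DOI resolves in the Crossref registry.

    A missing DOI (HTTP 404) is a strong signal that a citation was
    fabricated, but absence from Crossref is not proof: some valid DOIs
    are registered with other agencies (e.g. DataCite), so flag such
    items for human review rather than rejecting them automatically.
    """
    response = requests.get(CROSSREF_WORKS + doi, timeout=timeout)
    return response.status_code == 200

# Hypothetical citations extracted from a ChatGPT-generated literature review.
candidate_dois = [
    "10.1000/example-citation-1",  # placeholder DOI, for illustration only
    "10.1000/example-citation-2",
]

for doi in candidate_dois:
    status = "found in Crossref" if doi_exists(doi) else "NOT FOUND -- verify manually"
    print(f"{doi}: {status}")
```

A check like this catches only invented identifiers; citations with real DOIs attached to the wrong paper still require a human to confirm that the cited work says what the generated text claims.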
Outdated Information
Training data cutoffs mean ChatGPT may not reflect current medical knowledge:
- New drug approvals not included
- Updated treatment guidelines unknown
- Recent research findings absent
- Emerging disease information missing
| Concern | Description | Risk Level |
|---|---|---|
| Medical Misinformation | Incorrect health advice that could harm patients | High |
| Fabricated References | Citation of non-existent papers undermines research integrity | High |
| Incorrect Clinical Guidance | Wrong treatment recommendations or dosing information | Critical |
| Outdated Guidelines | Recommendations based on superseded clinical standards | Medium-High |
| Statistical Errors | Incorrect interpretation of medical statistics | Medium |
Bias Concerns
ChatGPT may perpetuate or amplify biases present in its training data—a problem with profound implications for healthcare equity. Because LLMs learn from internet text, they inherit the biases embedded in that text: underrepresentation of certain populations in medical literature, historical discrimination in clinical guidelines, and language patterns that reflect societal prejudices. When these models are deployed in healthcare, they risk systematically disadvantaging already-marginalized groups.
Researchers at Stanford Medicine (Shah, N. [Scholar]) and Harvard have documented specific instances of algorithmic bias in healthcare AI. The Research Teams section provides details on groups working on bias mitigation. Key bias categories include:
| Type of Bias | Description | Healthcare Impact |
|---|---|---|
| Demographic Bias | Training data may underrepresent certain populations | Recommendations may not apply equally to all patient groups |
| Geographic Bias | Predominantly English-language, Western-centric training | May not reflect global healthcare practices or needs |
| Historical Bias | Past inequities in medical research reflected in outputs | Risk of perpetuating discrimination in care recommendations |
| Publication Bias | Training overweighted toward published (positive) results | May overstate efficacy of treatments |
Bias Amplification Risk
When ChatGPT is used to generate content that is then published, it may create feedback loops where biased outputs become new training data for future models, amplifying initial biases over time.
Plagiarism & Academic Integrity
The use of ChatGPT raises significant concerns about academic and scientific integrity—concerns that affect medical education as profoundly as they affect clinical practice. When students use AI to complete assignments or exams, they may graduate without developing the critical thinking skills that are essential for safe clinical practice. Similarly, when researchers use AI to generate grant proposals or manuscripts without disclosure, they undermine the trust that underlies scientific publishing.
Major journals including JAMA, NEJM, and Lancet have issued policies requiring disclosure of AI assistance. Educational institutions are developing new assessment methods that are more robust to AI assistance. The challenges below represent key integrity concerns:
| Concern | Description | Detection Challenge |
|---|---|---|
| Undisclosed AI Use | Submitting AI-generated content as original work | Current detection tools have limited accuracy |
| Exam Cheating | Using ChatGPT to complete assessments dishonestly | Particularly challenging for take-home exams |
| Grant Applications | AI-generated proposals may not reflect true researcher capabilities | Threatens fair allocation of research funding |
| Credential Inflation | AI assistance may inflate apparent productivity and skills | Impacts hiring and promotion decisions |
Emerging Responses
Educational institutions and publishers are developing policies requiring AI use disclosure. Some journals now mandate statements about AI assistance in all submissions, while universities are updating academic integrity policies.
Privacy & Security Concerns
Using ChatGPT in healthcare contexts raises fundamental data protection and privacy issues that intersect with decades of healthcare privacy regulation. Unlike consumer applications where data sharing may be inconvenient, healthcare data carries unique sensitivity—genetic information, mental health records, and infectious disease status can affect employment, insurance, and social relationships. When clinicians or patients enter protected health information (PHI) into ChatGPT, they may inadvertently expose this data to third parties in ways that violate HIPAA, GDPR, or other regulatory frameworks.
The privacy challenges are compounded by the opacity of AI systems. When data is used for model training, it can potentially resurface in model outputs in unpredictable ways—a phenomenon distinct from traditional database breaches. The journals publishing on healthcare AI increasingly require privacy impact assessments, and the research teams at institutions like Oxford and Harvard are developing frameworks for privacy-preserving AI deployment. Key concerns include:
Data Privacy Risks
- Patient information entered into ChatGPT may be retained
- HIPAA compliance uncertain when using third-party AI
- Data may be used for model training without consent
- Cross-border data transfer concerns for international use
Security Concerns
- Vulnerability to prompt injection attacks
- Potential for social engineering assistance
- Confidential information disclosure risks
- Lack of audit trails for compliance purposes
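One practical safeguard against inadvertently sending protected health information to a third-party service is client-side redaction before any text leaves the institution. The sketch below is a minimal illustration in Python using only the standard library; the regular expressions cover just a few identifier patterns (emails, phone numbers, dates, MRN-style numbers) and fall far short of HIPAA Safe Harbor de-identification, which spans 18 identifier categories and typically requires dedicated tooling and expert review.

```python
import re

# Illustrative patterns only; real de-identification requires far broader
# coverage (names, addresses, device IDs, etc.) and clinical review.
PHI_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "date":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "mrn":   re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
}

def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace matched identifiers with labelled placeholders.

    Returns the redacted text plus a count of redactions per category so
    that heavily redacted notes can be routed for manual handling instead
    of being sent to an external service at all.
    """
    counts: dict[str, int] = {}
    for label, pattern in PHI_PATTERNS.items():
        text, n = pattern.subn(f"[{label.upper()}]", text)
        counts[label] = n
    return text, counts

# Hypothetical clinical note fragment, for illustration only.
note = "Pt seen 03/14/2024, MRN: 00482913, call 555-867-5309 with results."
clean, counts = redact(note)
print(clean)   # Pt seen [DATE], [MRN], call [PHONE] with results.
print(counts)  # {'email': 0, 'phone': 1, 'date': 1, 'mrn': 1}
```

Even with redaction in place, institutional policy should still treat any third-party AI service as outside the protected environment unless a business associate agreement and a formal privacy review say otherwise.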
Implementation Challenges
Beyond the fundamental concerns about accuracy, bias, and ethics, healthcare organizations face significant practical barriers to appropriate ChatGPT adoption. These implementation challenges often determine whether theoretical benefits can be realized in real-world clinical settings. Even when an AI tool performs well in controlled evaluations, integrating it into existing workflows, training staff to use it effectively, and establishing governance structures requires substantial organizational investment.
The challenges below represent systemic barriers that must be addressed alongside the technical improvements discussed in Benefits & Applications. Many of these challenges are not unique to ChatGPT but affect all healthcare AI implementations, as documented by the research teams working on clinical AI deployment:
| Challenge | Description | Mitigation Strategies |
|---|---|---|
| Dependency | Over-reliance may erode critical thinking skills | Education on appropriate use; maintaining core competencies |
| Verification Burden | All outputs require human verification | Clear workflows for fact-checking; training on verification |
| Inconsistency | Same prompt may yield different outputs over time | Version control; documentation of prompts used |
| Integration | Difficulty integrating with existing clinical workflows | API development; EHR integration efforts |
| Training Needs | Healthcare workers need education on AI limitations | Continuing education programs; institutional policies |
Mitigation Strategies
Recommendations for addressing identified concerns:
Best Practices for Responsible Use
- Always verify: Independently confirm all factual claims, references, and medical information
- Disclose use: Transparently acknowledge AI assistance in publications and documentation
- Protect privacy: Never input patient-identifiable information into ChatGPT
- Maintain expertise: Use AI as a supplement to, not replacement for, professional judgment
- Stay current: Recognize training cutoff limitations and verify with current sources
- Document prompts: Keep records of prompts used for reproducibility (a minimal logging sketch follows this list)
- Institutional policies: Follow organizational guidelines on AI use
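Several of these practices, particularly "Document prompts," "Always verify," and "Stay current," reduce to disciplined record-keeping, because the same prompt can yield different outputs across model versions and even across runs. The sketch below is a minimal illustration, using only Python's standard library, of the kind of prompt log an institution might keep: each record stores the prompt, the model identifier, a timestamp, and a content hash, appended to a JSON Lines file for later audit. The field names and model string are illustrative, not a standard.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path

LOG_FILE = Path("prompt_log.jsonl")  # illustrative location

@dataclass
class PromptRecord:
    """One audited interaction with a generative model."""
    model: str          # e.g. the version string reported by the vendor
    prompt: str
    output: str
    timestamp: str
    prompt_sha256: str  # lets reviewers detect silent prompt edits

def log_interaction(model: str, prompt: str, output: str) -> PromptRecord:
    """Append one interaction to the audit log and return the record."""
    record = PromptRecord(
        model=model,
        prompt=prompt,
        output=output,
        timestamp=datetime.now(timezone.utc).isoformat(),
        prompt_sha256=hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    )
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
    return record

# Hypothetical usage: model name and texts are placeholders.
log_interaction(
    model="example-llm-2024-05",
    prompt="Summarise the discharge instructions below for a patient...",
    output="(model output would be stored here verbatim)",
)
```

A log of this kind does not make outputs reproducible, since the underlying model may change, but it preserves enough context for later review of what was asked, of which system, and when.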
Summary
| Category | Number of Concerns | Severity |
|---|---|---|
| Ethical Issues | 5 | High |
| Accuracy Problems | 5 | Critical |
| Bias Concerns | 4 | High |
| Academic Integrity | 4 | High |
| Privacy/Security | 4 | High |
| Implementation | 5 | Medium |
| Total | 19* | - |
*Some concerns overlap across categories
Return to the main portal or see Benefits & Applications for the positive perspective.
Recent Developments
Since the original systematic review, research on AI limitations in healthcare has expanded significantly, with increasing focus on empirical measurement of harms and development of mitigation strategies. Representative publications include:
- npj Digital Medicine (2023): Comprehensive review identifying key limitation categories across clinical applications
- Nature Medicine (2023): Even high-performing Med-PaLM 2 showed significant limitations in certain clinical reasoning tasks
- Healthcare (MDPI) (2023): The source systematic review, included here as the baseline, identifying 19 distinct concern categories across 60 analyzed records
These studies underscore that the concerns identified in the original 2023 review remain highly relevant. The research teams working on AI safety continue to identify new challenges as clinical deployment expands. For the publication venues where this safety research appears, see Top Journals.
Leading Research Teams
Key institutions researching AI safety, ethics, and limitations in healthcare, with principal investigators specializing in responsible AI deployment:
- Harvard Medical School - Kohane, I. [Scholar], Beam, A. [Scholar] (AI ethics, bias detection)
- Stanford Medicine - Shah, N. [Scholar], Rajpurkar, P. [Scholar] (AI fairness, evaluation)
- University of Oxford - Goldacre, B. [Scholar] (health data transparency, reproducibility)
- Beth Israel Deaconess - Bates, D. [Scholar] (patient safety, clinical decision support)
- University of Jordan - Sallam, M. [Scholar] (systematic review author)
See Research Teams for complete institution list. For the potential benefits that must be weighed against these concerns, see the dedicated page.
Key Journals
Primary publication venues for AI safety and ethics in healthcare:
- Healthcare (MDPI) - Source of the systematic review
- JAMA - AI ethics viewpoints and policy
- BMJ - Clinical AI safety and regulation
See Top Journals for complete journal list.
External Resources
Authoritative sources on AI safety and governance in healthcare:
Regulatory Frameworks
- WHO AI Ethics Guidelines - Global health AI governance
- UNESCO AI Ethics - International ethical standards
- FDA AI/ML Framework - US regulatory approach
Research & Standards
- Stanford HAI - Human-centered AI research
- NIH AI Initiative - Federal AI safety research
- ACM Code of Ethics - Computing professional standards
See Also
- Portal Overview - Main portal with field summary
- Benefits & Applications - 22 identified benefits to balance against concerns
- Research Teams - Leading institutions and researchers
- Top Journals - Key publication venues