Authors :
Mohamed Arsath Shamsudeen; Arqam Mibsaam Ahmad; Faaiza Kazi; Syed Faazil Kazi; Ayesha Zaffer Khanday; Shifan Arif
Volume/Issue :
Volume 10 - 2025, Issue 7 - July
Google Scholar :
https://tinyurl.com/5n8s6c6a
Scribd :
https://tinyurl.com/3yhwcm52
DOI :
https://doi.org/10.38124/ijisrt/25jul620
Abstract :
Background
Large language models (LLMs) like ChatGPT are rapidly entering clinical contexts. While these models can generate fluent, guideline-aligned responses and perform well on exams, linguistic fluency does not equal clinical competence. Real-world medicine demands contextual reasoning, risk assessment, and value-sensitive decisions, skills that LLMs lack. Growing public access to LLMs raises safety concerns, particularly when untrained users interpret AI outputs as medical advice.
Objective
This study evaluated whether AI’s clinical value depends on the expertise of its user. We compared three groups: laypersons using ChatGPT, physicians acting independently, and physicians using ChatGPT for decision support.
Methods
In a simulation-based study, 150 participants (50 per group) assessed 15 clinical cases of varying complexity. For each case, participants provided a diagnosis, a next step, and a brief justification. Responses were scored by blinded physicians using standardized rubrics. Analyses included ANOVA, effect size estimation, and content review of reasoning quality.
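To make the analysis plan concrete, the sketch below (Python with NumPy and SciPy; not the authors' code) runs a one-way ANOVA across three simulated score vectors and derives eta-squared as an effect size. The group means, spreads, and sample generation are hypothetical placeholders patterned on the study design of 50 participants per arm.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-participant accuracy scores (proportion of the 15
# cases answered correctly); means loosely follow the reported results.
layperson_ai = rng.normal(0.61, 0.10, 50).clip(0, 1)
physician_alone = rng.normal(0.88, 0.06, 50).clip(0, 1)
physician_ai = rng.normal(0.94, 0.04, 50).clip(0, 1)

groups = [layperson_ai, physician_alone, physician_ai]

# One-way ANOVA across the three study arms
f_stat, p_value = stats.f_oneway(*groups)

# Eta-squared effect size: between-group SS / total SS
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((all_scores - grand_mean) ** 2).sum()
eta_squared = ss_between / ss_total

print(f"F = {f_stat:.2f}, p = {p_value:.3g}, eta^2 = {eta_squared:.2f}")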
Results
Diagnostic accuracy was highest among physicians using ChatGPT (94.4%), followed by physicians alone (88.0%) and laypersons with ChatGPT (60.7%). Management quality mirrored this pattern. AI-assisted physicians submitted more comprehensive plans and took more time, suggesting deeper engagement. Laypersons often reproduced AI outputs uncritically, lacking contextual understanding and raising safety risks.
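As a rough check on the headline comparison, a pooled two-proportion z-test can be applied to the reported accuracies. The sketch below assumes each arm contributed 50 participants x 15 cases = 750 scored responses and ignores clustering within participants; both are simplifying assumptions, not details taken from the paper.

import numpy as np
from scipy.stats import norm

# Assumed denominator: 50 participants x 15 cases per arm (ignoring
# within-participant clustering, which understates the true variance).
n = 50 * 15
x1 = round(0.944 * n)  # physicians + ChatGPT: reported 94.4% accuracy
x2 = round(0.880 * n)  # physicians alone: reported 88.0% accuracy

# Pooled two-proportion z-test
p1, p2 = x1 / n, x2 / n
p_pool = (x1 + x2) / (2 * n)
se = np.sqrt(p_pool * (1 - p_pool) * (2 / n))
z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))  # two-sided
print(f"z = {z:.2f}, p = {p_value:.3g}")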
Conclusion
AI does not equalize clinical skill; it magnifies it. When used by trained professionals, ChatGPT enhances diagnostic accuracy and decision quality. In untrained hands, it can lead to error and overconfidence. Integrating LLMs into healthcare demands thoughtful oversight, clinician training, and safeguards to prevent misuse. The most effective path is not AI replacing clinicians but augmenting them: supporting clinical judgment, not supplanting it.
Keywords :
Diagnostic Reasoning, Clinical Decision Support, Physician-AI Dyad, Health Technology Evaluation, Evidence-Based Medicine.
References :
- Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, et al. Large language models encode clinical knowledge. Nature. 2023;620(7974):172–80.
- Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: Potential for AI‑assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198.
- Cascella M, Montomoli J, Bellini V, Bignami EG. Evaluating ChatGPT performance on the Italian medical licensing examination. JMIR Med Educ. 2023;9:e47674.
- Patel B, Lam K, Lahoz R, Hwang T, Sahin‑Toth E, Chien J, et al. Use of large language models for AI‑assisted clinical decision support: A pilot evaluation using simulated cases. J Am Med Inform Assoc. 2024;31(1):84–94.
- Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK; SPIRIT‑AI and CONSORT‑AI Working Group. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT‑AI extension. BMJ. 2020;370:m3164.
- Croskerry P. A universal model of diagnostic reasoning. Acad Med. 2009;84(8):1022–8.
- Obermeyer Z, Emanuel EJ. Predicting the future — big data, machine learning, and clinical medicine. N Engl J Med. 2016;375(13):1216–9.
- Bender EM, Gebru T, McMillan‑Major A, Shmitchell S. On the dangers of stochastic parrots: Can language models be too big? In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21); 2021. p. 610–23.
- Davenport TH, Glaser J. Just‑in‑time artificial intelligence for health care. N Engl J Med. 2020;382(7):567–9.
- Rodriguez‑Ruiz A, Lång K, Gubern‑Mérida A, et al. Stand‑alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. J Natl Cancer Inst. 2019;111(9):916–22.
- Ting DSW, Liu Y, Burlina P, et al. AI for medical imaging goes deep. Nat Med. 2018;24(5):539–40.
- Nori H, King N, McKinney SM, Carignan D, Horvitz E. Capabilities of GPT‑4 on medical challenge problems. arXiv preprint. 2023; arXiv:2303.13375.
- Meskó B, Topol EJ. The imperative for regulatory oversight of large language models (or generative AI) in healthcare. NPJ Digit Med. 2023;6:120. doi:10.1038/s41746-023-00873-0
- Wahl B, Cossy‑Gantner A, Germann S, Schwalbe NR. Artificial intelligence (AI) and global health: how can AI contribute to health in resource‑poor settings? BMJ Glob Health. 2018;3(4):e000798. doi:10.1136/bmjgh-2018-000798
- Patel VL, Shortliffe EH, Stefanelli M, et al. The coming of age of artificial intelligence in medicine. Artif Intell Med. 2009;46(1):5–17.
- Krittanawong C, Rogers AJ, Johnson KW, et al. Integrating artificial intelligence in cardiovascular medicine. Nat Rev Cardiol. 2021;18(6):399–409.
- Wu E, Wu K, Daneshjou R, et al. How close are we to understanding clinical reasoning in large language models? NPJ Digit Med. 2023;6(1):97.
- Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT‑4 as an AI chatbot for medicine. N Engl J Med. 2023;388(14):1233–9.
- Blease C, Bernstein MH, Gaab J, Kaptchuk TJ, Locher C, Mandl KD. Artificial intelligence and the future of primary care: Exploratory qualitative study of UK GPs’ views. J Med Internet Res. 2019;21(3):e12802.
- Mesko B, Győrffy Z. The rise of the empowered physician in the digital health era: viewpoint. J Med Internet Res. 2019;21(3):e12490.
- Rodman A, Schaeffer S, Majmudar MD. Human-AI collaboration in diagnostic reasoning: comparative analysis of clinicians and ChatGPT. JAMA Intern Med. 2024;184(2):123–9.
- Lin S, Yang Y, Jain S, et al. Impact of AI decision‑support on diagnostic accuracy and cognitive load in internal medicine: a randomized controlled trial. JAMA Netw Open. 2024;7(5):e241234.
- Natarajan P, Dhillon A, Garcia S, et al. Assessing reliability and hallucinations in LLM-generated medical advice: a real‑world evaluation. Lancet Digit Health. 2024;6(3):e115–24.
- Chen J, Patel V, Ghassemi M. Algorithmic bias and safety risks in clinical AI tools: a review. NEJM AI. 2024;1(2):e2024005.
- Nori V, Haspel R, Torres L, et al. ChatGPT in the medical domain: a scoping review. J Gen Intern Med. 2024;39:55–64.
- Xu H, Li Y, Zhou Y, Shen C, Li M. Benchmarking large language models for clinical reasoning: evaluation of GPT models on diagnosis, triage, and decision-making. NPJ Digit Med. 2024;7(1):95.
- Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6(2):94–8.
- McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020;577(7788):89–94.
- World Health Organization. Ethics and governance of artificial intelligence for health: WHO guidance. Geneva: WHO; 2021.
- U.S. Food & Drug Administration. Artificial Intelligence and Machine Learning–Based Software as a Medical Device. FDA; 2021.
- Grote T, Berens P. On the ethics of algorithmic decision‑making in healthcare. J Med Ethics. 2020;46(3):205–11.
- Parasuraman R, Riley V. Humans and automation: use, misuse, disuse, abuse. Hum Factors. 1997;39(2):230–53.
- Holzinger A, Langs G, Denk H, Zatloukal K, Müller H. Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip Rev Data Min Knowl Discov. 2019;9(4):e1312.
- Amann J, Vetter D, Blomberg SN, Christensen HC, et al. To explain or not to explain? AI explainability in clinical decision support. PLOS Digit Health. 2022;1(1):e0000016.
- Gerke S, Minssen T, Cohen IG. Ethical and legal challenges of AI‑driven healthcare. In: The Oxford Handbook of Health Law. 2021:1–29.
- Meskó B, Győrffy Z, Topol EJ. AI in healthcare: balancing hype with evidence and impact. NPJ Digit Med. 2023;6:155.