Feature Story

Two Roads Diverging: The Future of Patient Communication in the Era of Artificial Intelligence

September 2025

© 2025 HMP Global. All Rights Reserved.
Any views and opinions expressed are those of the author(s) and/or participants and do not necessarily reflect the views, policy, or position of The Dermatologist or HMP Global, their employees, and affiliates.

The rapid advancement of large language models (LLMs), the technology behind applications like ChatGPT and Gemini, is driving a new frontier in medical software: the integration of artificial intelligence (AI) into patient-facing clinical communication. From auto-drafting replies to patient portal messages and scheduling appointments to triaging symptoms, AI-powered patient communication systems are beginning to manage portions of routine clinical dialogue.

With the AI patient engagement market projected to reach $23.1 billion by 2030, the question is no longer if AI will become a third party in this critical interaction, but how its involvement will shape the physician-patient relationship.1 How the health care system, dermatology, and our patients adopt this technology will dictate whether we derive a net benefit or create more problems than we fix. The future will either look more like an AI-powered free-for-all or an orderly, well-curated orchard of interaction options for patients.

Strengths and Weaknesses

LLMs do not understand or reason in a human sense; they are fundamentally pattern recognition engines that, based on the statistical patterns learned from trillions of examples, calculate and assemble the most probable sequence of words, sub-words, or characters (called tokens) to follow a given prompt. This means that the output is not a product of cognitive thought, but rather a probabilistic completion of a sentence.
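
To make this concrete, the toy sketch below (written in Python, with invented probabilities rather than the output of any real model) shows what a single next-token prediction step looks like: candidate tokens are scored, and either the most probable token or a randomly sampled one is appended, with no check against factual truth anywhere in the process.

```python
import random

# Toy next-token step: hypothetical probabilities a model might assign
# after the prompt "The rash on the patient's arm is most likely" --
# illustrative numbers only, not output from any real model.
candidate_tokens = {
    " eczema": 0.42,
    " psoriasis": 0.23,
    " contact": 0.18,   # as in "contact dermatitis"
    " cellulitis": 0.10,
    " melanoma": 0.07,
}

# Greedy decoding: pick the single most probable token.
greedy_choice = max(candidate_tokens, key=candidate_tokens.get)

# Sampling: draw a token in proportion to its probability, which is why
# the same prompt can yield different completions on different runs.
sampled_choice = random.choices(
    population=list(candidate_tokens),
    weights=list(candidate_tokens.values()),
)[0]

print(f"Greedy next token: {greedy_choice!r}")
print(f"Sampled next token: {sampled_choice!r}")
```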

At their core, LLMs are sophisticated neural networks that process text by first breaking it down into discrete tokens, much as we deconstruct a sentence into individual building blocks. The tokenization scheme is a key differentiator between models, and modern LLMs have vocabularies of more than 100,000 tokens and counting. Input text is converted to tokens, which are then passed to an embedding layer (the “what”) and positionally encoded (the “where”). This process preserves the context of the input and generates 2 vectors that are added together and passed to a neural network called a transformer. Recall from physics that vectors have a magnitude and a direction. In the high-dimensional space of an LLM, the relative positions and relationships between vectors are the meaning.
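
As a rough illustration of that pipeline, the minimal sketch below uses a made-up 4-word vocabulary and random vectors in place of a production tokenizer and learned weights; it shows only the shape of the tokenize-embed-encode steps, not any vendor's actual implementation.

```python
import numpy as np

# Hypothetical vocabulary and token IDs; real models use subword
# tokenizers with vocabularies of 100,000+ entries.
vocab = {"the": 0, "mole": 1, "is": 2, "itchy": 3}
token_ids = [vocab[w] for w in "the mole is itchy".split()]

embed_dim = 8
rng = np.random.default_rng(0)

# Embedding layer (the "what"): one learned vector per vocabulary entry.
embedding_table = rng.normal(size=(len(vocab), embed_dim))
token_vectors = embedding_table[token_ids]            # shape (4, 8)

# Positional encoding (the "where"): one vector per position in the input.
position_vectors = rng.normal(size=(len(token_ids), embed_dim))

# The two are added so each vector carries both identity and position,
# then handed to the transformer layers.
transformer_input = token_vectors + position_vectors
print(transformer_input.shape)                        # (4, 8)
```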

The transformer processes input vectors using its self-attention mechanism, which allows the model to dynamically weigh the significance of every vector in relation to all others, producing a final, context-rich output vector. This single output vector is then passed to a final layer that converts it into a probability distribution over the entire vocabulary. The token with the highest probability is selected, sent back to the tokenizer, and decoded into text. Through a process called autoregression, this new token is added to the input sequence, and the entire cycle repeats until the model generates a special end-of-sequence token analogous to the stop codon in mRNA. A basic familiarity with this process is helpful in understanding the strengths and weaknesses of LLMs in the context of patient communication.2
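
The toy loop below compresses that cycle into a few lines: a single self-attention step over randomly initialized vectors, a projection to a tiny vocabulary, and autoregressive appending until an end-of-sequence token or a length cap is reached. It is meant only to show the shape of the computation, not to produce meaningful text.

```python
import numpy as np

rng = np.random.default_rng(1)

vocab = ["<bos>", "the", "lesion", "looks", "benign", "<eos>"]
embed_dim = 8
embedding = rng.normal(size=(len(vocab), embed_dim))
positions = rng.normal(size=(32, embed_dim))          # supports up to 32 tokens
w_out = rng.normal(size=(embed_dim, len(vocab)))      # final projection to the vocabulary

def self_attention(x):
    """Scaled dot-product self-attention: every vector attends to all others."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

def next_token_id(token_ids):
    x = embedding[token_ids] + positions[: len(token_ids)]
    context = self_attention(x)[-1]                   # context-rich vector for the last position
    logits = context @ w_out                          # scores over the entire vocabulary
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                              # probability distribution
    return int(np.argmax(probs))                      # select the most probable token

# Autoregression: append each predicted token and repeat until <eos> or a cap.
sequence = [vocab.index("<bos>")]
while len(sequence) < 10:
    tok = next_token_id(sequence)
    sequence.append(tok)
    if vocab[tok] == "<eos>":
        break

print(" ".join(vocab[i] for i in sequence))
```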

This technical understanding reveals the LLM’s primary strengths: the ability to generate both detailed prose and concise summaries. However, it also exposes its most critical weakness in a clinical setting: because the output is based on statistical probability rather than true reasoning, the model can “hallucinate” and state dangerous misinformation with the same confident authority as it does factual information. The term hallucination refers to a phenomenon where an LLM confidently generates factually incorrect or nonsensical information. This artifact occurs when the model predicts the next most probable word to form a coherent sentence, but that sentence does not accurately answer the prompt. Remember, there is no internal fact database for ground truth, just the most likely prediction.3

Guardrails Needed

The Wild West version of an AI-powered future echoes clinician concern over the evolution from Dr Google to Dr ChatGPT, with minimal guardrails or regulation. While patients have long used online platforms to understand their health, AI chatbots can generate fluent, human-like responses that obscure significant limitations. A key patient safety concern is the chatbot’s vulnerability to misinformation. A recent study in Communications Medicine found that the latest models of ChatGPT, Gemini, and Claude will not only repeat but confidently elaborate on fabricated medical details embedded in a user’s question, inventing explanations for nonexistent conditions. This study highlights a critical weakness in current frontier LLMs. Hallucination of diagnosis or treatment information is a major risk for patients who may not be equipped to distinguish between plausible-sounding falsehoods and accurate medical advice.4

Because this technology is so new, assessing actual harm and real risk to patients will take time. However, a recent case report highlights the potential for harm and the high stakes of AI errors in medicine. A 63-year-old man experienced neurologic symptoms, including multiple episodes of double vision, 4 days after a cardiac procedure. The patient consulted ChatGPT, which suggested that his vision problems were a side effect of the procedure, a less severe explanation than the stroke he feared. He decided to stay home rather than seek immediate medical attention; however, he was ultimately diagnosed with a transient ischemic attack. Reliance on the chatbot’s incomplete and falsely reassuring assessment led to a significant delay in treatment for a potentially life-threatening condition.5

In dermatology, we can easily imagine the following scenario: a patient worried about a new, irregularly shaped mole asks an AI chatbot about their concern, and the chatbot reassuringly suggests it has features of a seborrheic keratosis. Relieved by this plausible explanation, the patient decides to forgo making an immediate appointment, thereby delaying the potential diagnosis of an early-stage melanoma when evaluation is most critical.

A concerning trend is emerging in the largely unregulated landscape of AI-powered health advice. According to a recent report in MIT Technology Review, AI companies have almost entirely stopped including disclaimers that their chatbots are not a substitute for professional medical advice. The analysis found that while more than 26% of AI responses to health questions included such a warning in 2022, the figure fell to less than 1% in 2025, even as the models are being used more widely for medical queries. This removal of guardrails stands in stark contrast to the cautious personal warnings of industry leaders like OpenAI CEO Sam Altman, who stated he would not trust his “medical fate to ChatGPT with no human doctor in the loop.” It is also a reminder that these sensitive conversations lack the legal privilege of physician-patient confidentiality.6

This creates a striking divergence from other professional fields, such as law, where AI tools consistently include explicit disclaimers when rendering legal advice. A recent Princeton study examined how users respond to chatbot warning messages advising them to “seek professional legal advice.” Among 900 participants, the warnings “did not appear to significantly reduce trust or deter users from relying on chatbots for sensitive legal issues,” and stronger warnings did not conclusively increase deterrence.7

Fortunately, a more organized approach to AI clinical communication is underway, one that uses the LLM technology behind ChatGPT, Gemini, Claude, and other platforms to create curated patient interaction experiences. The critical distinction is human-in-the-loop control over patient messages and requests.

In early 2023, the Mayo Clinic became one of the first health care systems to adopt Epic’s augmented response technology (ART), a new tool that uses OpenAI’s GPT LLM to draft initial responses to triaged, non-urgent patient messages. A pilot study found that generative AI-powered drafting saved nurses 30 seconds per message. Mayo Clinic reports that the Epic ART system handled over 4 million messages in a year, saving more than 30,000 hours of staff time.8 Of note, this study was restricted to non-urgent messages, which were defined as those that did not require clinical decision-making.

Using a similar approach, Duke Health is adapting LLM platforms for safe and effective patient communication by establishing rigorous validation frameworks with human-in-the-loop guardrails. In a study published in the Journal of the American Medical Informatics Association, researchers evaluated AI-drafted messages for clarity, completeness, and safety. Their framework, called SCRIBE, uses a combination of clinician feedback and automated metrics to ensure that the AI’s output is empathetic, readable, and accurate. While the study found strong performance in the tone and readability of the AI-generated drafts, it also revealed crucial “gaps in the completeness of responses,” reinforcing the necessity of their validation process and the mandatory oversight by a human physician to ensure patient safety before any communication occurs.9
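
The SCRIBE tooling itself is not described in implementation detail here, but the general pattern of pairing automated checks with mandatory clinician sign-off can be sketched roughly as follows; the function names, rules, and thresholds are illustrative assumptions, not Duke's framework.

```python
from dataclasses import dataclass

@dataclass
class DraftReview:
    draft: str
    readability_ok: bool
    completeness_ok: bool
    clinician_approved: bool = False

def automated_checks(draft: str) -> DraftReview:
    """Placeholder automated metrics; a real framework would use validated
    readability and completeness measures, not these toy rules."""
    readability_ok = len(draft.split()) < 200               # keep replies short and plain
    completeness_ok = draft.strip().endswith((".", "?"))    # crude stand-in for completeness
    return DraftReview(draft, readability_ok, completeness_ok)

def release_to_patient(review: DraftReview) -> bool:
    # Nothing is sent unless the automated checks pass AND a clinician approves.
    return review.readability_ok and review.completeness_ok and review.clinician_approved

review = automated_checks("Your biopsy results are benign. No further treatment is needed.")
review.clinician_approved = True    # the human-in-the-loop sign-off happens here
print("Send to patient:", release_to_patient(review))
```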

Readying for the Future

The keys to developing high-quality, AI-powered patient interaction tools are validation, controls, and guardrails. AI’s immediate value in this domain lies in expediting routine communication, freeing clinicians to focus on complex decision-making and building patient relationships. For dermatologists, AI could become a powerful tool for patient education and routine communication; however, this ideal outcome requires widespread adoption of best practices for validating generative AI output before it reaches the patient. Until then, understanding the LLM’s faults can better prepare health care providers as they advise a new wave of patients interacting with these AI chatbots.


Dr Pearlman is a cosmetic and Mohs micrographic surgeon at Medical Dermatology Associates of Chicago.

Disclosure: The author is a member of the AAD DataDerm Oversight Committee. He holds stock in and serves as the CEO of Stratum Biosciences, Inc, a biotechnology start-up based out of JLabs@NYC with significant AI/ML assets for developing skin technology. He has served on advisory boards for Castle Biosciences. He has no commercial interest in any product mentioned in the manuscript.

References

  1. AI in patient engagement market size, share & trends analysis report by technology (NLP), by delivery type (cloud-based, on-premise), by functionality (enhanced communication, predictive analytics), by therapeutic area, by end use, by region, and segment forecasts, 2024–2030. Grand View Research. Accessed August 6, 2025. https://www.grandviewresearch.com/industry-analysis/artificial-intelligence-ai-patient-engagement-market-report
  2. Han S, Wang M, Zhang J, Li D, Duan J. A review of large language models: fundamental architectures, key technological evolutions, interdisciplinary technologies integration, optimization and compression techniques, applications, and challenges. Electronics. 2024;13(24):5040. doi:10.3390/electronics13245040
  3. Asgari E, Montaña-Brown N, Dubois M, et al. A framework to assess clinical safety and hallucination rates of LLMs for medical text summarisation. NPJ Digit Med. 2025;8(1):274. doi:10.1038/s41746-025-01670-7
  4. Omar M, Sorin V, Collins JD, et al. Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support. Commun Med (Lond). 2025;5:330. doi:10.1038/s43856-025-01021-3
  5. Saenger JA, Hunger J, Boss A, Richter J. Delayed diagnosis of a transient ischemic attack caused by ChatGPT. Wien Klin Wochenschr. 2024;136(7-8):236-238. doi:10.1007/s00508-024-02329-1
  6. O’Donnell J. AI companies have stopped warning you that their chatbots aren’t doctors. MIT Technology Review. July 21, 2025. Accessed August 11, 2025. https://www.technologyreview.com/2025/07/21/1120522/ai-companies-have-stopped-warning-you-that-their-chatbots-arent-doctors
  7. McCarthy BE. Beyond a reasonable doubt? Evaluating user trust and reliance on AI-generated legal advice. Princeton University Library. April 7, 2025. Accessed August 11, 2025. https://theses-dissertations.princeton.edu/entities/publication/1397f594-7b7a-4be5-9683-e4ad136cec7e/full
  8. Cacciaglia A. Gen AI saves nurses time by drafting responses to patient messages. EpicShare. March 4, 2024. Accessed August 11, 2025. https://www.epicshare.org/share-and-learn/mayo-ai-message-responses
  9. Hong C, Chowdhury A, Sorrentino AD, et al. Application of unified health large language model evaluation framework to in-basket message replies: bridging qualitative and quantitative assessments. J Am Med Inform Assoc. 2025;32(4):626-637. doi:10.1093/jamia/ocaf023