Which LLM is best for explaining radiology results?

Hannah Murphy | October 30, 2025 | Radiology Business | Artificial Intelligence

There are numerous commercially available large language models capable of assisting with radiology report interpretation, but which one is best suited for patient use?

A paper recently published in Insights Into Imaging sought to answer this question. Researchers compared three popular LLMs (ChatGPT, Gemini and Copilot) to determine whether one had an edge over its competitors in readability and accuracy. With patients gaining access to imaging reports earlier than ever, it's critical that they have access to large language models that can accurately summarize the results.

“Misinterpretation of radiology reports may potentially contribute to patient anxiety, confusion or misinformed decisions; however, such implications remain hypothetical given the limited patient-based evidence,” co-authors Ahmet Bozer, MD, and Yeliz Pekçevik, MD, both from the Department of Radiology at Ministry of Health Izmir City Hospital, Turkey, cautioned. “LLMs, capable of generating natural language explanations, offer a potential bridge between radiologists and patients by translating technical findings into patient-friendly language.”

For the study, researchers input 100 anonymized radiology reports into each of the three LLMs using the prompt, “Can you explain my radiology report?” Responses were assessed using a 3-point scale (0–2) based on their understandability, readability and medical accuracy. The group also evaluated the LLMs’ uncertainty language, patient guidance and clinical suggestions.
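To make the scoring scheme concrete, here is a minimal Python sketch of how ratings on a 0-to-2 scale could be averaged per model and per criterion. The ratings, the `mean_scores` helper and the data layout are hypothetical and exist only for illustration; they are not the study's data, and the article does not describe how the authors actually aggregated their scores.

```python
from statistics import mean

# The three criteria named in the article, each rated 0, 1 or 2.
CRITERIA = ("understandability", "readability", "accuracy")

# Hypothetical ratings for illustration only; NOT data from the study.
ratings = [
    {"model": "ChatGPT", "understandability": 2, "readability": 2, "accuracy": 2},
    {"model": "ChatGPT", "understandability": 2, "readability": 1, "accuracy": 2},
    {"model": "Gemini", "understandability": 1, "readability": 2, "accuracy": 2},
    {"model": "Copilot", "understandability": 2, "readability": 0, "accuracy": 1},
]


def mean_scores(rows, model):
    """Return the mean 0-2 score per criterion for one model (assumed helper)."""
    subset = [r for r in rows if r["model"] == model]
    if not subset:
        raise ValueError(f"no ratings found for model {model!r}")
    for row in subset:
        for criterion in CRITERIA:
            # Reject anything outside the 3-point scale.
            if row[criterion] not in (0, 1, 2):
                raise ValueError(f"{criterion} must be 0, 1 or 2")
    return {c: mean(r[c] for r in subset) for c in CRITERIA}


if __name__ == "__main__":
    for name in ("ChatGPT", "Gemini", "Copilot"):
        print(name, mean_scores(ratings, name))
```

In a real evaluation, each model would have one row per report and per rater, and the averages would be accompanied by a measure of agreement between raters.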

Each of the models produced medically accurate responses, yielding an overall score of 1.97 out of 2. The models excelled in different ways: ChatGPT provided the most readable and understandable responses, while Gemini offered the strongest patient guidance. Copilot expressed the most uncertainty in its responses, and the readability of those responses varied the most.

“While all models demonstrated consistent outputs in terms of diagnostic content, their communication characteristics varied,” the authors noted. “These findings underscore the potential of LLMs to support patient-facing radiological communication; however, their application should be guided by specific communication goals and supplemented by expert oversight.”

The team added that future work would benefit from including patients’ perspectives on the LLMs’ responses.