Chatbots offer patients more appropriate MRI explanations than experts, study shows

Hannah Murphy | August 27, 2025 | Radiology Business | Artificial Intelligence

Chatbots may be able to accurately explain MRI results to patients, a service that radiologists are not routinely available to provide.

New research in Scientific Reports details the performance of two chatbots, GPT o1-preview and DeepSeek-R1, in interpreting patients’ MRI reports. The study’s results suggest that such technology could play a valuable role in radiology workflows in the future.

“Despite electronic imaging reports being accessible, their complex content remains a significant barrier for patients trying to understand their own health status. As a result, most patients require additional consultations with physicians to interpret their MRI reports, which can lead to prolonged waiting times for consultations and potentially delay diagnosis and treatment,” Ming Feng, from the department of neurosurgery at Peking Union Medical College Hospital in China, and colleagues explained. “This situation not only increases the workload for outpatient physicians but also adds to the overall healthcare burden.”

To assess the utility of the chatbots, researchers tasked them with providing explanations of findings from the exams of more than 6,000 patients with known tumors. The reports, which varied in their format and complexity, were used by the chatbots to complete four tasks:

Interpret the reports in a way individuals who do not have a medical background can understand.
Classify the lesions as benign, atypical or malignant.
Determine whether surgical intervention is required.
Recommend a treatment plan based on the report’s contents.

Medical reviewers assessed the chatbots’ responses for both readability and accuracy. They determined that both tools improved report readability, though the DeepSeek model reduced text complexity more effectively. Both models also achieved high marks for accurately classifying lesions and determining surgical necessity, though the DeepSeek model again outperformed the GPT tool. DeepSeek significantly outperformed GPT in providing accurate clinical recommendations as well.

Both models made errors, however, including misinterpretations of medical terminology and AI hallucinations. This, the authors suggested, highlights the importance of AI oversight post-deployment.

“In clinical settings, inaccuracies in generative chatbots’ responses can have severe consequences for patients’ diagnosis and treatment,” the group cautioned. “The primary challenges in deploying these chatbots in medicine relate to accuracy, reliability, and the occurrence of hallucinations.”

To that end, the team presented data on the effectiveness of fine-tuning methods known to improve the performance of LLMs. They suggested that future research on the topic should assess the impact of targeted fine-tuning.


