Google's Gemini answers patients' IR questions with greater accuracy, more empathy than providers

Some of today’s large language models (LLMs) may be more effective communicators than providers when it comes to explaining interventional radiology procedures. 

A new analysis in Frontiers in Radiology details the potential for LLMs as an assistive tool for patient communications. Experts tasked several well-known LLMs and one IR provider with answering patient queries related to varicocele embolization—a minimally invasive procedure used to treat enlarged veins in the scrotum—to determine how the models’ answers compared with the expert’s. Not only were the LLMs’ responses accurate, but they also showed signs of empathy. 

The authors of the paper suggested their findings are especially relevant due to the head-to-head comparison of LLMs alongside a human expert. 

“Large language models appear to be capable of performing a variety of tasks, including answering questions, but there are few studies evaluating them in direct comparison with clinicians,” Ozgur Genc, with the Department of Radiology at Istanbul Aydin University VM Medical Park in Turkey, and colleagues noted. 

For the study, the team prompted three LLMs—ChatGPT-4o, Gemini Pro, and Microsoft Copilot—to answer 25 questions patients frequently search online. One interventional radiologist answered the same questions, and two others assessed all of the responses using a 5-point Likert scale for academic accuracy and empathy. 

Out of all four respondents, Gemini achieved the highest scores for both accuracy and empathy. Copilot and ChatGPT followed close behind, while the IR’s responses earned the lowest scores (though they still received high marks). Effect sizes were medium for academic accuracy and large for empathy, the authors noted.  

“These preliminary findings suggest that AI models hold significant potential to complement patient education systems in interventional radiology practice and provide compelling evidence for the development of hybrid patient education models,” the group concluded. 

These findings are in line with other recent studies that tasked chatbots with providing patient-friendly summaries of patients' radiology reports. However, despite their growing skill set and potential, LLMs will require near-constant monitoring to ensure they provide accurate information to patients, experts warn. 

The full findings are published in Frontiers in Radiology. 

Hannah Murphy, Editor

In addition to her background in journalism, Hannah has patient-facing experience in clinical settings, having spent more than 12 years working as a registered rad tech. She began covering the medical imaging industry for Innovate Healthcare in 2021.
