How Radiology Partners is using large language models to monitor AI deployment
Large language models (LLMs) are increasingly used in radiology to extract data from narrative reports, both to improve workflow and to help validate, monitor and improve other artificial intelligence (AI) tools used in clinical practice.
Walter Wiggins, MD, PhD, a neuroradiologist and director of clinical AI at Radiology Partners (Rad Partners), hosted a hands-on session on this topic at the Radiological Society of North America annual meeting and described in an interview with Radiology Business how the technology works.
“We use large language models to extract information about specific findings from radiology reports in order to compare that information to an output from a vision AI tool in order to either conduct pre-deployment monitoring, or pre-deployment validation or post-deployment monitoring, or even to curate data for training an AI model,” Wiggins said.
While the work is technically sophisticated, he emphasized it has clear real-world value for clinical practices deploying AI.
“We feel strongly at [Rad Partners technology division] Mosaic that if you are going to deploy an AI tool in your practice, that you gather some data and test it on the model ahead of time,” Wiggins explained.
Using LLMs to extract findings from reports allows practices to better understand baseline AI performance, including the metrics radiologists care about most. According to Wiggins, those are the positive predictive value and the sensitivity of an AI model: how often a positive AI result turns out to be a real finding, and how often the AI picks up pathology when it is present in an image. Because that pathology should also be reflected in the radiology report, the report can serve as the point of comparison. Wiggins said this process also helps identify where models underperform.
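As a rough illustration of the comparison described above, the sketch below computes sensitivity and positive predictive value from paired labels. It assumes each case has already been reduced to two booleans: whether the report (as read by an LLM) mentions the finding, and whether the vision AI flagged it. The `Case` class, its field names and the sample counts are hypothetical and are not drawn from Rad Partners' systems or data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Case:
    # Hypothetical fields: the report label is assumed to come from an
    # LLM reading the radiologist's report; the AI label from the vision tool.
    report_positive: bool
    ai_positive: bool


def confusion_counts(cases: Iterable[Case]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating the report as the reference."""
    tp = fp = fn = tn = 0
    for c in cases:
        if c.ai_positive and c.report_positive:
            tp += 1
        elif c.ai_positive and not c.report_positive:
            fp += 1
        elif not c.ai_positive and c.report_positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(tp: int, fn: int) -> Optional[float]:
    """Share of report-positive cases that the AI also flagged."""
    return tp / (tp + fn) if (tp + fn) else None


def ppv(tp: int, fp: int) -> Optional[float]:
    """Share of AI-flagged cases that the report confirms."""
    return tp / (tp + fp) if (tp + fp) else None


# Made-up sample of 100 cases, for illustration only.
cases = (
    [Case(True, True)] * 8
    + [Case(False, True)] * 3
    + [Case(True, False)] * 2
    + [Case(False, False)] * 87
)

tp, fp, fn, tn = confusion_counts(cases)
print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # prints "sensitivity = 0.80"
print(f"PPV = {ppv(tp, fp):.2f}")                  # prints "PPV = 0.73"
```

Treating the report as the reference is itself an assumption: a report can miss a finding or be misread during extraction, which is why disagreements are usually worth a human look before they are counted as AI errors.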
At Rad Partners, these methods are already embedded into daily operations.
“We have a number of tools that we've deployed across the practice and we do integrate this into our clinical training for radiologists, but we also have monitoring running in the background on all the AI tools we have deployed,” Wiggins said.
That monitoring helps the group understand how radiologists interact with AI, whether they are appropriately rejecting incorrect results, and whether the tools are actually improving detection of important findings. Without this data, it is difficult to understand how helpful the AI is.
“If the model's not helping you improve in your detection of important clinical findings, then perhaps it's not something that's providing the value you're expecting in your practice,” he said.
Wiggins also noted that AI can boost radiologists' confidence and efficiency, often by acting as a second set of eyes that helps detect or confirm subtle findings when radiologists are uncertain or are deciding whether something is an image artifact.
For practices considering AI adoption, Wiggins offered clear guidance. “The strong recommendation is that you do pre-deployment validation,” he said.
This includes outlining a process for sharing representative data with vendors under proper HIPAA protections and performing human-in-the-loop reviews. That approach helps distinguish true AI errors from human reporting errors, identify impressive cases, and uncover instances where radiologists should be more skeptical of AI outputs.
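One way to picture the human-in-the-loop step is as a review queue of cases where the report and the AI disagree, followed by a tally of what reviewers decide. The sketch below assumes that setup; the function names, the category labels and the sample adjudications are hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (case_id, report_positive, ai_positive); all values below are made up.
CaseRow = Tuple[str, bool, bool]


def discordance(report_positive: bool, ai_positive: bool) -> str:
    """Label how the report-derived finding and the AI output relate."""
    if report_positive == ai_positive:
        return "concordant"
    # AI flagged something the report does not mention: either an AI
    # false positive or a finding missing from the report.
    if ai_positive:
        return "ai_only"
    # Report mentions a finding the AI did not flag: either an AI miss
    # or an error in the report or its extraction.
    return "report_only"


def review_queue(cases: Iterable[CaseRow]) -> List[Tuple[str, str]]:
    """Return only the discordant cases, which go to a human reviewer."""
    return [
        (case_id, discordance(report_pos, ai_pos))
        for case_id, report_pos, ai_pos in cases
        if report_pos != ai_pos
    ]


cases = [
    ("c1", True, True),
    ("c2", False, True),
    ("c3", True, False),
    ("c4", False, False),
]
queue = review_queue(cases)

# Hypothetical reviewer decisions for the queued cases.
adjudications = {"c2": "report_error", "c3": "ai_error"}
tally = Counter(adjudications[case_id] for case_id, _ in queue)
print(queue)
print(dict(tally))
```

Keeping the reviewer's decision separate from the automatic label means a disagreement is only counted against the AI after someone has looked at the case.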