Hands-free, generative AI system reduces radiologists' time spent creating reports

An experiment using a hands-free, generative AI system is helping dramatically reduce the time radiologists spend creating reports, according to new research.

Generating such documents is essential for patient management but is often time-consuming and labor-intensive, experts wrote Tuesday in the Journal of the American College of Radiology [1]. Stony Brook University radiologists experimented with using the large language model GPT-4 to aid physicians in this process.

For the study, radiologists used voice-to-text software to dictate their findings and create a raw text transcript. Researchers then used GPT-4 to develop a radiology report based on Stony Brook University Hospital’s institutional imaging template. 

“The AI-driven system demonstrated considerable proficiency in transcribing and generating radiology reports, albeit with occasional inaccuracies,” Austin Young, with the Renaissance School of Medicine at the Stony Brook, New York, institution, and co-authors wrote Oct. 15. “While it achieved a high level of precision in standard scenarios, its performance varied in more complex cases.”

The prospective study involved nearly 100 imaging exams, including 53 chest X-rays and 46 musculoskeletal exams of areas such as the shoulder, elbow and wrist. Four body radiologists dictated their findings into voice-to-text software from Microsoft’s Nuance. Researchers then de-identified the transcripts and ran them through GPT-4 to convert them into the corresponding template. Meanwhile, four more radiologists generated reports for the same set of medical images via conventional methods, manually typing or dictating findings into the computer-based system using the appropriate template.

Two independent radiologists, blinded to how the reports were created, then evaluated the AI-generated content using the traditional reports as the “gold-standard comparison.” They rated AI reports on a five-point scale, with 1 indicating poor quality and 5 indicating high quality. Average scores between the two radiologists were 4.35 for comprehensiveness, 4.64 for clarity, 4.59 for factual accuracy and 4.26 for conciseness. Agreement between the two reviewers was “decent” for comprehensiveness and factual accuracy but poor for clarity (4.61 vs. 4.66) and conciseness (4.26 vs. 3.63). The traditional approach took a median of 32 seconds (IQR: 12.5-51.5) to produce a report versus 25 seconds (IQR: 11-39) for the AI-based method, a 22% reduction.
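For readers who want to check the math, the 22% figure follows from the two median times quoted above. The short Python sketch below is illustrative only; the function and variable names are ours, not the study's, and it assumes the reduction was calculated directly from the medians.

```python
def percent_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline * 100.0


# Median report-creation times quoted in the article, in seconds.
traditional_median_s = 32.0
ai_median_s = 25.0

# Assumption: the reported figure compares the two medians directly.
reduction = percent_reduction(traditional_median_s, ai_median_s)
print(f"{reduction:.1f}% reduction")  # 21.9%, which rounds to the reported 22%
```

In absolute terms, that is a difference of 7 seconds per report at the median.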

However, the system struggled to handle more complex cases, the authors cautioned. GPT-4 faced challenges accurately categorizing findings and generating concise summaries, often instead producing “lengthy and overly detailed” reports.

“Additionally, radiologists noted a frequent overuse of the phrase ‘correlate clinically,’ particularly in complicated cases, which added little value to the diagnostic process,” the authors noted. “Despite these challenges, the system's efficiency in managing straightforward cases presents a clear advantage. By automating the reporting process for simple cases, radiologists can allocate more time and attention to complex cases, potentially improving overall diagnostic accuracy and workflow efficiency. As burnout within the field of radiology continues to grow, application of these technologies represents potential areas to combat burnout.” 

Marty Stempniak

Marty Stempniak has covered healthcare since 2012, with his byline appearing in the American Hospital Association's member magazine, Modern Healthcare and McKnight's. Prior to that, he wrote about village government and local business for his hometown newspaper in Oak Park, Illinois. He won Peter Lisagor and Gold EXCEL awards in 2017 for his coverage of the opioid epidemic.
