ChatGPT shows ‘tremendous’ potential for making radiology reports easier to read
ChatGPT and other such large language models could prove useful in rewriting radiology reports to easier reading levels, according to a new analysis published Oct. 5 in European Radiology [1].
Researchers with LMU University Hospital in Munich, Germany, recently tested the technology’s performance on this task. For the investigation, a veteran radiologist created three fictitious reports, each containing multiple interrelated findings.
The study team then prompted ChatGPT to “explain this medical report to a child using simple language,” inputting the findings in plain text. Fifteen radiologists then rated the resulting simplified reports for factual correctness, completeness, and potential for harm.
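The authors worked through the public ChatGPT web interface rather than a programming interface. For readers curious how a comparable prompt could be scripted, the sketch below shows one possible way to issue the same instruction through the OpenAI Python client; the model name and the report text are illustrative assumptions, not details from the study.

```python
# Minimal sketch, not the study's setup: the authors used the ChatGPT web
# interface in December 2022. This shows how a similar simplification prompt
# could be sent via the OpenAI Python client. Model choice and report text
# are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

report_findings = "Example radiology findings in plain text (placeholder, not a study report)."

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[
        {
            "role": "user",
            "content": (
                "Explain this medical report to a child using simple language:\n\n"
                + report_findings
            ),
        }
    ],
)

# The returned text is the simplified, patient-friendly version of the report.
print(response.choices[0].message.content)
```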
“Most participating radiologists agreed that the simplified reports are overall factually correct, complete, and not potentially harmful to patients,” Katharina Jeblick, PhD, a clinical data scientist at LMU, and co-authors concluded. “At the same time, the radiologists also identified factually incorrect statements, missing relevant medical information, and text passages in a considerable number of simplified reports, which might lead patients to draw potentially harmful conclusions. This demonstrates the need for further model adaption to the medical field and for professional medical oversight.”
Jeblick et al. conducted their study in December 2022. The fabricated reports covered three clinical scenarios: an MRI of the knee, a second knee MRI, and a follow-up whole-body CT for a “fictitious oncological imaging event.”
Radiologists were informed that the simplified reports had been generated by ChatGPT and were asked to rate them on a five-point scale. Participants “generally agreed” (median score of 2) that the simplified reports were factually correct and complete, with about 75% answering “agree” or “strongly agree” on both quality criteria. Meanwhile, radiologists disagreed (median of 4) with the statement that patients could draw incorrect conclusions from the simplified reports and suffer physical or psychological harm as a result. In the free-text analysis, however, radiologists flagged incorrect information in 10 simplified reports (22%), and 16 (36%) contained passages that could lead to potentially harmful conclusions.
Despite these issues, Jeblick and co-authors see promise in deploying large language models in this fashion.
“While further quantitative studies are needed, the initial insights of this study unveil a tremendous potential in using LLMs like ChatGPT to improve patient-centered care in radiology and other medical domains,” the authors advised.
Read more about the results at the link below.