AI draws conclusions from interventional radiology adverse events data, helping docs design interventions
A large language model can analyze massive amounts of interventional radiology safety data and draw conclusions, helping IR specialists design interventions.
Large volumes of medical device adverse event data accumulate daily, University of Toronto experts noted in the Canadian Association of Radiologists Journal [1]. In 2022 alone, U.S. Food and Drug Administration databases recorded nearly 3 million incidents, with most of the reports captured in free-text fields.
“When attempting to generate meaningful insights from such databases, human analysis is limited by multiple factors like expertise, time required, lack of uniformity, and fatigue,” Blair E. Warren, a radiology resident in the university’s Department of Medical Imaging at the time of the study, and colleagues wrote Aug. 21. “Consequently, databases that house critical safety information may be underutilized.”
For the study, Warren and co-authors analyzed FDA adverse event data related to thermal ablation, an IR procedure that uses microwave energy to destroy tumor cells. The sample included 1,189 incidents tallied between 2011 and 2021 in the U.S., with three residents cleaning the data and an IR fellow analyzing the final dataset.
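The article does not describe the exact retrieval pipeline, but FDA device adverse event reports are publicly queryable through the openFDA API. The sketch below is a rough illustration rather than the authors' method; the search terms, date window and field choices are assumptions made for the example.

```python
import requests

# openFDA device adverse event endpoint (public; no API key needed for light use).
OPENFDA_DEVICE_EVENTS = "https://api.fda.gov/device/event.json"

def fetch_ablation_reports(limit=100):
    """Pull ablation-related adverse event reports with their free-text narratives.

    The search expression and date range below are illustrative assumptions,
    not the query used in the study.
    """
    params = {
        "search": 'device.generic_name:"ablation" AND date_received:[20110101 TO 20211231]',
        "limit": limit,
    }
    resp = requests.get(OPENFDA_DEVICE_EVENTS, params=params, timeout=30)
    resp.raise_for_status()

    reports = []
    for result in resp.json().get("results", []):
        # Each report can carry several free-text blocks (event description,
        # manufacturer narrative, etc.); join them into one narrative string.
        narrative = " ".join(t.get("text", "") for t in result.get("mdr_text", []))
        reports.append({
            "report_number": result.get("report_number"),
            "date_received": result.get("date_received"),
            "narrative": narrative,
        })
    return reports

if __name__ == "__main__":
    for r in fetch_ablation_reports(limit=5):
        print(r["report_number"], r["narrative"][:120])
```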
GPT-4 was trained on a set of 25 IR adverse events, validated on 639 more and tested on 79. The large language model from OpenAI classified interventional radiology cases with high accuracy, reaching 86.4% on the larger validation set.
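The article does not publish the authors' prompts or category scheme, but the general approach, asking a general-purpose LLM to assign each free-text narrative to a predefined adverse event category, can be sketched with the OpenAI Python client. The model name, category labels and prompt wording below are illustrative assumptions, not the study's actual configuration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative category labels; the study's actual classification scheme
# is not described in the article.
CATEGORIES = ["probe tip fracture", "other mechanical failure",
              "thermal injury", "software/electrical fault", "other/unclear"]

def classify_report(narrative: str) -> str:
    """Ask the model to map one free-text adverse event narrative to a category."""
    prompt = (
        "You are assisting with interventional radiology device safety review.\n"
        "Classify the following adverse event report into exactly one of these "
        f"categories: {', '.join(CATEGORIES)}.\n"
        "Answer with the category name only.\n\n"
        f"Report: {narrative}"
    )
    response = client.chat.completions.create(
        model="gpt-4",   # assumed model name for illustration
        temperature=0,   # deterministic output for repeatable labeling
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

# Example usage with a made-up narrative:
print(classify_report("During microwave ablation the antenna tip fractured and "
                      "a fragment was retained at the treatment site."))
```

In the kind of human-in-the-loop workflow the authors describe, outputs from such a classifier would be checked against expert labels, which is what the validation and test figures above reflect.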
“The [large language model] emulated the human analysis, suggesting feasibility of using LLMs to process large volumes of IR safety data as a tool for clinicians,” the authors advised.
Mechanical failures were among the most common malfunctions, particularly fracturing of the probe tip, a failure mode well documented in the microwave ablation literature. These adverse events often stem from applying improper or excessive force through the probe. In 42.7% of fractures, providers left the broken probe tip in place. However, because of the “limited nature” of U.S. FDA data, Warren and co-authors could not determine the long-term outcomes.
“This study demonstrates the feasibility of using AI to generate reports on data that might otherwise be under-evaluated,” the authors concluded. “Importantly, automated analysis of these data could be created by non-AI experts and the resulting LLMs could act as early detectors to identify important insights that may otherwise have not been explored. This proposed AI-human collaborative implementation would be low-risk, given the data most often already exist, and the AI would be used to augment human analysis, with continued expert human oversight as the final safeguard.”
Read more, including potential study limitations, at the link below.