ChatGPT effective at simplifying breast imaging recall letters

The large language model ChatGPT is effective at helping simplify breast imaging recall letters, according to new research published Thursday.

The Mammography Quality Standards Act requires radiologists to inform patients about their exam results using simple-to-understand language. This includes instances where a patient is “recalled,” needing to return for another exam because dense breast tissue obscured the results, the imaging technique was suboptimal or there was a suspicious finding. However, one previous study found that almost half of patients with findings concerning for cancer were unaware their mammogram results were abnormal.

Radiologists with the Emory University School of Medicine in Atlanta recently set out to improve the readability of their own letters, deploying ChatGPT to help in the process. They’ve achieved early success, publishing their findings in the journal Current Problems in Diagnostic Radiology.

“We successfully created new patient-centric breast imaging recall letters that met best practice criteria for readability, understandability/comprehensibility, actionability, utility, and design,” Jada Hislop, MD, and co-authors wrote April 17. “We found the use of ChatGPT and human edits helpful to efficiently simplify language, although the perceived efficiency was outweighed by the time it took to identify the best prompt.”

Researchers conducted the investigation at an academic center that performs screening mammography at seven sites handling an annual volume of 102,000 exams. As of 2023, they recorded a recall rate of roughly 13%, detecting cancer in about 8 of every 1,000 individuals imaged. 

Their goal was to develop “patient-centered” screening recall letters, focused on a few categories in the Breast Imaging-Reporting and Data System (BI-RADS). Researchers first assessed the institution’s old letters, determining they were written at a 10th grade or higher reading level (when they should ideally be 4th to 6th grade). Hislop and colleagues then used ChatGPT to simplify the language. This was followed by editorial modifications to shorten the documents to one page, restore any meaning lost in LLM translation, and remove any inaccuracies or redundancies. 
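For readers curious what such a pipeline could look like in practice, the sketch below is a rough illustration of the workflow the authors describe, not their actual code: it scores a letter’s reading grade level and asks an LLM to produce a plain-language draft for human editors to refine. The textstat library, the OpenAI client call, the model name, the file name and the prompt wording are all assumptions made for this example; the researchers worked through ChatGPT directly and chose their prompt by trial and error.

```python
# A minimal sketch of the general workflow described above -- not the study's
# actual code or prompt. Assumes the `openai` and `textstat` packages are
# installed and an OPENAI_API_KEY environment variable is set; the model name
# and prompt wording are illustrative placeholders.
import textstat
from openai import OpenAI

client = OpenAI()


def grade_level(text: str) -> float:
    """Estimate the U.S. reading grade level of a letter (Flesch-Kincaid)."""
    return textstat.flesch_kincaid_grade(text)


def simplify_letter(letter: str) -> str:
    """Ask the model for a plain-language rewrite of a recall letter."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the study used ChatGPT's interface
        messages=[
            {
                "role": "system",
                "content": (
                    "Rewrite the following patient letter at a 4th- to "
                    "6th-grade reading level without changing its medical "
                    "meaning."
                ),
            },
            {"role": "user", "content": letter},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    original = open("birads0_recall_letter.txt").read()  # hypothetical file
    draft = simplify_letter(original)
    print(f"Original grade level: {grade_level(original):.1f}")
    print(f"Draft grade level:    {grade_level(draft):.1f}")
    # Human editors would still review the draft for accuracy, length and tone.
```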

Afterward, reading grade levels dropped from 12th to 7th for the BI-RADS 0 (incomplete, requires further evaluation) letter and from 11th to 6th for the BI-RADS 0-DB (dense breast tissue) letter. Patient Education Materials Assessment Tool ratings for the BI-RADS 0 recall message improved from 41% to 90% for understandability following the revisions, while actionability leapt from 50% to 88%. The dense breast tissue letter saw similar gains, with understandability (46% to 85%) and actionability (44% to 73%) also improving considerably.

The authors touted other strategies that helped improve the institution’s readability scores. These included gathering feedback from patients and family members and soliciting input from other healthcare providers. Emory experts encouraged others to pay special attention when crafting large language model prompts to help produce the desired results. 

“It can be challenging to find a prompt that yields the expected output,” they wrote. “We tried several prompts before we found one that yielded simplified language that was not overly simple and that was not altering the meaning of the source text too much. The importance of prompt constructions is well-recognized and subject to further prompt design development to help users in gaining contextually accurate responses from ChatGPT in the future.”
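As a purely illustrative extension of the earlier sketch, one way to narrow down candidate prompts is to compare the reading level of the drafts each one produces. The prompts, the 5th-grade target and the simplify helper below are hypothetical; the study’s team ultimately judged outputs by manual review for preserved meaning, not by an automated score.

```python
# Illustrative only: compare candidate prompts by the reading level of the
# drafts they yield. The prompts, the 5th-grade target, and the
# `simplify(prompt, letter)` helper (an LLM call like the one sketched
# earlier) are invented for this example.
import textstat

CANDIDATE_PROMPTS = [
    "Rewrite this letter in plain language.",
    "Rewrite this letter at a 5th-grade reading level, keeping its meaning.",
    "Explain this letter simply, as if to a patient with no medical training.",
]


def pick_prompt(letter, simplify, target_grade=5.0):
    """Return the (prompt, draft, grade) whose draft lands closest to target."""
    best = None
    for prompt in CANDIDATE_PROMPTS:
        draft = simplify(prompt, letter)
        grade = textstat.flesch_kincaid_grade(draft)
        if best is None or abs(grade - target_grade) < abs(best[2] - target_grade):
            best = (prompt, draft, grade)
    return best
```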

Marty Stempniak

Marty Stempniak has covered healthcare since 2012, with his byline appearing in the American Hospital Association's member magazine, Modern Healthcare, and McKnight's. Prior to that, he wrote about village government and local business for his hometown newspaper in Oak Park, Illinois. He won Peter Lisagor and Gold EXCEL awards in 2017 for his coverage of the opioid epidemic.
