ChatGPT excels at assessing breast pain symptoms, deciding if patients require imaging
Large language model ChatGPT is effective at assessing breast pain symptoms and deciding if patients need imaging, according to new research published Friday.
Clinically insignificant pain typically does not require diagnostic evaluation because it is not associated with malignancy. These triage decisions have become more critical in recent years, given heavy workload demands and prolonged patient wait times, experts write in Clinical Imaging.
Researchers recently experimented with using GPT-4 to aid in these judgments. They found that the large language model correctly classified the clinical significance of nearly 75% of breast pain vignettes. Among cases deemed by radiologists as warranting further evaluation, the LLM accurately labeled about 89%, suggesting breast centers could substantially trim the number of appointments requiring manual review.
“Our findings support the potential use of ChatGPT GPT-4 for automated classification of breast pain based on clinical significance, which could potentially be performed by a member of the radiology scheduling team with oversight by a clinical nurse navigator,” Hana Haver, MD, with Massachusetts General Brigham, and co-authors wrote May 30. “In this context, the high sensitivity of ChatGPT GPT-4 for clinically significant pain could help rule out patients likely to have clinically significant pain.”
For the study, Haver and colleagues developed 150 different “patient-centered breast pain clinical vignettes.” These sample scenarios encompassed variants described in the American College of Radiology Appropriateness Criteria for Breast Pain, along with presentations the authors frequently encounter in their own practice. Researchers also incorporated other nonpainful but important symptoms, such as a palpable lump or pathologic nipple discharge.
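For illustration only, a vignette-classification step of this kind might be set up roughly as sketched below. This is not the study’s actual prompt or pipeline; the prompt wording, sample vignette, and use of the OpenAI Python SDK are assumptions made for the example.

```python
# Minimal sketch (not the authors' protocol): asking GPT-4 to label a breast
# pain vignette as clinically significant or insignificant via the OpenAI SDK.
# The vignette text and prompt wording here are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

vignette = (
    "52-year-old woman with bilateral, diffuse, cyclical breast pain for "
    "three months. No palpable lump, no nipple discharge, no skin changes."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You assist a breast imaging center. Classify the patient's "
                "breast pain as 'clinically significant' (warrants diagnostic "
                "imaging) or 'clinically insignificant' (age-appropriate "
                "screening only). Answer with one label."
            ),
        },
        {"role": "user", "content": vignette},
    ],
)

print(response.choices[0].message.content)
```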
Based on radiologist consensus, about 43% of the clinical vignettes were considered clinically significant and 57% were assessed as insignificant. GPT-4 correctly classified 74.7% of clinical vignettes with breast pain symptoms. Of the 64 cases with either clinically significant pain or an additional concerning symptom, GPT-4 correctly identified 57, for a sensitivity of 89.1% and a false negative rate of 10.9%. “Notably,” the authors added, the LLM accurately identified all cases of concerning symptoms warranting diagnostic evaluation. When ChatGPT made incorrect assessments, it typically overstated clinical significance.
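As a quick arithmetic check, the reported sensitivity and false negative rate follow directly from the counts above (64 significant cases, 57 correctly flagged); the short sketch below simply reproduces that calculation.

```python
# Reproducing the reported figures from the counts stated in the article.
significant_cases = 64   # vignettes with significant pain or a concerning symptom
correctly_flagged = 57   # of those, correctly identified by GPT-4

sensitivity = correctly_flagged / significant_cases  # 57/64 ≈ 0.891 -> 89.1%
false_negative_rate = 1 - sensitivity                # 7/64 ≈ 0.109 -> 10.9%

print(f"sensitivity: {sensitivity:.1%}, false negative rate: {false_negative_rate:.1%}")
```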
“Breast imaging practices … have implemented strategies to reduce unnecessary diagnostic imaging for clinically insignificant pain,” the authors noted. “These strategies include nursing staff identifying patients with clinically insignificant pain from imaging orders and/or electronic medical record, contacting them to verify symptoms, and rescheduling their unnecessary diagnostic imaging appointments as age-appropriate screening. However, this strategy requires manually reviewing all diagnostic appointments for evaluation of breast pain symptoms, which is time-consuming and burdensome. This approach would be more efficient if breast imaging centers could minimize the number of patients with clinically significant pain in their manual review, as these patients are already scheduled for appropriate diagnostic imaging.”
Read more about the results, including potential study limitations, in Clinical Imaging.