AI fails to beat radiologists in large study, but pairing the two proves promising
In one of the larger machine learning studies to date, artificial intelligence failed to beat out radiologists in pinpointing breast cancer. However, combining the two could prove to be a game-changer for the specialty.
That’s according to the results of a high-profile new investigation, unveiled Monday in JAMA Network Open. Analyzing more than 300,000 screening mammograms from more than 150,000 women, researchers found that no single AI algorithm surpassed community radiologists’ benchmarks.
The algorithms were developed as part of an international competition, involving 126 teams from 44 countries, aimed at using AI to improve cancer detection. And while AI could not best radiologists on its own, combining the competition’s best algorithms with single-radiologist assessment measurably improved specificity.
In a corresponding JAMA editorial, radiology expert Claudia Mello-Thoms, PhD, admitted “there is a lot to unpack from these results.” Still, one key takeaway for the specialty is that “AI is not there yet.”
“With many calling for a total overhaul of radiology owing to AI, studies like this strongly suggest that radiologists will be masters of their domain for quite some time, as the task of image interpretation is significantly more complex than radiologists get credit for,” wrote Mello-Thoms, a professor of radiology at the University of Iowa Carver College of Medicine. “Even with an incredible amount of resources, arguably the best AI teams in the world could not meet or beat the radiologists,” she added later.
The original study was conducted as part of the digital mammography DREAM (Dialogue on Reverse Engineering Assessment and Methods) competition, which used crowdsourced data and invited participation from across the globe. Organizers asked the top eight teams to create an “ensemble model” that combined their work, which still could not beat radiologists. The ensemble logged a specificity of just 76.1%, well short of radiologists’ 90.5%. Combining those physicians’ assessments with the ensemble model, however, produced a 92% rate, “resulting in AI finally beating radiologists,” Mello-Thoms noted.
If put into widespread use, this pairing could have powerful implications. In the U.S., where a single radiologist typically reads each mammogram, 9%-10% of women who undergo breast screening are recalled for additional imaging, and only about 4%-5% of those individuals ultimately turn out to have cancer. Such false positives lead to preventable harm, including patient anxiety, benign biopsies and other unnecessary interventions. They also contribute to the roughly $10 billion in annual mammography screening costs in the country, wrote lead author Thomas Schaffter, a data engineer and research scientist at Sage Bionetworks in Seattle.
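For a rough sense of scale, those percentages can be turned into volumes. The sketch below is a back-of-envelope calculation using the figures cited in this article (including the roughly 40 million annual U.S. screenings mentioned in the next paragraph); the midpoint rates are our assumptions for illustration, not outputs of the study.

```python
# Back-of-envelope sketch using figures cited in this article.
# Midpoint rates are illustrative assumptions, not study outputs.
annual_screenings = 40_000_000   # approx. U.S. screening mammograms per year
recall_rate = 0.095              # midpoint of the cited 9%-10% recall rate
cancer_yield = 0.045             # midpoint of the cited 4%-5% cancer rate among recalls

recalled = annual_screenings * recall_rate
cancers_found = recalled * cancer_yield
false_positive_recalls = recalled - cancers_found

print(f"Recalled for additional imaging: ~{recalled:,.0f}")                # ~3,800,000
print(f"Cancers ultimately found:        ~{cancers_found:,.0f}")           # ~171,000
print(f"False-positive recalls:          ~{false_positive_recalls:,.0f}")  # ~3,629,000
```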
Deploying radiologists in tandem with the ensemble model reduced the recall rate by about 1.5 percentage points, which may seem like a small number. But with about 40 million women receiving these tests in the U.S. each year, it could result in more than 500,000 individuals avoiding “unnecessary diagnostic workup,” the research team concluded.
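Extending the same rough sketch, the reported reduction scales out to roughly the figure the authors cite; again, this is an illustrative estimate under the assumption that the 1.5-percentage-point drop applies uniformly across all annual screenings.

```python
# Continuing the back-of-envelope sketch: a ~1.5-percentage-point reduction
# in recall rate applied across roughly 40 million annual screenings.
annual_screenings = 40_000_000
recall_rate_reduction = 0.015   # ~1.5 percentage points, per the study

avoided_workups = annual_screenings * recall_rate_reduction
print(f"Screenings potentially spared unnecessary workup: ~{avoided_workups:,.0f}")
# ~600,000, in line with the "more than 500,000" figure the authors describe
```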
“Taken together, our results suggest that adding AI to mammography interpretation in single-radiologist settings could yield significant performance improvements, with the potential to reduce healthcare system expenditures and address the recurring radiologist person-power issues experienced in population-based screening programs,” wrote Schaffter and dozens of other co-authors from across the globe.