Don’t depend on machine learning for diagnostic imaging support just yet, experts say
Radiologists and other clinicians should not yet depend on machine learning for diagnostic support because robust evidence is still lacking, experts wrote Thursday in JAMA Network Open.
Advances in mathematical modeling and computing power have led to an explosion in published artificial intelligence algorithms, some of which are claimed to outshine human radiologists. But a group of U.K. researchers urges caution, noting that few randomized trials or prospective studies support these assertions.
“This systematic review found no robust evidence that the use of [machine learning]-based algorithms was associated with better clinician diagnostic performance,” study co-author Stephan Ursprung, with the Department of Radiology at the University of Cambridge, and colleagues wrote March 11.
To reach their conclusions, Ursprung et al. queried medical literature databases for research logged between 2010 and 2019. They sought peer-reviewed studies comparing clinician performance with and without the use of machine learning-based diagnostic clinical decision support systems. Out of 8,112 studies initially retrieved, researchers screened 5,154 abstracts before landing on 37 that met the inclusion criteria, with most pertaining to lung pathology or the diagnosis of cancer.
The included studies involved a median of only four clinicians. Out of 107 results reporting statistical significance, 50% showed an increase in performance metrics with the use of clinical decision support. Another 4% showed a decrease, while the remaining 46% showed either unclear or no change.
In the select few studies conducted in clinical settings, there was no association between improved clinician performance and the use of machine learning diagnostic support. About 76% of studies were rated as having a high risk of bias based on the Quality Assessment of Diagnostic Accuracy Studies tool. Another 16% were found to be at “serious or critical risk of bias” when graded using the Risk of Bias in Non-Randomized Studies–Intervention scale.
Ursprung and colleagues also observed that human operators “almost always” decided to override at least some of the clinical decision support system recommendations. They urge more thorough evaluation of such systems and greater consideration of the “human component of assisted diagnosis.”
“Increased regulatory scrutiny also has an important role in ensuring a safe and efficient translation to the patient bedside,” the authors concluded. “The results of this review should not be interpreted as tarnishing the prospects of ML-based diagnostic CDSSs,” they added. “Rather, we encourage qualitative improvements in future research.”
The full analysis appears in JAMA Network Open, and a corresponding editorial is also available.