Most AI research focused on radiology lacks external validation
Most researchers exploring the performance of artificial intelligence (AI) in radiology aren't externally validating their results, according to a new study published in the Korean Journal of Radiology.
“As with any other medical devices or technologies, the importance of thorough clinical validation of AI algorithms before their adoption in clinical practice through adequately designed studies to ensure patient benefit and safety while avoiding any inadvertent harms cannot be overstated,” wrote Dong Wook Kim, MD, of the department of radiology at Taean-gun Health Center and County Hospital in Korea, and colleagues.
The authors noted that assessing the performance of AI algorithms designed to analyze medical images “requires appropriately designed external validation.” This validation, they explained, should include “adequately sized datasets” made up of data from new patients or patients not used to train the algorithms.
The researchers examined more than 500 original research articles published from Jan. 1, 2018, to Aug. 17, 2018. Overall, just 6 percent of the studies included external validation. There was not a significant difference when comparing research published in medical journals with research from non-medical journals.
The researchers did make one distinction, however: these findings do not necessarily mean every study they reviewed “was inadequately designed.”
“The four criteria used in this study—external validation and data for external validation being obtained using a diagnostic cohort study, from multiple institutions, and in a prospective manner—are fundamental requirements for studies that intend to evaluate the clinical performance of AI algorithms in real-world practice,” the authors wrote. “These would be excessive for studies that merely investigate technical feasibility.”
The authors also noted that additional guidelines on properly researching and validating AI have been published in recent months.
“We suspect that most studies that we analyzed in this study may have been conceived or executed before these methodologic guides were made available,” they wrote. “Therefore, the design features of studies that intend to evaluate the clinical performance of AI algorithms for medicine may improve in the future.”