AI improves radiologists’ skeletal-age assessment accuracy while reducing interpretation times
Artificial intelligence can help improve radiologists’ accuracy at estimating skeletal age while also reducing interpretation times, according to a new analysis published Monday.
Such assessments are crucial for determining children’s developmental status when treating scoliosis and several other growth disorders. However, the task can be time-consuming and tedious for physicians, who must review hand radiographs and identify the closest matching reference-standard image in an atlas.
AI has previously shown promise helping radiologists estimate skeletal age. But experts from several institutions sought to test its effectiveness in a real-world setting, sharing their results Sept. 28 in Radiology.
“Overall diagnostic error was significantly decreased when the AI algorithm was used compared with when it was not,” David Eng, with Stanford University’s Department of Computer Science, and co-authors concluded in the Radiological Society of North America’s flagship journal. “Taken together, these findings support careful consideration of AI for use as a diagnostic aid for radiologists and reinforce the importance of interactive effects between human radiologists and AI algorithms in determining potential benefits and harms of assistive technologies in clinical medicine,” they added later.
Eng et al. conducted their prospective study across six U.S. radiology departments between 2018 and 2019. Prior to that, the skeletal-age algorithm was trained using nearly 13,000 hand X-rays gathered at the participating institutions and tested on an open-source dataset from the RSNA AI challenge. For the investigation, scientists randomly assigned skeletal age assessments to be performed with the algorithm as a diagnostic aid (792) and without (739), with 93 radiologists interpreting the images.
Exams interpreted with AI assistance showed a smaller mean absolute difference from the reference-standard skeletal age than exams interpreted without it (5.4 months vs. 6 months), an improvement observed at five of the six study sites. Meanwhile, median radiologist interpretation time fell from 142 seconds to 102 seconds when AI was used, Eng and colleagues noted.
In a corresponding editorial, radiologist David Rubin, MD, labeled this clinical scenario an “ideal” application for AI, noting there is only one diagnosis—an estimated bone age. Plus, the task is considered “tedious and time consuming by many radiologists” while still requiring reader expertise. That’s why several products have already been developed to address this need, and the RSNA challenge drew 105 entries.
Rubin commended the study for simulating a real-world scenario using multiple centers and giving radiologists the ability to accept or override AI’s decisions.
“I believe it is time to think about eliminating the human-based ground truth for future applications,” he wrote. “While expert consensus was a necessary initial step in evaluating new algorithms, it is possible that some AI already outperforms radiologists, but current study design (using a human-based reference standard) makes that impossible to show. In essence, we are not training algorithms to find the most correct answer but rather to best predict what the radiologist-based diagnosis would be.”