AI studies can be confusing, even for imaging professionals—but they don't have to be
Machine learning (ML) has become one of the hottest topics in radiology and all of healthcare, but reading the latest and greatest ML research can be difficult, even for experienced medical professionals. A new analysis published in the American Journal of Roentgenology was written with that very problem in mind.
“For those who are unfamiliar with the field of ML, the emerging research can be daunting, with a wide variation in the terms used and the metrics presented,” wrote Guy S. Handelman, from the department of radiology at Northern Ireland’s Belfast City Hospital, and colleagues. “How can we, as readers, tell if the predictive model being presented is good or is even better than another model presented elsewhere?”
Handelman et al. put together a helpful guide for imaging professionals who want to get more out of reading ML research. These are some of the general ML terms the authors described in detail:
1. Cross-Validation: Cross-validation is how many ML studies generate their performance measures, the authors explained. When researchers begin work on an algorithm, they separate their subjects into two groups: a training dataset and a testing dataset. The training dataset is used to build the algorithm, training it so that it can make predictions. The testing dataset, meanwhile, is used as an initial test of the algorithm’s accuracy.
“This can be taken one step further and can avoid the cost of limiting the size of the dataset by repeating this process many times, each time assigning a different group of study patients to the training set and to the testing set,” the authors wrote. “Each iteration will not only improve the performance of the model, because the program can compare between each training set's results to see what performs best and can alter its overall predictive capability, but also improve the generalizability of the results.”
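To make the idea concrete, here is a minimal Python sketch of a single train/test split followed by k-fold cross-validation, using scikit-learn on synthetic data. The classifier, features, and fold count are illustrative placeholders, not the models evaluated in the AJR paper.

```python
# Minimal sketch: hold-out split vs. k-fold cross-validation (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                          # 200 hypothetical patients, 5 features
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # synthetic labels

# Single split: one training set, one held-out testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("hold-out accuracy:", model.score(X_test, y_test))

# 5-fold cross-validation: each patient serves in the testing set exactly once
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print("cross-validated accuracy:", scores.mean(), "+/-", scores.std())
```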
2. ROC Curve: By “plotting the effect of different levels of sensitivity on specificity,” researchers can help readers understand the performance of their algorithm. That is the idea behind the receiver operating characteristic (ROC) curve. The authors explained that the levels used for an algorithm’s ROC curve are task- and system-specific.
“Algorithms that perform better will have a higher sensitivity and specificity and thus the area under the plotted line will be greater than those that perform worse,” the authors wrote. “The metric termed the ‘area under the ROC curve’ or ‘AUROC’ is commonly quoted and offers a quick way to compare algorithms.”
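For readers curious how these numbers are typically produced, here is a compact, hypothetical example of computing an ROC curve and its AUROC with scikit-learn; the labels and predicted scores below are made up for illustration.

```python
# Minimal sketch: ROC curve and AUROC from synthetic predictions.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3, 0.7, 0.5])

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # sensitivity vs. 1 - specificity at each threshold
auroc = roc_auc_score(y_true, y_score)              # area under the ROC curve
print("AUROC:", round(auroc, 3))
```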
3. Confusion Matrix: A confusion matrix helps readers of a study locate the specific metric they are most interested in. A study may highlight an algorithm’s accuracy, but what if, in a given instance, other metrics matter more than accuracy? The confusion matrix helps the reader find those other metrics. A reader who is more interested in the algorithm’s negative predictive value, for instance, will know to turn to the confusion matrix.
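As a rough illustration, here is a minimal sketch of a 2x2 confusion matrix and the values that can be read off it, built with scikit-learn on synthetic labels; the exact metrics a given paper reports will vary.

```python
# Minimal sketch: 2x2 confusion matrix and the metrics derived from it (synthetic data).
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)          # true-positive rate (recall)
specificity = tn / (tn + fp)          # true-negative rate
ppv = tp / (tp + fp)                  # positive predictive value
npv = tn / (tn + fn)                  # negative predictive value
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(sensitivity, specificity, ppv, npv, accuracy)
```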
4. Image Segmentation Evaluation: Again, a study may focus on accuracy, but the reader may not want to focus on accuracy alone. When an algorithm is designed to detect the presence of a finding, for instance, it is not just a question of whether the finding was detected; it also matters where it was detected and how large it is. Image segmentation evaluation takes such factors into account.
“In this evaluation method, the predicted area of interest generated by the algorithm is compared against an ideal or completely accurate evaluation image,” the authors wrote.
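A brief sketch of what that comparison can look like, assuming a simple overlap measure such as the Dice coefficient (a common choice for segmentation evaluation, though studies may report other overlap metrics); the masks below are toy examples, not imaging data from the paper.

```python
# Minimal sketch: comparing a predicted segmentation mask against a reference mask.
import numpy as np

reference = np.zeros((8, 8), dtype=bool)
reference[2:6, 2:6] = True            # "ideal" or ground-truth region of interest

predicted = np.zeros((8, 8), dtype=bool)
predicted[3:7, 2:6] = True            # algorithm's predicted region

intersection = np.logical_and(reference, predicted).sum()
dice = 2 * intersection / (reference.sum() + predicted.sum())
print("Dice coefficient:", dice)      # 1.0 = perfect overlap, 0.0 = no overlap
```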