These deep learning algorithms outperformed a panel of 11 pathologists
During a 2016 simulation exercise, researchers evaluated the ability of 32 different deep learning algorithms to detect lymph node metastases in patients with breast cancer. Each algorithm’s performance was then compared with that of a panel of 11 pathologists working with a time constraint (WTC). Overall, the team found that seven of the algorithms outperformed the panel of pathologists. The researchers published an in-depth analysis in JAMA.
“To our knowledge, this is the first study that shows that interpretation of pathology images can be performed by deep learning algorithms at an accuracy level that rivals human performance,” wrote lead author Babak Ehteshami Bejnordi, MS, Radboud University Medical Center in Nijmegen, the Netherlands, and colleagues.
The simulation took place during the Cancer Metastases in Lymph Nodes Challenge 2016 (CAMELYON16) in the Netherlands. Twenty-three teams submitted the 32 algorithms.
The challenge focused on the analysis of sentinel axillary lymph nodes (SLNs), which Bejnordi et al. noted was critical to patient care. “Accurate breast cancer staging is an essential task performed by pathologists worldwide to inform clinical management,” the authors wrote. “Assessing the extent of cancer spread by histopathological analysis of SLNs is an important part of breast cancer staging. The sensitivity of SLN assessment by pathologists, however, is not optimal.”
The area under the receiver operating characteristic curve (AUC) for the deep learning algorithms ranged from 0.556 to 0.994. The mean AUC for the pathologists WTC was 0.810.
The authors also studied the performance of a single pathologist without time constraints. That pathologist’s AUC was 0.966, “comparable” to the top five algorithms, which had a mean AUC of 0.960.
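For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The short Python sketch below illustrates that pairwise definition. The function name `roc_auc` and the labels and scores are hypothetical values invented for illustration; they are not data from the study, and this is not the evaluation code used in CAMELYON16.

```python
def roc_auc(labels, scores):
    """Compute AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: list of 0/1 ground-truth values (1 = metastasis present).
    scores: list of model confidence scores, same length as labels.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: three slides with metastases, four without.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

# 11 of the 12 positive/negative pairs are ordered correctly.
print(round(roc_auc(labels, scores), 3))  # 0.917
```

An AUC of 1.0 means every positive case is scored above every negative case, while 0.5 is what random guessing would produce.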
Bejnordi et al. acknowledged that their study had limitations. For example, they said, the exercise was quite different from “routine pathology workflow.”
“The test data set on which algorithms and pathologists were evaluated was enriched with cases containing metastases and, specifically, micrometastases and, thus, is not directly comparable with the mix of cases pathologists encounter in clinical practice,” the authors wrote. “Given the reality that most SLNs do not contain metastases, the data set curation was needed to achieve a well-rounded representation of what is encountered in clinical practice without including an exorbitant number of slides. To validate the performance of machine learning algorithms, such as those developed in the CAMELYON16 competition, a prospective study is required.”