JACR study shows moderate agreement on PI-RADS classification, highlights pitfalls
A diverse mix of trainees and attending physicians demonstrated moderate interobserver agreement when sorting prostate cancer cases into categories laid out by the Prostate Imaging Reporting and Data System (PI-RADS), suggesting the system is resilient to users' varying levels of training. However, a persistently middling agreement rate may point to systemic problems with PI-RADS itself, according to an article published in the Journal of the American College of Radiology.
Prostate cancer is among the most common cancers in men, and a considerable share of the resulting imaging burden is increasingly shouldered by multiparametric MRI (mp-MRI), which is useful for both prostate cancer detection and management.
The European Society of Urogenital Radiology created the PI-RADS classification system in 2012, aiming to improve the standardization and consistency of mp-MRI reporting. PI-RADS assigns lesions to a five-point scale based on T2-weighted (T2W) and diffusion-weighted imaging (DWI). The system was updated in 2015, and it is this second version (PI-RADS v2) that was used in the study.
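PI-RADS v2 scoring is zone-dependent: broadly, DWI drives the overall category in the peripheral zone, while T2W drives it in the transition zone, with one upgrade rule each (the peripheral-zone rule also draws on dynamic contrast-enhanced imaging). The sketch below is a deliberate simplification of that dominant-sequence logic; the per-sequence scores are taken as inputs here, whereas the full standard derives them from detailed imaging criteria.

```python
def pirads_v2_category(zone: str, t2w: int, dwi: int, dce_positive: bool = False) -> int:
    """Simplified PI-RADS v2 dominant-sequence logic (illustration only).

    Inputs are per-sequence scores (1-5), which the full standard
    derives from detailed imaging criteria not modeled here.
    """
    if zone == "PZ":
        # Peripheral zone: DWI is dominant; positive dynamic contrast
        # enhancement upgrades a DWI score of 3 to an overall 4.
        return 4 if (dwi == 3 and dce_positive) else dwi
    if zone == "TZ":
        # Transition zone: T2W is dominant; a DWI score of 5 upgrades
        # a T2W score of 3 to an overall 4.
        return 4 if (t2w == 3 and dwi == 5) else t2w
    raise ValueError("zone must be 'PZ' (peripheral) or 'TZ' (transition)")


print(pirads_v2_category("PZ", t2w=2, dwi=3, dce_positive=True))  # -> 4
```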
Researchers from the University of Colorado Denver Department of Radiology used a panel of three expert radiologists with almost 20 combined years of subspecialty experience to provide consensus reads before disseminating a survey to attending radiologists, fellows, and residents. After tallying the results, they found categorization accuracy between 54 and 55 percent for individual modalities and 57 percent for all categories. Most notably, they found “no significant difference” in categorization among the “actionable” categories, defined in the paper as categories three through five.
However, the merely moderate interobserver agreement calls into question the efficacy of the PI-RADS guidelines, according to the authors.
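For context, “moderate” agreement conventionally corresponds to a kappa statistic between 0.41 and 0.60 on the Landis and Koch scale. The study’s raw ratings are not reproduced here, but as a minimal sketch of how multi-reader agreement on PI-RADS categories is typically quantified, using hypothetical ratings and the Fleiss’ kappa implementation in statsmodels:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical data: each row is a case, each column a reader, each
# value the PI-RADS category (1-5) that reader assigned.
ratings = np.array([
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 3, 3],
    [4, 4, 5, 5],
    [1, 2, 1, 1],
])

# Collapse to a cases-by-categories count table, then compute Fleiss' kappa;
# values of 0.41-0.60 are conventionally read as "moderate" agreement.
table, _ = aggregate_raters(ratings)
print(f"Fleiss' kappa = {fleiss_kappa(table):.2f}")
```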
“It is curious that our results, which included attending physicians and trainees, are akin to those of studies whose data arose exclusively from small (n = 5 or 6) groups of abdominal imaging attending physicians,” wrote lead author and third-year resident Thomas Flood, MD, et al. “One explanation for this phenomenon is that a substantial portion of the observed interobserver variability may arise from limitations or ambiguity in the very definition of the PI-RADS version 2 schema itself.”
Other sources of error could include unfamiliarity with PI-RADS, outlier imaging exams, or image interpretation errors.
“Further larger scale multisite investigation may be useful to determine the relative contribution of each of these factors and to better understand clinical relevance of our findings,” wrote Flood et al.