Researchers use NLP techniques to extract data from free-text radiology reports
Natural language processing (NLP) techniques can help extract relevant data from free-text radiology reports, according to a study published in the Journal of Digital Imaging.
The authors noted how time-consuming it can be for radiologists to review reports from prior studies. Could NLP save radiologists valuable time?
“It is incumbent on the radiologist to review the full text report and/or images from those prior studies, a process that is time-consuming and confers substantial risk of overlooking a relevant prior study or finding,” wrote authors Daniel J. Goff, MD, and Thomas W. Loehfelm, MD, PhD, from the department of radiology at the University of California Davis Health System in Sacramento. “This risk is compounded when patients have dozens or even hundreds of prior imaging studies. Our goal is to assess the feasibility of natural language processing techniques to automatically extract asserted and negated disease entities from free-text radiology reports as a step towards automated report summarization.”
For the study, two radiologists reviewed the reports from 50 contrast-enhanced CT abdomen/pelvis examinations performed at a single facility from July to December 2016. Using an open-source text annotation tool, the radiologists manually annotated the findings in each report. Those 50 reports were then processed with an NLP pipeline designed to automatically extract asserted and negated disease entities.
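The article does not describe the pipeline's internals, but a minimal sketch of the general idea, flagging candidate disease mentions and marking each as asserted or negated based on nearby cue phrases in the spirit of NegEx-style rules, could look like the following. The term lists and the extract_findings helper are illustrative assumptions, not the authors' code.

```python
import re

# Illustrative (not the authors') lists of disease terms and negation cues.
DISEASE_TERMS = ["pneumonia", "cholelithiasis", "diverticulitis", "ascites", "hydronephrosis"]
NEGATION_CUES = ["no ", "without ", "no evidence of ", "negative for ", "absence of "]

def extract_findings(report_text, window=60):
    """Return (finding, asserted/negated) pairs from one free-text report.

    A finding is marked 'negated' if a negation cue appears shortly
    before it in the same sentence; otherwise it is 'asserted'.
    """
    results = []
    for sentence in re.split(r"[.\n]", report_text.lower()):
        for term in DISEASE_TERMS:
            idx = sentence.find(term)
            if idx == -1:
                continue
            preceding = sentence[max(0, idx - window):idx]
            negated = any(cue in preceding for cue in NEGATION_CUES)
            results.append((term, "negated" if negated else "asserted"))
    return results

# Example: one sentence asserts a finding, the next negates one.
report = "There is moderate ascites. No evidence of hydronephrosis."
print(extract_findings(report))
# [('ascites', 'asserted'), ('hydronephrosis', 'negated')]
```

A production pipeline would rely on far richer vocabularies and context handling than this sketch, but the asserted-versus-negated distinction it illustrates is the one the authors set out to capture.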
Overall, Goff and Loehfelm reported, one radiologist annotated 164 concepts and the other annotated 149. They had perfect agreement on 104 of the 186 total annotations and partial agreement on another 24, for an overall agreement rate of 69 percent. Twenty-six annotations were then discarded as irrelevant during the consensus-building phase, leaving 160 total annotations as the “manual gold-standard set.”
The NLP pipeline, meanwhile, made 231 automated annotations and achieved a true-positive rate of 86 percent, with 23 false negatives and 71 false positives. Its F1 score was 0.74, which the authors noted “compares favorably with other NLP techniques.”
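For readers who want to see how the headline numbers hang together, here is a quick back-of-the-envelope check. It assumes the true-positive rate is measured against the 160-annotation gold standard, an assumption that is consistent with the reported F1 of 0.74 but is not spelled out in the article.

```python
# Back-of-the-envelope check of the figures reported above.

# Interobserver agreement: perfect plus partial matches over 186 total annotations.
agreement = (104 + 24) / 186                    # ~0.69, the reported 69 percent

# Pipeline performance, assuming the true-positive rate is measured
# against the 160 gold-standard annotations (an assumption, not stated explicitly).
gold = 160                                      # manual gold-standard annotations
false_neg = 23                                  # gold annotations the pipeline missed
false_pos = 71                                  # automated annotations with no gold match

true_pos = gold - false_neg                     # 137
recall = true_pos / gold                        # ~0.86, the reported true-positive rate
precision = true_pos / (true_pos + false_pos)   # ~0.66
f1 = 2 * precision * recall / (precision + recall)

print(f"agreement={agreement:.2f} recall={recall:.2f} "
      f"precision={precision:.2f} F1={f1:.2f}")
# agreement=0.69 recall=0.86 precision=0.66 F1=0.74
```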
The authors concluded that the pipeline's performance was “good” but said more work remains. “We plan to further refine these methods and expand them to cover all diagnostic imaging studies to allow accurate and automated report summaries to be made available at the point-of-care diagnostic workstation,” they wrote.