GPT-4o translates radiology reports written in different languages in less than 30 seconds

Hannah Murphy | August 05, 2025 | Radiology Business | Enterprise Imaging

Large language models can play a vital role in translating radiology reports written in different languages.

Specifically, GPT-4o, an updated version of the large language model behind OpenAI’s ChatGPT, recently proved more than capable of translating reports between multiple languages. What’s more, the LLM did so in a matter of seconds, significantly reducing the time it would take a human translator to interpret a report written in a foreign language.

Researchers involved in the analysis shared their findings in the European Journal of Radiology.

“Understanding radiology findings is essential for accurate diagnosis and effective treatment decisions,” Alexander Isaak, DO, with the department of diagnostic and interventional radiology at University Hospital Bonn, in Germany, and colleagues noted. “However, in today’s globalized world, patients increasingly present medical reports, including radiology findings, written in languages unfamiliar to their healthcare providers. These language barriers can hinder the integration of vital medical data into the clinical workup, leading to delays in diagnosis and suboptimal patient care.”

Using zero-shot prompting, the group tasked GPT-4o with translating 100 radiology reports from German into English, French, Spanish and Russian. The reports covered X-ray, ultrasound, CT and MRI exams, and the resulting translations were reviewed by eight bilingual radiologists, who used 5-point Likert scales to rate general readability, overall quality and utility for translators.

GPT-4o's translation processing times ranged from 9 to 24 seconds. Translations into English, French and Spanish earned quality scores of 4.5 out of 5, while scores for the Russian translations ranged from 3.5 to 4. Usefulness was rated highest for English, at 5 out of 5, and the radiologists also rated the readability and completeness of the English, French and Spanish translations highly.

Factual correctness, however, was less consistent. It averaged 84% for English and 83% for French but was significantly lower for Russian, at 69%. Errors considered potentially harmful appeared in about 4% of the reports overall, rising to 9% for the Russian translations.

“With considerable overall factual correctness and an average processing time of under 25 seconds, these findings underscore the potential of large language models to provide rapid and reliable translation of radiology reports in clinical practice while also highlighting the need for language-specific quality monitoring,” the group noted, adding that these findings have positive implications for regions with diverse patient populations.

Hannah Murphy, Editor

In addition to her background in journalism, Hannah also has patient-facing experience in clinical settings, having spent more than 12 years working as a registered rad tech. She began covering the medical imaging industry for Innovate Healthcare in 2021.