Radiology reports are critical for clinical decision-making but often lack a standardized format, limiting both human interpretability and machine learning (ML) applications. While large language models (LLMs) have shown strong capabilities in reformatting clinical text, their high computational requirements, lack of transparency, and data privacy concerns hinder practical deployment. To address these challenges, we explore lightweight encoder-decoder models (<300M parameters)—specifically T5 and BERT2BERT—for structuring radiology reports from the MIMIC-CXR and CheXpert Plus datasets. We benchmark these models against eight open-source LLMs (1B–70B parameters), adapted using prefix prompting, in-context learning (ICL), and low-rank adaptation (LoRA) finetuning. Our best-performing lightweight model outperforms all LLMs adapted using prompt-based techniques on a human-annotated test set. While some LoRA-finetuned LLMs achieve modest gains over the lightweight model on the Findings section (BLEU 6.4%, ROUGE-L 4.8%, BERTScore 3.6%, F1-RadGraph 1.1%, GREEN 3.6%, and F1-SRR-BERT 4.3%), these improvements come at the cost of substantially greater computational resources. For example, LLaMA-3-70B incurred more than 400 times the inference time, cost, and carbon emissions compared to the lightweight model. These results underscore the potential of lightweight, task-specific models as sustainable and privacy-preserving solutions for structuring clinical text in resource-constrained healthcare settings.
| Model | Variant | HuggingFace Link |
|---|---|---|
| BERT2BERT | RoBERTa-base | 🤗 StanfordAIMI/SRR-BERT2BERT-RoBERTa-base |
| BERT2BERT | RoBERTa-biomed | 🤗 StanfordAIMI/SRR-BERT2BERT-RoBERTa-biomed |
| BERT2BERT | RoBERTa-PM-M3 | 🤗 StanfordAIMI/SRR-BERT2BERT-RoBERTa-PM-M3 |
| BERT2BERT | RadBERT | 🤗 StanfordAIMI/SRR-BERT2BERT-RadBERT |
| T5 | T5-Base | 🤗 StanfordAIMI/SRR-T5-Base |
| T5 | Flan-T5 | 🤗 StanfordAIMI/SRR-T5-Flan |
| T5 | SciFive | 🤗 StanfordAIMI/SRR-T5-SciFive |
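The usage example below covers the BERT2BERT checkpoints; the T5 variants are plain sequence-to-sequence models. A minimal loading sketch, assuming these checkpoints work with the generic `AutoModelForSeq2SeqLM` API (not verified against each model card):

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumption: the SRR T5 checkpoints load through the standard seq2seq auto classes.
model_name = "StanfordAIMI/SRR-T5-SciFive"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
model.eval()

inputs = tokenizer("FINDINGS: ...", truncation=True, max_length=512, return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=286, num_beams=5, early_stopping=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```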
| Dataset | HuggingFace Link |
|---|---|
| SRRG-Findings | 🤗 StanfordAIMI/srrg_findings |
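The paired reports can be pulled with the `datasets` library (`pip install datasets`). A minimal sketch; the split and column names are defined by the dataset card, so inspect the loaded object rather than assuming them:

```python
from datasets import load_dataset

# Load the structured-report pairs; splits/columns come from the dataset card.
ds = load_dataset("StanfordAIMI/srrg_findings")
print(ds)  # lists the available splits and their columns

first_split = list(ds.keys())[0]
print(ds[first_split][0])  # first example of the first split
```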
Requirements:

```bash
pip install transformers==4.44.0 torch==2.3
```
```python
import torch
from transformers import EncoderDecoderModel, AutoTokenizer

# Step 1: Setup
model_name = "StanfordAIMI/SRR-BERT2BERT-RoBERTa-base"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Step 2: Load tokenizer and model
model = EncoderDecoderModel.from_pretrained(model_name).to(device)
tokenizer = AutoTokenizer.from_pretrained(
    model_name, trust_remote_code=True, padding_side="right", use_fast=False
)
# The decoder starts generation from the [CLS] token.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.bos_token_id = tokenizer.cls_token_id
model.eval()

# Step 3: Inference (example report from the MIMIC-CXR dataset)
input_text = (
    "CHEST RADIOGRAPH PERFORMED ON ___ COMPARISON: Prior exam from ___. "
    "CLINICAL HISTORY: Weakness, assess pneumonia. FINDINGS: Frontal and lateral "
    "views of the chest were provided. Midline sternotomy wires are again noted. "
    "The heart is poorly assessed, though remains enlarged. There are at least "
    "small bilateral pleural effusions. There may be mild interstitial edema. "
    "No pneumothorax. Bony structures are demineralized with kyphotic angulation "
    "in the lower T-spine again noted. IMPRESSION: Limited exam with small "
    "bilateral effusions, cardiomegaly, and possible mild interstitial edema."
)
inputs = tokenizer(
    input_text, padding="max_length", truncation=True, max_length=512, return_tensors="pt"
)
# Mask out padding so the encoder ignores it.
inputs["attention_mask"] = inputs["input_ids"].ne(tokenizer.pad_token_id)
input_ids = inputs["input_ids"].to(device)
attention_mask = inputs["attention_mask"].to(device)

generated_ids = model.generate(
    input_ids,
    attention_mask=attention_mask,
    max_new_tokens=286,
    min_new_tokens=120,
    decoder_start_token_id=model.config.decoder_start_token_id,
    num_beams=5,
    early_stopping=True,
)[0]
decoded = tokenizer.decode(generated_ids, skip_special_tokens=True)
print(decoded)
```
Output:

```
Exam Type: Chest Radiograph

History: Clinical history includes weakness with a need to assess for pneumonia.

Technique: Frontal and lateral views of the chest were obtained.

Findings:
Pleura:
- Small bilateral pleural effusions.
Cardiovascular:
- Enlarged cardiac silhouette.
Lungs and Airways:
- Possible mild interstitial edema.
- No evidence of pneumothorax.
Musculoskeletal and Chest Wall:
- Midline sternotomy wires present.
- Bony structures show demineralization.
- Kyphotic angulation in the lower thoracic spine.

Impression:
1. Small bilateral pleural effusions.
2. Cardiomegaly.
3. Possible mild interstitial edema.
```
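To structure many reports at once, the same model can be run over padded batches. A minimal sketch reusing `model`, `tokenizer`, `device`, and `input_text` from the example above (the report list is a placeholder):

```python
# Batch inference: pad to the longest report in the batch and decode all outputs.
reports = [input_text, input_text]  # placeholder: substitute your own raw reports
batch = tokenizer(
    reports, padding=True, truncation=True, max_length=512, return_tensors="pt"
).to(device)
generated = model.generate(
    batch["input_ids"],
    attention_mask=batch["attention_mask"],
    max_new_tokens=286,
    num_beams=5,
    early_stopping=True,
)
for structured in tokenizer.batch_decode(generated, skip_special_tokens=True):
    print(structured)
```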
@article{structuring-2025,
title={Structuring Radiology Reports: Challenging LLMs with Lightweight Models},
author={Moll, Johannes and Fay, Louisa and Azhar, Asfandyar and Ostmeier, Sophie and Lueth, Tim and Gatidis, Sergios and Langlotz, Curtis and Delbrouck, Jean-Benoit},
journal={arXiv preprint arXiv:2506.00200},
url={https://arxiv.org/abs/2506.00200},
year={2025}
}