
Shared task on Large-Scale Radiology Report Generation

0. Paper submission

In response to multiple requests, the leaderboard will remain open until the very end. Please ensure that the results in your overview paper match your submission on the online leaderboard.

System papers due date: May 17th (Friday), 2024

All papers will be submitted at:

In the submission form, choose:

Submission Categories: Please enter the categories under which the submission should be reviewed. Task (*):

Your paper title should follow the format:

TeamA at RRG24: Title

Please read these instructions:

Submissions are now open at

An important medical application of natural language generation (NLG) is to build assistive systems that take X-ray images of a patient and generate a textual report describing clinical observations in the images. This is a clinically important task, offering the potential to reduce radiologists’ repetitive work and generally improve clinical communication.

1. Task Overview

Given one or multiple chest X-rays from one study, the participants must generate the corresponding radiology report. In the scope of this task, two sections are considered: findings and impressions. Each section will have its own separate evaluation and leaderboard. These sections may be produced using either a single system or two distinct systems.
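Since some sources provide only one of the two sections, participants may want to build separate training subsets per section. The snippet below is an illustrative sketch only (not part of the official pipeline), assuming the challenge dataset described in Section 2.4 is used and that a missing section is stored as an empty value:

from datasets import load_dataset

# Illustrative only: split the public training data into per-section subsets,
# assuming missing sections are stored as empty strings (or None).
dataset = load_dataset("StanfordAIMI/interpret-cxr-public")

findings_train = dataset["train"].filter(lambda ex: bool(ex["findings"]))
impression_train = dataset["train"].filter(lambda ex: bool(ex["impression"]))

# Each subset can then feed its own system, or a single system conditioned on
# which section it should generate.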

1.1 Rules

All participants will be invited to submit a paper describing their solution, to be included in the Proceedings of the 23rd Workshop on Biomedical Natural Language Processing (BioNLP) at ACL 2024. If you do not wish to write a paper, you must at least provide a thorough description of your system, which will be included in the overview paper for this task. Otherwise, your submission (and reported scores) will not be taken into account.

1.2 Timeline

All deadlines are 11:59 PM (“Anywhere on Earth”).

2. Data

Below are the data used for the challenge. Please note:

2.1 Training

Dataset          Findings Count   Impressions Count
PadChest         101,752          -
BIMCV-COVID19    45,525           -
CheXpert         45,491           181,619
OpenI            3,252            3,628
MIMIC-CXR        148,374          181,166
Total            344,394          366,413

2.2 Validation

Dataset          Findings Count   Impressions Count
CheXpert         1,112            4,589
BIMCV-COVID19    1,202            -
PadChest         2,641            -
OpenI            85               92
MIMIC-CXR        3,799            4,650
Total            8,839            9,331

2.3 Test

Please see

2.4 Access

Here are the steps to access the dataset of this challenge with the correct splits:

1) The datasets (image and findings/impression pairs) of CheXpert, BIMCV-COVID19 (en), PadChest (en) and OpenI can be accessed through the Hugging Face dataset at the following url: Once you have been granted access, you can load the dataset using the following:

from datasets import load_dataset

dataset = load_dataset("StanfordAIMI/interpret-cxr-public")
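The dataset is gated, so you may need to authenticate with your Hugging Face account before the call above succeeds. A minimal sketch, assuming you have an access token with read permission:

from huggingface_hub import login
from datasets import load_dataset

login()  # prompts for your token; alternatively login(token="hf_...")
dataset = load_dataset("StanfordAIMI/interpret-cxr-public")
print(dataset)  # lists the available splits, features and row counts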

2) You’ll have to handle the MIMIC-CXR processing on your own using the provided script. It’s crucial to use this script, as it ensures the proper splits are defined. Please have the following structure ready (the files folder comes from mimic-cxr-jpg):

├── files
│   ├── p10
│   ├── p11
│   ├── ...
│   └── p19
├── mimic-cxr-2.0.0-metadata.csv
├── mimic-cxr-2.0.0-split.csv
└── mimic_cxr_sectioned.csv

Then run the script with python. If you have a hash error (i.e., the created files aren’t what was expected), please email me at jbdel at stanford dot edu.
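Before running the script, a quick (unofficial) sanity check like the sketch below can confirm that the layout above is in place; adjust the root path to your own setup:

from pathlib import Path

# Unofficial sanity check that the expected MIMIC-CXR layout is in place.
root = Path(".")  # directory containing the structure shown above
for name in ["mimic-cxr-2.0.0-metadata.csv",
             "mimic-cxr-2.0.0-split.csv",
             "mimic_cxr_sectioned.csv"]:
    assert (root / name).is_file(), f"missing {name}"
assert (root / "files").is_dir(), "missing mimic-cxr-jpg 'files' folder (p10 ... p19)"
print("Layout looks correct.")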

You can then collate both datasets as follows:

from datasets import load_dataset, Sequence, Image, DatasetDict, concatenate_datasets

# Public portion: CheXpert, BIMCV-COVID19 (en), PadChest (en) and OpenI.
dataset = load_dataset("StanfordAIMI/interpret-cxr-public")

# MIMIC-CXR portion produced by the preprocessing script above (JSON files).
dataset_mimic = load_dataset(
    "json",
    data_files={"train": "train_mimic.json", "validation": "val_mimic.json"},
).cast_column("images", Sequence(Image()))

# Merge both sources into a single DatasetDict with train/validation splits.
dataset_final = DatasetDict({
    "train": concatenate_datasets([dataset["train"], dataset_mimic["train"]]),
    "validation": concatenate_datasets([dataset["validation"], dataset_mimic["validation"]]),
})

Please note that the save_to_disk operation can take time:
` Saving the dataset (147/147 shards): 100%|██████████| 550395/550395 [7:28:42<00:00, 20.44 examples/s] `
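To avoid paying that cost more than once, one option is to save the collated dataset to disk and reload it in later sessions. This is a sketch only; "interpret-cxr-combined" is an arbitrary local path:

from datasets import load_from_disk

# Save once (slow), then reload instantly in later sessions.
dataset_final.save_to_disk("interpret-cxr-combined")
dataset_final = load_from_disk("interpret-cxr-combined")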

The final dataset should be as detailed below:

DatasetDict({
    train: Dataset({
        features: ['source', 'findings', 'images', 'impression', 'images_path'],
        num_rows: 550395
    })
    validation: Dataset({
        features: ['source', 'findings', 'images', 'impression', 'images_path'],
        num_rows: 14111
    })
})
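For a quick check that everything lines up, you can inspect a single example using the feature names listed above:

# Inspect one training example; field names come from the features listed above.
example = dataset_final["train"][0]
print(example["source"])        # originating dataset (one of the sources in Section 2.1)
print(example["findings"])      # findings section (may be empty for some sources)
print(example["impression"])    # impression section (may be empty for some sources)
print(len(example["images"]))   # number of chest X-ray images in the study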

3) One more dataset, VinDr-CXR, can be used as additional training data if you wish, but we do not process it on our end. You will need to do the processing yourself at

3. Metrics

The submissions will be automatically evaluated with the following metrics: BLEU [2], ROUGE [3], BERTScore [1], and a RadGraph-based factual correctness score [4].

Also, the top participants will be evaluated against CheXagent [5].
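For local development, you can approximate some of these metrics with the Hugging Face evaluate package. This is only a rough sketch with toy strings; the official leaderboard configuration may differ in settings and tokenization:

import evaluate

predictions = ["no acute cardiopulmonary process"]
references = ["no acute cardiopulmonary abnormality"]

# Corpus-level BLEU expects a list of reference lists per prediction.
bleu = evaluate.load("bleu").compute(predictions=predictions,
                                     references=[[r] for r in references])
rouge = evaluate.load("rouge").compute(predictions=predictions, references=references)
bertscore = evaluate.load("bertscore").compute(predictions=predictions,
                                               references=references, lang="en")

print(bleu["bleu"], rouge["rougeL"], bertscore["f1"])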


Jean-Benoit Delbrouck
Zhihong Chen
Maya Varma
Curtis Langlotz



[1] Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. BERTScore: Evaluating Text Generation with BERT. In International Conference on Learning Representations, 2020.

[2] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002.

[3] Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out, 2004.

[4] Jean-Benoit Delbrouck, Pierre Chambon, Christian Bluethgen, Emily Tsai, Omar Almusa, and Curtis Langlotz. Improving the Factual Correctness of Radiology Report Generation with Semantic Rewards. In Findings of the Association for Computational Linguistics: EMNLP 2022.

[5] Zhihong Chen, Maya Varma, Jean-Benoit Delbrouck, Magdalini Paschali, Louis Blankemeier, Dave Van Veen, Jeya Maria Jose Valanarasu, Alaa Youssef, Joseph Paul Cohen, Eduardo Pontes Reis, et al. CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation. arXiv preprint arXiv:2401.12208, 2024.