Current deep learning models trained to generate radiology reports from chest radiographs are capable of producing clinically accurate, clear, and actionable text that can advance patient care. However, such systems all succumb to the same problem: making hallucinated references to non-existent prior reports. These hallucinations occur because the models are trained on datasets of real-world patient reports that inherently refer to priors. To address this, we propose two methods to directly remove references to priors in radiology reports: (1) a GPT-3-based few-shot approach to rewrite medical reports without references to priors; and (2) a BioBERT-based token classification approach to directly remove tokens referring to priors. We use these approaches to modify MIMIC-CXR, a publicly available dataset of chest X-rays and their associated free-text radiology reports; we then retrain CXR-RePaiR, a radiology report generation system, on the adapted MIMIC-CXR dataset. We find that our retrained model, which we call CXR-ReDonE, outperforms previous report generation methods on clinical metrics, and we expect it to be broadly valuable in enabling current radiology report generation systems to be more directly integrated into clinical pipelines.
CXR-ReDonE makes use of multiple GitHub repos. To set up the complete CXR-ReDonE directory, clone the repos below, then run the following commands inside CXR-ReDonE/:
cd ifcc
sh resources/download.sh
cd ..
mv ifcc_CXR_ReDonE_code/* ifcc/
rm -rf ifcc_CXR_ReDonE_code
cd CheXbert
mkdir models
cd models
Once inside the models directory, download the pretrained weights for CheXbert.
mv CXR-Report-Metric_CXR_ReDonE_code/* CXR-Report-Metric/
rm -rf CXR-Report-Metric_CXR_ReDonE_code
As we made multiple edits to the ALBEF directory, please refer to the ALBEF directory uploaded here instead of cloning a new one. Make sure to download ALBEF_4M.pth:
wget https://storage.googleapis.com/sfr-pcl-data-research/ALBEF/ALBEF_4M.pth
and place it in ALBEF/.
Here, we make use of CXR-RePaiR's data preprocessing steps:
cd CXR-RePaiR
conda env create -f cxr-repair-env.yml
conda activate cxr-repair-env
First, you must get approval for the use of MIMIC-CXR and MIMIC-CXR-JPG. With approval, you will have access to the train/test reports and the JPG images.
python data_preprocessing/split_mimic.py \
--report_files_dir=<directory containing all reports> \
--split_path=<path to split file in mimic-cxr-jpg> \
--out_dir=mimic_data
python data_preprocessing/extract_impressions.py \
--dir=mimic_data
python data_preprocessing/create_bootstrapped_testset_copy.py \
--dir=mimic_data \
--bootstrap_dir=bootstrap_test \
--cxr_files_dir=<mimic-cxr-jpg directory containing chest X-rays>
The above commands produce the following files: cxr.h5, mimic_train_impressions.csv, and mimic_test_impressions.csv. The expert-annotated test set must still be downloaded from here.
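To sanity-check these outputs before retraining, something like the following works (a minimal sketch; the exact CSV column names and HDF5 keys produced by CXR-RePaiR's scripts are assumptions to verify locally):
import pandas as pd
import h5py

# Inspect the extracted impressions (column names depend on CXR-RePaiR's scripts)
train_df = pd.read_csv("mimic_data/mimic_train_impressions.csv")
test_df = pd.read_csv("mimic_data/mimic_test_impressions.csv")
print(train_df.columns.tolist(), len(train_df), len(test_df))

# Inspect the packed chest X-rays
with h5py.File("mimic_data/cxr.h5", "r") as f:
    print(list(f.keys()))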
To remove references to priors from the above data files, move the three files to a new folder CXR-ReDonE/data, then run:
python remove_prior_refs.py
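As a quick check that the rewrite succeeded, one can scan the cleaned impressions for common prior-reference cues (a minimal sketch; the column name and cue list are assumptions, not part of the pipeline):
import pandas as pd

# Hypothetical sanity check on the cleaned impressions
df = pd.read_csv("data/mimic_train_impressions.csv")
col = "report"  # assumed column name; adjust to the CSV's actual schema
cues = "prior|previous|compared|interval change"
flagged = df[df[col].astype(str).str.lower().str.contains(cues)]
print(f"{len(flagged)} of {len(df)} impressions still mention prior-reference cues")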
To skip these steps and instead obtain the final data, navigate to the CXR-PRO directory and follow the steps there.
To pretrain ALBEF, run:
cd ALBEF
sh pretrain_script.sh
Linked here are the ALBEF model checkpoints trained with and without references to priors removed from the MIMIC-CXR reports corpus.
To run inference, run the following:
cd ALBEF
python CXR_ReDonE_pipeline.py --albef_retrieval_ckpt <path-to-checkpoint>
For sentence-level report retrieval (where k is the number of sentences in the output report), run:
cd ALBEF
python CXR_ReDonE_pipeline.py --impressions_path ../data/mimic_train_sentence_impressions.csv --albef_retrieval_ckpt <path-to-checkpoint> --albef_retrieval_top_k <k value>
For evaluating the generated reports, we make use of CXR-Report-Metric:
cd ../CXR-Report-Metric
conda create -n "cxr-report-metric" python=3.7.0 ipython
conda activate cxr-report-metric
pip install -r requirements.txt
Next, download the RadGraph model checkpoint from PhysioNet here. The checkpoint file can be found under the "Files" section at the path models/model_checkpoint/. Set RADGRAPH_PATH in config.py to the path of the downloaded checkpoint.
- Use prepare_df.py to select the inferences for the corresponding 2,192 samples from our generation.
python prepare_df.py --fpath <input path> --opath <output path>
- In config.py, set GT_REPORTS to ../data/cxr2_generated_appa.csv and PREDICTED_REPORTS to <output path>. Set OUT_FILE to the desired path for the output metric scores. Set CHEXBERT_PATH to the path of the downloaded checkpoint (CheXbert/models/chexbert.pth); see the example config after these steps.
- Use test_metric.py to generate the scores.
python test_metric.py
- Finally, use compute_avg_score.py to output the average scores.
python3 compute_avg_score.py --fpath <input path>
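Put together, the relevant section of config.py might look like the following (example paths only, not the definitive values):
# config.py (excerpt): substitute your own paths
GT_REPORTS = "../data/cxr2_generated_appa.csv"
PREDICTED_REPORTS = "predictions/output.csv"       # the <output path> from prepare_df.py
OUT_FILE = "results/metric_scores.csv"             # where the metric scores are written
CHEXBERT_PATH = "../CheXbert/models/chexbert.pth"  # downloaded CheXbert checkpoint
RADGRAPH_PATH = "models/model_checkpoint/"         # RadGraph checkpoint from PhysioNet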
The above steps to run CXR-ReDonE rely on two pretrained models: rajpurkarlab/filbert, a sentence-level classifier, and rajpurkarlab/gilbert, a token-level classifier for removing references to priors.
The filbert model can be called programmatically as follows:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

def load_model(model_name="rajpurkarlab/filbert"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    return tokenizer, model

def run_bert_classifier(sentence, tokenizer, model):
    # Returns the predicted class index, parsed from the label string
    # (e.g. "LABEL_1" -> 1)
    pipe = pipeline("sentiment-analysis", model=model.to("cpu"), tokenizer=tokenizer)
    return int(pipe(sentence)[0]['label'][-1])

tokenizer, model = load_model()
run_bert_classifier("SINGLE SENTENCE FROM A REPORT", tokenizer, model)
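Building on the snippet above, the classifier can be used to filter a whole report sentence by sentence. This is a sketch only; it assumes a label of 1 marks a sentence as referencing a prior, which should be verified against the model card:
def filter_report(report, tokenizer, model):
    # Drop sentences the classifier flags (label semantics are an assumption)
    kept = [s.strip() for s in report.split(".")
            if s.strip() and run_bert_classifier(s, tokenizer, model) == 0]
    return ". ".join(kept) + ("." if kept else "")

print(filter_report("Heart size is normal. Unchanged from prior exam.", tokenizer, model))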
The gilbert model can be called programmatically as follows:
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

def get_pipe():
    model_name = "rajpurkarlab/gilbert"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    pipe = pipeline(task="token-classification", model=model.to("cpu"), tokenizer=tokenizer, aggregation_strategy="simple")
    return pipe

def remove_priors(pipe, report):
    ret = ""
    for sentence in report.split("."):
        if sentence and not sentence.isspace():
            p = pipe(sentence)
            # Keep only the token groups tagged KEEP, dropping those referring to priors
            string = ""
            for item in p:
                if item['entity_group'] == 'KEEP':
                    string += item['word'] + " "
            # "redemonstrate" implies a prior study, so soften it to "demonstrate"
            ret += string.strip().replace("redemonstrate", "demonstrate").capitalize() + ". "
    return ret.strip()

modified_report = remove_priors(get_pipe(), "YOUR REPORT")
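To apply this to an entire corpus, such as the impressions CSVs produced above, the same pipeline can be mapped over each report. This is a sketch of the kind of processing remove_prior_refs.py performs, not its exact implementation; the column name and paths are assumptions:
import pandas as pd

pipe = get_pipe()
df = pd.read_csv("data/mimic_train_impressions.csv")
col = "report"  # assumed column name; adjust to the CSV's actual schema
df[col] = df[col].astype(str).map(lambda r: remove_priors(pipe, r))
df.to_csv("data/mimic_train_impressions_no_priors.csv", index=False)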