FineRadScore

This repository contains code to evaluate a pair of reports (candidate and ground truth) using the FineRadScore evaluation framework.

datasets/: contains csv headers for the ReFiSco-v0, ReFiSco-v1, and ReXVal datasets. You will need to download these datasets after signing a PhysioNet agreement and/or replace these files with the pairs of candidate/ground truth reports you want FineRadScore to evaluate on. Descriptions for each dataset can be found below:
- refisco-v0.csv: ReFiSco-v0 dataset
- refisco-v1.csv: ReFiSco-v1 dataset
- refisco-v1-paraphrased.csv: contains paraphrased versions in the column corrected_paraphrase of the generated reports of a subset of the ReFiSco-v1 dataset
- rexval_full.csv: full ReXVal dataset
- ReXVal_test_40.csv: test split of the ReXVal dataset used to evaluate RadCliQ

Creating an Environment

This repository was setup using conda. To create an environment, run conda create -n testenv python=3.9. Then, run conda activate testenv and pip install -r requirements.txt to install required packages.

Preprocess Datasets

Run python preprocess_datasets.py to preprocess datasets. You should see new files appear in the datasets/ folder.

Adding API Keys

Run export OPENAI_API_KEY=<api key> to add your OpenAI API key. Also modify lines 8-10 in gpt4_generations.py accordingly to match your api type, version, and base information.

Run export ANTHROPIC_API_KEY=<api key> to add your Anthropic API key.

Run Experiments on ReFiSco datasets

Run python run_refisco_experiments.py <version> <setting> <model>

version: v0, v1
setting: zeroshot, original, perturbed, paraphrased
model: gpt4, claude3

Original, perturbed, and paraphrased settings are all using the few-shot prompt. For example, python run_refisco_experiments.py v1 original gpt4.

Run Experiments on ReXVal datasets

Run python run_rexval_experiments.py <version> <setting> <model>

version: test, full
setting: zeroshot, fewshot
model: gpt4, claude3