This repository contains the code to reproduce the experiments in our paper, Efficient Labeling of Retinal Fundus Photographs Using Deep Active Learning.
Active learning strategies were obtained from: https://github.com/decile-team/distil
Models were trained on a machine with 8 CPU cores, 64 GB RAM, and a 24 GB NVIDIA RTX 3090 GPU. Without modifications, our code requires a GPU with at least 16 GB VRAM.
We use Anaconda to create a virtual environment. An identical environment can be set up with src/environment.sh.
Data for this paper are publicly available; however, a Kaggle account (https://kaggle.com) is required to download them. After creating an account, download the data with the following commands:
cd data
kaggle competitions download -c diabetic-retinopathy-detection
wget https://storage.googleapis.com/kaggle-forum-message-attachments/90528/2877/retinopathy_solution.csv
unzip diabetic-retinopathy-detection.zip
# Requires 7-Zip
7z x train.zip.001
7z x test.zip.001
mkdir aptos
cd aptos
kaggle competitions download -c aptos2019-blindness-detection
unzip aptos2019-blindness-detection.zip
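After the downloads finish, a quick sanity check can confirm the expected layout under data/. This stdlib-only sketch infers the expected entries from the commands above; they are assumptions, so adjust the list if your layout differs:

```python
from pathlib import Path

# Entries we expect under data/ after the download and extraction steps above.
# These names are inferred from the commands, not guaranteed by the repo.
EXPECTED = [
    "retinopathy_solution.csv",  # EyePACS test labels
    "train",                     # extracted from train.zip.001
    "test",                      # extracted from test.zip.001
    "aptos",                     # APTOS 2019 data
]

def missing_paths(data_dir: str = "data") -> list[str]:
    """Return the expected entries that are not present under data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED if not (root / name).exists()]

if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```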
Please note that we use Weights and Biases (https://wandb.ai) to log experiments. When running experiments, you may be prompted to create an account or to run without one, in which case Weights and Biases will log everything offline. We leave this decision to the user's discretion.
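If you would rather avoid the prompt entirely, Weights and Biases honors the WANDB_MODE environment variable (a documented wandb option); setting it before any run keeps all logging local:

```python
import os

# Keep all Weights & Biases logging local; remove this (or set the variable
# to "online") to sync runs to wandb.ai instead.
os.environ["WANDB_MODE"] = "offline"
```

Equivalently, `export WANDB_MODE=offline` in the shell before launching the experiments.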
To reproduce the experiments in our paper, run the following commands from the src/ directory:
python main.py active_learning configs/mks/mk000.yaml \
--active-learning-strategy RandomSampling \
--num-workers 8 --gpus 1 --device-num 0 \
--precision 16 --benchmark
python main.py active_learning configs/mks/mk000.yaml \
--active-learning-strategy EntropySampling \
--num-workers 8 --gpus 1 --device-num 0 \
--precision 16 --benchmark
python main.py active_learning configs/mks/mk000.yaml \
--active-learning-strategy BALDDropout \
--num-workers 8 --gpus 1 --device-num 0 \
--precision 16 --benchmark
python main.py active_learning configs/mks/mk000.yaml \
--active-learning-strategy CoreSet \
--num-workers 8 --gpus 1 --device-num 0 \
--precision 16 --benchmark
python main.py active_learning configs/mks/mk000.yaml \
--active-learning-strategy AdversarialBIM \
--num-workers 8 --gpus 1 --device-num 0 \
--precision 16 --benchmark
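The five invocations differ only in the strategy flag, so they can be driven from a short loop. This sketch assumes it is run from src/ and that main.py's interface is exactly as shown above:

```python
import subprocess

# The five active learning strategies evaluated in the paper.
STRATEGIES = [
    "RandomSampling",
    "EntropySampling",
    "BALDDropout",
    "CoreSet",
    "AdversarialBIM",
]

def build_command(strategy: str, config: str = "configs/mks/mk000.yaml") -> list[str]:
    """Assemble the main.py invocation for one active learning strategy."""
    return [
        "python", "main.py", "active_learning", config,
        "--active-learning-strategy", strategy,
        "--num-workers", "8", "--gpus", "1", "--device-num", "0",
        "--precision", "16", "--benchmark",
    ]

if __name__ == "__main__":
    for strategy in STRATEGIES:
        subprocess.run(build_command(strategy), check=True)
```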
Results are saved in src/../experiments/active_learning/mk000/{STRATEGY}/. A summary of the results in that directory is available in results.csv. After each iteration, labeled and unlabeled indices are saved in round{iteration}/labeled_and_unlabeled_at_end_of_round.pkl. Checkpoints for each iteration are available in round{iteration}/checkpoints/.
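The per-round pickle can be inspected directly. The loader below is a minimal sketch; the structure of the unpickled object (e.g. how labeled vs. unlabeled indices are stored) is an assumption to verify against the code:

```python
import pickle
from pathlib import Path

def load_round_indices(strategy_dir: str, iteration: int):
    """Load the labeled/unlabeled index snapshot saved at the end of a round.

    strategy_dir is e.g. experiments/active_learning/mk000/RandomSampling.
    """
    path = (
        Path(strategy_dir)
        / f"round{iteration}"
        / "labeled_and_unlabeled_at_end_of_round.pkl"
    )
    with open(path, "rb") as f:
        return pickle.load(f)
```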
Experiment configurations can be changed by editing the YAML configuration files in src/configs, for example, src/configs/mk000.yaml. These can also serve as the basis for new configuration files generated by the user.
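New configurations can also be derived programmatically. This stdlib-only sketch merges overrides into a base config dict; the key names shown are hypothetical (see src/configs for the real schema), and writing the result back to YAML requires a YAML library such as PyYAML:

```python
import copy

def derive_config(base: dict, overrides: dict) -> dict:
    """Return a new config dict: a deep copy of base with overrides applied."""
    cfg = copy.deepcopy(base)
    cfg.update(overrides)
    return cfg

# Hypothetical keys for illustration only.
base = {"seed": 88, "sampling_rounds": 10}
variant = derive_config(base, {"seed": 42})
```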
Please note that, despite setting random seeds, re-running the code with the same configuration may not produce identical results, due to the non-deterministic execution order of CUDA operations on the GPU.
Questions can be addressed to:
Samantha K. Paul, MD (Samantha.Paul2 AT UHHospitals . org)
Ian Pan, MD (IPan AT bwh . harvard . edu )