
Deep Active Learning for Diabetic Retinopathy

This repository contains the code to reproduce the experiments in our paper, Efficient Labeling of Retinal Fundus Photographs Using Deep Active Learning.

Active learning strategies were obtained from: https://github.com/decile-team/distil

Setup

Hardware

Models were trained on a machine with 8 CPU cores, 64 GB RAM, and a 24 GB NVIDIA RTX 3090 GPU. Without modifications, our code requires a GPU with at least 16 GB VRAM.

Environment

We use Anaconda to create a virtual environment. An identical environment can be set up with src/environment.sh.
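Once the environment is active, the following snippet offers a quick check that PyTorch can see a suitable GPU (a minimal sketch; it assumes environment.sh installs PyTorch with CUDA support):

import torch

# Confirm a CUDA-capable GPU is visible and report its memory;
# the default configuration needs at least 16 GB of VRAM.
assert torch.cuda.is_available(), "No CUDA device found"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024 ** 3:.1f} GB VRAM")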

Data

Data for this paper are publicly available; however, a Kaggle account (https://kaggle.com) is required to download them. After creating an account, the data can be downloaded with the following commands:

cd data
kaggle competitions download -c diabetic-retinopathy-detection
# Labels for the test set
wget https://storage.googleapis.com/kaggle-forum-message-attachments/90528/2877/retinopathy_solution.csv
unzip diabetic-retinopathy-detection.zip
# Extracting the multi-part image archives requires 7-Zip
7z x train.zip.001
7z x test.zip.001
mkdir aptos
cd aptos
kaggle competitions download -c aptos2019-blindness-detection
unzip aptos2019-blindness-detection.zip
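Once the downloads finish, a quick sanity check of the label files can catch incomplete downloads early. The sketch below is run from the data/ directory and assumes the standard Kaggle column names (level in retinopathy_solution.csv, diagnosis in the APTOS train.csv); adjust if your files differ:

import pandas as pd

# Test set labels for diabetic-retinopathy-detection (fetched via wget above)
test_labels = pd.read_csv("retinopathy_solution.csv")
print(len(test_labels), "test labels")
print(test_labels["level"].value_counts().sort_index())

# Training labels for aptos2019-blindness-detection
aptos_labels = pd.read_csv("aptos/train.csv")
print(len(aptos_labels), "APTOS training labels")
print(aptos_labels["diagnosis"].value_counts().sort_index())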

Running Experiments

Please note that we use Weights & Biases (https://wandb.ai) to log experiments. When running experiments, you may be prompted to create an account or to run the experiments without one; in the latter case, Weights & Biases will log everything offline. We leave this choice to the user's discretion.

To reproduce the experiments in our paper, run the following commands from the src/ directory:

python main.py active_learning configs/mks/mk000.yaml \
    --active-learning-strategy RandomSampling \
    --num-workers 8 --gpus 1 --device-num 0 \
    --precision 16 --benchmark

python main.py active_learning configs/mks/mk000.yaml \
    --active-learning-strategy EntropySampling \
    --num-workers 8 --gpus 1 --device-num 0 \
    --precision 16 --benchmark

python main.py active_learning configs/mks/mk000.yaml \
    --active-learning-strategy BALDDropout \
    --num-workers 8 --gpus 1 --device-num 0 \
    --precision 16 --benchmark

python main.py active_learning configs/mks/mk000.yaml \
    --active-learning-strategy CoreSet \
    --num-workers 8 --gpus 1 --device-num 0 \
    --precision 16 --benchmark

python main.py active_learning configs/mks/mk000.yaml \
    --active-learning-strategy AdversarialBIM \
    --num-workers 8 --gpus 1 --device-num 0 \
    --precision 16 --benchmark
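The five invocations above differ only in the --active-learning-strategy flag, so they can also be driven by a small script such as the one below (a convenience sketch, not part of the repository; run it from src/):

import subprocess

# The five sampling strategies compared in the paper
STRATEGIES = [
    "RandomSampling",
    "EntropySampling",
    "BALDDropout",
    "CoreSet",
    "AdversarialBIM",
]

for strategy in STRATEGIES:
    # Each run shares the same base configuration and hardware settings
    subprocess.run(
        [
            "python", "main.py", "active_learning", "configs/mks/mk000.yaml",
            "--active-learning-strategy", strategy,
            "--num-workers", "8", "--gpus", "1", "--device-num", "0",
            "--precision", "16", "--benchmark",
        ],
        check=True,
    )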

Results are saved in src/../experiments/active_learning/mk000/{STRATEGY}/, i.e., in experiments/ at the repository root.

A summary of the results in the above directory is available in results.csv. After each active learning round, the labeled and unlabeled indices are saved in round{iteration}/labeled_and_unlabeled_at_end_of_round.pkl, and checkpoints for each round are available in round{iteration}/checkpoints/.
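For downstream analysis, these artifacts can be loaded as follows (a minimal sketch, run from src/; we make no assumption about the pickle's internal structure, so the snippet only inspects its type):

import pickle
import pandas as pd

strategy = "RandomSampling"  # or any of the strategies above
base = f"../experiments/active_learning/mk000/{strategy}"

# Per-round summary metrics
results = pd.read_csv(f"{base}/results.csv")
print(results.head())

# Labeled/unlabeled indices recorded at the end of round 0
with open(f"{base}/round0/labeled_and_unlabeled_at_end_of_round.pkl", "rb") as f:
    indices = pickle.load(f)
print(type(indices))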

Experiment configurations can be changed by editing the YAML configuration files in src/configs, for example, src/configs/mks/mk000.yaml. These can also serve as the basis for new, user-defined configuration files.
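Before writing a new configuration, it can help to inspect the structure of an existing one (a minimal sketch using PyYAML, which we assume is available in the environment; run from src/):

import yaml

# Load the base configuration and list its top-level sections
with open("configs/mks/mk000.yaml") as f:
    config = yaml.safe_load(f)
print(list(config))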

Please note that, despite setting random seeds, re-running the code with the same configuration may not produce identical results, owing to the non-deterministic execution order of CUDA operations on the GPU.
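If stricter reproducibility is needed, PyTorch can be pushed toward deterministic execution at some cost in speed. The sketch below shows the standard knobs (it is not wired into our code, and it conflicts with the --benchmark flag, since cuDNN benchmarking selects kernels non-deterministically):

import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    # Seed all common sources of randomness
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for reproducibility in cuDNN
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Required by some CUDA ops when deterministic algorithms are enforced
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # warn_only requires PyTorch >= 1.11
    torch.use_deterministic_algorithms(True, warn_only=True)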

Contact

Questions can be addressed to:

Samantha K. Paul, MD (Samantha.Paul2 AT UHHospitals . org)

Ian Pan, MD (IPan AT bwh . harvard . edu)