We have attempted to replicate some of the experiments in "Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs", published in JAMA 2016; 316(22) [1]. As of February 2018, the paper had 236 citations in Google Scholar. To our knowledge, this work is the first attempt to reproduce its results. We had to re-implement the method because the source code is not available. Since replication studies are uncommon in the field of deep learning, we believe our results give general insight into the reproducibility of published deep learning methods. This repository contains the source code for this replication study, and this README gives instructions for running the replication on your own machine.
Python requirements:
- Python 3
- TensorFlow >= 1.4
- OpenCV >= 1.3
- Pillow
- h5py
- xlrd
Other requirements:
- p7zip-full
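
Before starting, it can help to confirm that the Python packages listed above are importable. The following is a minimal sketch and not part of this repository: the helper name `missing_modules` is hypothetical, and the import names assume the usual mapping (OpenCV as `cv2`, Pillow as `PIL`). It checks presence only, not version numbers.

```python
import importlib.util

# Import names for the packages listed above
# (assumed mapping: OpenCV -> cv2, Pillow -> PIL).
REQUIRED = ["tensorflow", "cv2", "PIL", "h5py", "xlrd"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```
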
- Run `$ git submodule update --init` to load the create_tfrecords repository. This tool will convert the data sets into TFRecord files.
- Download the Kaggle EyePACS data set and place all files in the `data/eyepacs` folder.
- Run `$ ./eyepacs.sh` to preprocess the Kaggle EyePACS data set and redistribute this set into a training and test set. Run with the `--only_gradable` flag if you want to train and evaluate with gradable images only. NB: This is a large data set, so this may take hours to finish.
- Download the Messidor-Original data set and place all files in the `data/messidor` folder.
- Run `$ ./messidor.sh` to preprocess the Messidor-Original data set. Run with the `--only_gradable` flag if you want to evaluate with gradable images only.
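
Because preprocessing can take hours, it may be worth checking that the downloaded files are in place before running the scripts. This is a small sketch under the assumption that it is run from the repository root; the helper `is_populated` is hypothetical and only checks that each folder exists and is not empty, not that the download is complete.

```python
import os

# Folders named in the steps above, relative to the repository root.
DATA_DIRS = ["data/eyepacs", "data/messidor"]


def is_populated(path):
    """Return True if path is an existing directory with at least one entry."""
    return os.path.isdir(path) and len(os.listdir(path)) > 0


if __name__ == "__main__":
    for data_dir in DATA_DIRS:
        status = "ok" if is_populated(data_dir) else "empty or missing"
        print(f"{data_dir}: {status}")
```
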
To start training with default settings, run `$ python train.py`. To train with stochastic gradient descent, specify the `-sgd` flag. Optionally, specify the path where model checkpoints should be saved with the `-sm` parameter.
Run `$ python train.py -h` to see additional optional parameters for training with your own data set, or for choosing where to save summaries or operating threshold metrics.
To evaluate or test the trained neural network on the Kaggle EyePACS test set, run `$ python evaluate.py -e`. To evaluate on Messidor-Original, run it with the `-m` flag instead.
To create an ensemble of networks and evaluate the linear average of their predictions, use the `-lm` parameter. To evaluate multiple models as an ensemble, give the model paths either as a comma-separated list or as a pattern that matches them. For example: `-lm=./tmp/model-1,./tmp/model-2,./tmp/model-3` or `-lm=./tmp/model-?`.
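
To illustrate what a linear average of predictions means, here is a minimal sketch in plain Python. It is not the code used by `evaluate.py`; the function name `average_predictions` and the example probabilities are made up for illustration, and it assumes each model outputs one probability per image.

```python
def average_predictions(per_model_predictions):
    """Average predictions across models, image by image.

    per_model_predictions holds one list of probabilities per model;
    all lists must cover the same images in the same order.
    """
    if not per_model_predictions:
        raise ValueError("at least one model is required")
    n_images = len(per_model_predictions[0])
    if any(len(p) != n_images for p in per_model_predictions):
        raise ValueError("all models must predict for the same images")
    n_models = len(per_model_predictions)
    # zip(*...) groups the predictions of all models for each image.
    return [sum(column) / n_models for column in zip(*per_model_predictions)]


# Made-up probabilities from three models for two images.
ensemble = average_predictions([[0.2, 0.9], [0.4, 0.7], [0.6, 0.8]])
print(ensemble)
```

Each ensemble prediction is the unweighted mean of the individual model outputs, so every model contributes equally.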
The evaluation script outputs a confusion matrix, along with the sensitivity and specificity at a given operating threshold. The default operating threshold is 0.5 and can be changed with the `-op` parameter.
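
The relationship between the operating threshold and these metrics can be sketched as follows. This is an illustrative example, not the implementation in `evaluate.py`: the function name `evaluate_at_threshold` is hypothetical, the labels and probabilities are made up, and it assumes a prediction counts as positive when its probability is greater than or equal to the threshold.

```python
def evaluate_at_threshold(labels, probabilities, threshold=0.5):
    """Compute confusion matrix counts, sensitivity and specificity.

    labels are 0 (negative) or 1 (positive); a probability at or above
    the threshold is treated as a positive prediction (an assumption).
    """
    tp = fp = tn = fn = 0
    for label, probability in zip(labels, probabilities):
        predicted = 1 if probability >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    # Sensitivity: share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "sensitivity": sensitivity, "specificity": specificity}


# Made-up labels and predicted probabilities for five images.
labels = [1, 1, 0, 0, 1]
probabilities = [0.9, 0.4, 0.2, 0.6, 0.7]
default_result = evaluate_at_threshold(labels, probabilities)
lower_result = evaluate_at_threshold(labels, probabilities, threshold=0.3)
print(default_result)
print(lower_result)
```

In this example, lowering the threshold from 0.5 to 0.3 turns the one missed positive into a detection, which raises sensitivity; in general, a lower threshold trades specificity for sensitivity.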
Run `$ python evaluate.py -h` for additional parameter options.
[1] Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, Venugopalan S, Widner K, Madams T, Cuadros J, Kim R, Raman R, Nelson PC, Mega JL, Webster DR. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA. 2016;316(22):2402–2410. doi:10.1001/jama.2016.17216