An automated DR detection system, provided as a service that doctors can use for the betterment of their patients.
Diabetic retinopathy is a leading cause of vision loss throughout the world, and many people lose their sight to this disease. The disease can become severe if it is not treated properly in its early stages. Damage to the retinal blood vessels eventually blocks the light that passes through the optic nerve, leaving the patient with diabetic retinopathy blind. In our research, we therefore set out to address this problem, and with the help of a Convolutional Neural Network (ConvNet) we were able to detect multiple stages of severity of diabetic retinopathy. Other methods of detecting diabetic retinopathy exist, such as manual screening, but this requires a skilled ophthalmologist and takes a huge amount of time. Our automatic diabetic retinopathy detection technique can replace such manual processes, so the ophthalmologist can spend more time on proper patient care and at least reduce the severity of the disease.
Currently, diabetes affects over 65 million people in India.
Diabetes-related eye disease, of which retinopathy is the most important, affects nearly one out of every ten persons with diabetes, according to point prevalence estimates. Very few of them are aware that, after having diabetes for several years, they may develop diabetic complications.
To spread awareness, major hospitals in India organize free eye checkup camps in villages, where people can have their eyes examined at no cost.
The retinal images collected at these camps are sent to an expert ophthalmologist, who examines them and then summons those patients who are likely to be suffering from diabetic retinopathy.
These summoned patients are informed that they are likely to have diabetic retinopathy and should consult an expert ophthalmologist for a proper checkup.
This whole process takes half a month or more. To shorten this gap, we came up with an approach that cuts the process down to one or two days, helping the ophthalmologist focus on treatment and avoid the tedious work of identifying which patients have diabetic retinopathy and which do not.
Diabetic retinopathy is the leading cause of blindness in the working-age population of the developed world. The condition is estimated to affect over 93 million people.
The need for a comprehensive and automated method of diabetic retinopathy screening has long been recognized, and previous efforts have made good progress using image classification, pattern recognition, and machine learning. With photos of eyes as input, the goal of this project is to create a new model, ideally resulting in realistic clinical potential.
The motivations for this project are twofold:
- Image classification has been a personal interest for years, as has classification on large-scale datasets.
- Time is lost between patients getting their eyes scanned (shown below), having their images analyzed by doctors, and scheduling a follow-up appointment. By processing images in real time, EyeNet would allow people to seek and schedule treatment the same day.
The data originates from a 2015 Kaggle competition. However, it is an atypical Kaggle dataset: in most competitions the data has already been cleaned, giving the data scientist very little to preprocess. That isn't the case here.
All images are taken of different people, using different cameras, and at different sizes. As the preprocessing section details, this data is extremely noisy and requires multiple preprocessing steps to get all images into a usable format for training a model.
The training data comprises 35,126 images, which are augmented during preprocessing.
The very first item analyzed was the training labels. While there are five categories to predict against, the plot below shows the severe class imbalance in the original dataset.
Class distribution of the original training CSV.
Of the original training data, 25,810 images are classified as not having retinopathy, while 9,316 are classified as having retinopathy.
Due to the class imbalance, steps were taken during preprocessing, and again when training the model, to rectify the imbalance.
Furthermore, the variance between images of the eyes is extremely high. The first two rows of images show class 0 (no retinopathy); the second two rows show class 4 (proliferative retinopathy).
Below are the different data preprocessing and data augmentation techniques we used to deal with the major class imbalance.
The preprocessing pipeline is the following:
- Gregwchase approach
  - Crop images to 1800x1800 resolution
  - Resize images to 512x512 / 256x256 resolution
  - Remove totally black images from the dataset
  - Rotate and mirror (rotate DR images by 90°, 120°, 180°, 270° plus mirror; only mirror non-DR images)
  - Update the CSV so it contains all the augmented images and their respective labels
  - Convert images to NumPy arrays
- Ben Graham approach (the original script only works in Python 2.7; see the sketch after this list)
  - Rescale the images to have the same radius (300 or 500 pixels)
  - Subtract the local average color; the local average gets mapped to 50% gray
  - Clip the images to 90% size to remove boundary effects
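As an illustration, here is a minimal Python 3 / OpenCV sketch of the Graham-style normalization. The radius estimate and the blur kernel follow his published recipe; the input filename is a placeholder, and the exact parameters of our pipeline may differ:

```python
import cv2
import numpy as np

def scale_radius(img, scale=300):
    # estimate the eye radius from the middle row and rescale so all eyes match
    x = img[img.shape[0] // 2, :, :].sum(axis=1)
    r = (x > x.mean() / 10).sum() / 2
    s = scale * 1.0 / r
    return cv2.resize(img, (0, 0), fx=s, fy=s)

img = cv2.imread("fundus.jpeg")  # hypothetical input file
img = scale_radius(img, scale=300)
# subtract the local average colour; the local average maps to 50% gray
img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, (0, 0), 300 / 30), -4, 128)
# clip to 90% of the radius to remove boundary effects
mask = np.zeros(img.shape)
cv2.circle(mask, (img.shape[1] // 2, img.shape[0] // 2), int(300 * 0.9), (1, 1, 1), -1, 8, 0)
img = (img * mask + 128 * (1 - mask)).astype(np.uint8)
```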
In total, the original dataset is 35 gigabytes. All images were cropped down to 1800 by 1800, then scaled down to 512 by 512 and 256 by 256. Despite taking longer to train on, the detail present in photos of this size is much greater than at 128 by 128. A sketch of the crop-and-resize step is shown below.
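A minimal sketch of a central crop followed by a resize, using scikit-image; the exact crop logic of the original pipeline may differ:

```python
from skimage.io import imread
from skimage.transform import resize

def crop_and_resize(path, crop=1800, out=512):
    img = imread(path)  # hypothetical path to a fundus photo
    h, w = img.shape[:2]
    # central crop to crop x crop; smaller images simply keep their full extent
    top, left = max((h - crop) // 2, 0), max((w - crop) // 2, 0)
    img = img[top:top + crop, left:left + crop]
    # rescale to the training resolution
    return resize(img, (out, out), anti_aliasing=True)
```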
Additionally, 403 images were dropped from the training set. Scikit-Image raised multiple warnings during resizing because these images had no color space, so any images that were completely black were removed from the training data.
All images were rotated and mirrored. Images without retinopathy were only mirrored; images with retinopathy were mirrored and rotated 90, 120, 180, and 270 degrees (sketched below).
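A minimal sketch of this augmentation scheme, with the black-image filter from above included (the function names are ours):

```python
import numpy as np
from skimage.transform import rotate

def is_black(img):
    # completely black images carry no information and are dropped
    return img.sum() == 0

def augment(img, has_dr):
    # every image is mirrored; DR images are additionally rotated
    out = [img, np.fliplr(img)]
    if has_dr:
        out += [rotate(img, angle, preserve_range=True) for angle in (90, 120, 180, 270)]
    return out
```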
The first images show two pairs of eyes, along with the black borders. Notice in the cropping and rotations how the majority of noise is removed.
After rotations and mirroring, the class imbalance is rectified, with a few thousand more images having retinopathy. In total, there are 106,386 images being processed by the neural network.
Class distribution of the new CSV after image augmentation.
Our first models used 120x120 rescaled input, and we stayed with that for a decent amount of time in the beginning (the first 3-4 weeks). A week or so later, our first real model had an architecture that looked like this (listing the output size of each layer):
Nr | Name | batch | channels | width | height | filter/pool |
---|---|---|---|---|---|---|
0 | Input | 32 | 3 | 120 | 120 | |
1 | Cyclic slice | 128 | 3 | 120 | 120 | |
2 | Conv | 128 | 32 | 120 | 120 | 3//1 |
3 | Conv | 128 | 16 | 120 | 120 | 3//1 |
4 | Max pool | 128 | 16 | 59 | 59 | 3//2 |
5 | Conv roll | 128 | 64 | 59 | 59 | |
6 | Conv | 128 | 64 | 59 | 59 | 3//1 |
7 | Conv | 128 | 32 | 59 | 59 | 3//1 |
8 | Max pool | 128 | 32 | 29 | 29 | 3//2 |
9 | Conv roll | 128 | 128 | 29 | 29 | |
10 | Conv | 128 | 128 | 29 | 29 | 3//1 |
11 | Conv | 128 | 128 | 29 | 29 | 3//1 |
12 | Conv | 128 | 128 | 29 | 29 | 3//1 |
13 | Conv | 128 | 64 | 29 | 29 | 3//1 |
14 | Max pool | 128 | 64 | 14 | 14 | 3//2 |
15 | Conv roll | 128 | 256 | 14 | 14 | |
16 | Conv | 128 | 256 | 14 | 14 | 3//1 |
17 | Conv | 128 | 256 | 14 | 14 | 3//1 |
18 | Conv | 128 | 256 | 14 | 14 | 3//1 |
19 | Conv | 128 | 128 | 14 | 14 | 3//1 |
20 | Max pool | 128 | 128 | 6 | 6 | 3//2 |
21 | Dropout | 128 | 128 | 6 | 6 | |
22 | Maxout (2-pool) | 128 | 512 | |||
23 | Cyclic pool | 32 | 512 | |||
24 | Concat with image dim | 32 | 514 | |||
25 | Dropout | 32 | 514 | |||
26 | Maxout (2-pool) | 32 | 512 | |||
27 | Dropout | 32 | 512 | |||
28 | Softmax | 32 | 5 |
(Where `a//b` in the last column denotes pool or filter size `a x a` with stride `b x b`.)
This architecture used the cyclic layers from the ≋ Deep Sea ≋ team. As nonlinearity we used the leaky rectify function, `max(alpha*x, x)`, with `alpha=0.3`. Layers were almost always initialised with the SVD variant of the orthogonal initialisation (based on Saxe et al.); both are sketched below. This gave us around 0.70 kappa. However, we quickly realised that, given the grading criteria for the different classes (think of the microaneurysms, which are pretty much impossible to detect in 120x120 images), we would have to use bigger input images to get anywhere near a decent model.
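For reference, a minimal NumPy sketch of these two building blocks (our own formulation, not the original Theano/Lasagne code):

```python
import numpy as np

def leaky_rectify(x, alpha=0.3):
    # passes positive activations through unchanged, scales negatives by alpha
    return np.maximum(alpha * x, x)

def orthogonal_init(shape, gain=1.0, rng=np.random.default_rng(0)):
    # SVD variant of the orthogonal initialisation of Saxe et al.
    flat = (shape[0], int(np.prod(shape[1:])))
    a = rng.normal(0.0, 1.0, size=flat)
    u, _, v = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == flat else v  # pick the factor with the right shape
    return (gain * q).reshape(shape)
```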
Something else we had already started testing in models, which seemed to be quite critical for decent performance, was oversampling the smaller classes, i.e., making samples of certain classes more likely than others to be picked as input to the network. This resulted in more stable updates and better, quicker training in general (especially since we were using small batch sizes of 32 or 64 samples because of GPU memory restrictions).
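A toy NumPy sketch of this oversampling, assuming per-sample pick probabilities inversely proportional to class frequency (the schedule we actually used is described later):

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.choice(5, size=1000, p=[0.73, 0.07, 0.15, 0.02, 0.03])  # imbalanced toy labels

# pick probability inversely proportional to the frequency of the sample's class
p = 1.0 / np.bincount(labels)[labels]
p /= p.sum()

batch = rng.choice(len(labels), size=32, p=p)  # classes are now roughly uniform per batch
```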
First, we wanted to take into account the fact that for each patient we get two retina images: the left and the right eye. By combining the dense representations of the two eyes before the last two dense layers (one of which is a softmax layer), we could use both images to classify each eye. Intuitively, you can expect some pairs of labels to be more probable than others, and since you always get two images per patient, this seems like a good thing to do.
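The merge can be expressed as two reshapes, assuming the left and right eye of a patient sit in adjacent rows of the batch (a NumPy sketch matching the reshape rows of the architecture below):

```python
import numpy as np

# one 514-dim feature vector per eye, batch of 64 eyes (32 patients)
features = np.random.randn(64, 514).astype(np.float32)

# merge the two eyes of each patient into one row: (64, 514) -> (32, 1028)
pairs = features.reshape(32, 1028)

# ... dropout / maxout / dense layers operate on the pair ...
logits_pairs = np.random.randn(32, 10).astype(np.float32)  # stand-in for the dense output

# split back to one eye per row and apply a softmax over the 5 DR grades
logits = logits_pairs.reshape(64, 5)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```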
This gave us the basic architecture for 512x512 rescaled input, which was used pretty much until the end (except for some experiments):
Nr | Name | batch | channels | width | height | filter/pool |
---|---|---|---|---|---|---|
0 | Input | 64 | 3 | 512 | 512 | |
1 | Conv | 64 | 32 | 256 | 256 | 7//2 |
2 | Max pool | 64 | 32 | 127 | 127 | 3//2 |
3 | Conv | 64 | 32 | 127 | 127 | 3//1 |
4 | Conv | 64 | 32 | 127 | 127 | 3//1 |
5 | Max pool | 64 | 32 | 63 | 63 | 3//2 |
6 | Conv | 64 | 64 | 63 | 63 | 3//1 |
7 | Conv | 64 | 64 | 63 | 63 | 3//1 |
8 | Max pool | 64 | 64 | 31 | 31 | 3//2 |
9 | Conv | 64 | 128 | 31 | 31 | 3//1 |
10 | Conv | 64 | 128 | 31 | 31 | 3//1 |
11 | Conv | 64 | 128 | 31 | 31 | 3//1 |
12 | Conv | 64 | 128 | 31 | 31 | 3//1 |
13 | Max pool | 64 | 128 | 15 | 15 | 3//2 |
14 | Conv | 64 | 256 | 15 | 15 | 3//1 |
15 | Conv | 64 | 256 | 15 | 15 | 3//1 |
16 | Conv | 64 | 256 | 15 | 15 | 3//1 |
17 | Conv | 64 | 256 | 15 | 15 | 3//1 |
18 | Max pool | 64 | 256 | 7 | 7 | 3//2 |
19 | Dropout | 64 | 256 | 7 | 7 | |
20 | Maxout (2-pool) | 64 | 512 | |||
21 | Concat with image dim | 64 | 514 | |||
22 | Reshape (merge eyes) | 32 | 1028 | |||
23 | Dropout | 32 | 1028 | |||
24 | Maxout (2-pool) | 32 | 512 | |||
25 | Dropout | 32 | 512 | |||
26 | Dense (linear) | 32 | 10 | |||
27 | Reshape (back to one eye) | 64 | 5 | |||
28 | Apply softmax | 64 | 5 |
(Where `a//b` in the last column denotes pool or filter size `a x a` with stride `b x b`.)
Some things that had also been changed:
- Using higher leakiness on the leaky rectify units, `max(alpha*x, x)`, made a big difference in performance. We started using `alpha=0.5`, which worked very well. In the small tests we did, using `alpha=0.3` or lower gave significantly lower scores.
- Instead of downscaling by a factor of five before processing the images, we only downscaled by a factor of two. It is unlikely to make a big difference, but we were able to handle it computationally, so there was not much reason not to.
- The oversampling of the smaller classes was now done so that the resulting class distribution was uniform, but at some point during training it switched back to the original training set distribution. This was done because we initially noticed that the distribution of the predicted classes was quite different from the training set distribution. However, this is not necessarily because of the oversampling (although you would expect it to have a significant effect!); it appeared to be mostly because of the specific kappa loss optimisation, which takes into account the distributions of the predictions and the ground truth (the metric itself is sketched after this list). Training for a long time on samples that are 10 times more likely than others is also much more prone to overfitting.
- Maxout worked slightly better or at least as well as normal dense layers (but it had fewer parameters).
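Since the quadratic weighted kappa score drives both the 0.70 figure above and the loss discussion, here is a standard NumPy implementation of the metric (not necessarily the exact code we optimised against):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    # observed confusion matrix between ground truth and predictions
    O = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # quadratic disagreement weights, normalised to [0, 1]
    w = np.array([[(i - j) ** 2 for j in range(n_classes)]
                  for i in range(n_classes)], dtype=float) / (n_classes - 1) ** 2
    # expected confusion matrix if the two ratings were independent
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()
    return 1.0 - (w * O).sum() / (w * E).sum()
```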
In our research, to tackle the aforementioned challenges, we built a predictive model for Computer-Aided Diagnosis (CAD), leveraging eye fundus images that are widely used in present-day hospitals, given that these images can be acquired at a relatively low cost. Additionally, based on our CAD model, we developed a novel tool for diabetic retinopathy diagnosis that takes the form of a prototype web application. The main contribution of this research stems from the novelty of our predictive model and its integration into a prototype web application.
First, start the Flask app:

```
python app.py
```
- Take one retinal image of the patient per eye.
- We have created a REST API which takes the two images as input and returns a JSON response (a hypothetical client call is sketched after this list).
- You can also generate a PDF containing the uploaded images and their predictions, which doctors can refer to for later use.
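A sketch of calling the API from Python with the `requests` library; the route and field names here are placeholders, so check app.py for the actual ones:

```python
import requests

# hypothetical endpoint and field names; see app.py for the real route
url = "http://localhost:5000/predict"
files = {
    "left": open("left_eye.jpeg", "rb"),
    "right": open("right_eye.jpeg", "rb"),
}
response = requests.post(url, files=files)
print(response.json())  # e.g. a DR grade (0-4) per eye
```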
The final model we used is in the Model folder. We also tried various approaches to get good results, and all of them are in the Miscellaneous folder.
This project could not have been completed without Parth Purani @github/ParthPurani and Hardik Vekariya @github/hv245. Thanks for your support :)