COVID-Chest-X-Ray

An ensemble model for classifying chest X-ray images into the disease classes COVID, Pneumonia, and Normal, built with TensorFlow 2.0.


CHEST X-RAY CLASSIFIER

PNEUMONIA | COVID-19 | NORMAL

Contributors

  • Govind Jeevan ( @govindjeevan )
  • Palak Singhal ( @smarty1palak )

OVERVIEW

DATASET & ISSUES

| Dataset Issue | Effect on Model | Solution |
| --- | --- | --- |
| Duplication of images (repetition of slightly modified versions of the same image) | Data leakage between the training and test splits; over-optimistic results | Deduplication; forcing all duplicates to one side of the split to avoid leakage |
| Highly imbalanced dataset | Over-fitting on the minority class | Augmentation of the minority class; oversampling |

PREPROCESSING

Tasks Performed in Order

  1. Conversion to Image Files
  2. Removal of exact replicas
  3. Moving near-exact replicas to the same side of the train-test split
  4. Augmentation of Minority Class Images in each split separately
    Flip: Not performed to avoid disturbing the natural orientation of a human chest.
    Tasks performed:
    1. Horizontal shift
    2. Vertical shift
    3. Random brightness
    4. Random zoom
  5. Image Resize (512x512 -> 300x300)
  6. Creation of .npy files

Preprocessed Dataset

| Class | Train Split | Validation Split | Test Split | Total |
| --- | --- | --- | --- | --- |
| Corona (0) | 630 | 280 | 280 | 1190 |
| Normal (1) | 750 | 160 | 162 | 1072 |
| Pneumonia (2) | 753 | 161 | 162 | 1076 |
| Total | 2133 | 601 | 604 | 3338 |

EXPERIMENT RESULTS

Our best-achieved accuracy is 98%, and the model is available as corona-ensemble.h5py. The notebook with the code: Ensemble-Final Model.ipynb

NOVELTY

We performed transfer learning using the pre-trained feature extractors of three state-of-the-art models: VGG16, ResNet, and DenseNet169. A custom classifier was attached to each of these models, and each was trained independently to perform disease classification on the given dataset. The trained models were then ensembled into a three-model architecture in which the output layers of the models were concatenated and fed to a regression layer, which was then trained on the full dataset once again.
The regression layer learns to combine the results of the three models and choose the most trusted outcome from the set of three predictions.
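A minimal sketch of this ensemble in TensorFlow 2 / Keras is below. The helper name build_ensemble, the member file names, and the choice to freeze the members while training only the final layer are our illustrative reading of the procedure above, not the repository's exact code.

```python
import tensorflow as tf
from tensorflow.keras import Input, Model, layers

def build_ensemble(members, num_classes=3):
    # One shared 300x300 RGB input fed to every member model
    inputs = Input(shape=(300, 300, 3))
    # Freeze the already fine-tuned members so only the combiner trains
    for m in members:
        m.trainable = False
    # Concatenate the three 3-way softmax outputs into one 9-dim vector
    merged = layers.Concatenate()([m(inputs) for m in members])
    # The "regression layer": a dense softmax that learns how much to
    # trust each member's prediction for each class
    outputs = layers.Dense(num_classes, activation="softmax")(merged)
    return Model(inputs, outputs)

# Hypothetical usage, assuming the three fine-tuned models were saved:
# members = [tf.keras.models.load_model(p)
#            for p in ("vgg16.h5", "densenet169.h5", "resnet.h5")]
# ensemble = build_ensemble(members)
# ensemble.compile(optimizer="adam",
#                  loss="categorical_crossentropy", metrics=["accuracy"])
```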

VISUALIZATIONS

I. Grad-CAM Visualization

Notebook: Gradcam-7.ipynb

Grad-CAM heatmaps are shown for VGG-16, DenseNet169, and ResNet.
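The notebook's exact implementation may differ, but a typical TF2 Grad-CAM looks like the sketch below. The last_conv_layer_name depends on the backbone (e.g. "block5_conv3" for VGG16) and would be looked up via model.summary().

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    # Model exposing both the last conv feature map and the prediction
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image)      # image: (1, H, W, 3)
        if class_index is None:
            class_index = tf.argmax(preds[0])    # explain the top class
        class_score = preds[:, class_index]
    # Channel importance = gradients of the class score, pooled over space
    grads = tape.gradient(class_score, conv_out)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weighted sum of feature maps, ReLU, then normalize to [0, 1]
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```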

II. ROC Curve

This ROC curve is generated with the one-vs-all method: the task is treated as a binary classification problem in which one class is considered positive and the remaining classes are considered negative.

Notebook: ROC-10.ipynb
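For reference, a short sketch of how such one-vs-all curves are typically computed with scikit-learn. The variable names y_test and y_score are assumptions: y_score would be the model's softmax output on the test set.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

# y_test: integer labels (0/1/2); y_score: (N, 3) softmax outputs
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])
for i, name in enumerate(["Corona", "Normal", "Pneumonia"]):
    # Class i is "positive", the other two classes are "negative"
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    print(f"{name}: AUC = {auc(fpr, tpr):.3f}")
```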

HOW TO TEST THE MODEL

For testing purposes we suggest creating .npy files of images of shape 300 by 300. The steps to create .npy files from images are specified in the "Creating npy for test/train and validation data" section below.

Please save the images as testx.npy and the labels as testy.npy, then load them with the np.load() function, supplying the file names of the test image and label .npy files.

The notebook Test-11.ipynb can be used for testing, where load_model should be given the .h5 file of the model.
Please specify the name of the model to be tested in the notebook, for example as in the sketch below.
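A minimal testing sketch, assuming the file names above; if testy.npy stores one-hot labels rather than integers, the argmax branch below handles that.

```python
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("corona-ensemble.h5py")
testx = np.load("testx.npy")          # expected shape: (N, 300, 300, 3)
testy = np.load("testy.npy")

# Predicted class per image vs. ground truth
preds = np.argmax(model.predict(testx), axis=1)
labels = testy if testy.ndim == 1 else np.argmax(testy, axis=1)
print("Test accuracy:", (preds == labels).mean())
```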

DETAILED PROCESS

Preprocessing FlowChart

Dataset Conversion (To Images)

A Python script converts the .npy files to images, both to inspect the dataset and to train the model. The images are stored in a folder called Images, with three subdirectories 0, 1, and 2 containing the images with the corresponding labels.

The notebook for this step is shared under the name: Images from npy-1.ipynb
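A sketch of the conversion, assuming the raw dataset is stored as image and label arrays named x.npy and y.npy; the actual file names in the notebook may differ.

```python
import os
import numpy as np
from PIL import Image

x = np.load("x.npy")   # (N, 512, 512) or (N, 512, 512, 3) image array
y = np.load("y.npy")   # (N,) labels in {0, 1, 2}
for i, (img, label) in enumerate(zip(x, y)):
    # One subdirectory per class label: Images/0, Images/1, Images/2
    out_dir = os.path.join("Images", str(int(label)))
    os.makedirs(out_dir, exist_ok=True)
    Image.fromarray(img.astype(np.uint8)).save(
        os.path.join(out_dir, f"{i}.png"))
```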

Identification of Exact and Near Duplicates & Deduplication

The dataset labeled 0 (corona cases) had many duplicates and also contained augmented data, reducing the number of original samples.

Our script, based on a similarity index between images, found 90 images with duplicates in the corona class. The duplicates included exact copies as well as slightly edited versions of the same image.

As mentioned above, the class with label 0 (corona cases) had duplicate as well as augmented images, which would have hurt the model's reliability: when the data is divided into train, test, and validation splits, repeated cases appear on both sides, so instead of learning features the model memorizes the images themselves and overfits heavily.

To prevent this, we used a tool called imagededup, which runs through the entire directory of images, computes pairwise similarity, and returns a dictionary mapping each image to the images with high similarity to it. This is not fully robust, since it sometimes flags two quite different images as similar; we proceeded with it anyway and manually placed all images that were similar to some other image in the dataset on the training side, while the remaining images were divided between test and val.

The other classes showed no sign of duplication when checked, so they were left untouched.
The script: edup-2.ipynb (image dir is the Images directory of class-0 samples, 170 images)
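For reference, a minimal perceptual-hash search with the imagededup library looks like this; the distance threshold is a judgment call and may need tuning for this dataset.

```python
from imagededup.methods import PHash

phasher = PHash()
# Maps each file in Images/0 to the files judged similar to it
duplicates = phasher.find_duplicates(image_dir="Images/0",
                                     max_distance_threshold=10)
for img, dups in duplicates.items():
    if dups:
        print(img, "->", dups)
```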

Data Augmentation

As mentioned above, the dataset is highly imbalanced: the corona class has only 170 samples, as opposed to roughly 1,000 samples for each of the other two classes. Hence we augmented the dataset to increase the number of samples for this class.

In order to augment, we followed a two-step process:

a. Parameter fine-tuning:

The augmentation was performed with the following transformations:

  1. Horizontal shift

  2. Vertical shift

  3. Random brightness

  4. Random zoom

We did not consider parameters like horizontal flip or vertical flip, because flipping an image changes the apparent body structure and could correspond to a different condition, resulting in misclassification and poisoning of the dataset. All the parameters were fine-tuned by trial and error and by plotting the resulting images; a generator sketch follows.
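A sketch of such a generator with tf.keras's ImageDataGenerator. The exact ranges below are illustrative; the values we actually used were chosen by the trial-and-error process described above.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    width_shift_range=0.1,          # horizontal shift
    height_shift_range=0.1,         # vertical shift
    brightness_range=(0.8, 1.2),    # random brightness
    zoom_range=0.1,                 # random zoom
    # no horizontal/vertical flip: flipping changes chest anatomy
)
```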

b. Augmented Dataset Creation

To create the augmented dataset, we augmented the class with label 0 (corona cases), treating the train/test/val splits separately in order to prevent any duplication across splits. The directories Train-dup, test-dup, and val-dup contain the class-0 data segmented by the deduplication step above; each is augmented separately.

The resulting images are stored in Augmented-Test/0/, Augmented-Train/0/, and Augmented-Val/0/.
The notebook for this is attached under the name: Aug-dup-3.ipynb
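Reusing the datagen configured above, the per-split augmentation can be sketched as follows; the helper name augment_dir and copies_per_image are our assumptions, not the notebook's actual code.

```python
import os
from tensorflow.keras.preprocessing.image import img_to_array, load_img

def augment_dir(src_dir, dst_dir, copies_per_image=5):
    # Write several augmented variants of every image in src_dir
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        img = img_to_array(load_img(os.path.join(src_dir, name)))
        batch = img.reshape((1,) + img.shape)
        flow = datagen.flow(batch, batch_size=1, save_to_dir=dst_dir,
                            save_prefix=os.path.splitext(name)[0])
        for _ in range(copies_per_image):
            next(flow)

# Each split is augmented separately, so no image crosses splits
augment_dir("Train-dup", "Augmented-Train/0")
augment_dir("test-dup", "Augmented-Test/0")
augment_dir("val-dup", "Augmented-Val/0")
```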

Image Resizing

Since the images were 512 by 512, we resized them to 300 by 300 to reduce computational cost without affecting performance (a resize sketch follows the notebook list below).

  • Resizing the augmented dataset: Resize-Aug-4.ipynb

  • Resizing the other dataset: Resize-other-4.ipynb
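The resize itself is a one-liner per image. A sketch over a directory tree with Pillow; overwriting the files in place is our assumption, the notebooks may write to new directories instead.

```python
import os
from PIL import Image

def resize_dir(src_dir, size=(300, 300)):
    # Resize every image under src_dir from 512x512 down to 300x300
    for root, _, files in os.walk(src_dir):
        for name in files:
            path = os.path.join(root, name)
            Image.open(path).resize(size).save(path)

resize_dir("Augmented-Train/0")
```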

Split Folders

A tool called split-folders is used to split the dataset into train, test, and validation. We decided on a ratio of 0.7 for train, 0.15 for validation, and 0.15 for test. This is done for the classes with labels 1 and 2. For class 0, we had manually split the data into 3 segments and then augmented each; those augmented images are placed into the corresponding split (train, test, val) in a directory called output.
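With the split-folders package this is a single call; the seed value below is an assumption.

```python
import splitfolders  # pip install split-folders

# Splits every class folder under Images into output/train, output/val,
# and output/test; in our pipeline class 0 was handled manually before
# augmentation, as described above, so it is absent from Images here.
splitfolders.ratio("Images", output="output", seed=42,
                   ratio=(0.7, 0.15, 0.15))
```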

Creating npy for test/train and validation data

To evaluate and test the model, we create .npy files from the train, test, and validation data produced by the steps above.

It is stored as tranx.npy, trany.npy, testx.npy, testy.npy, valx.npy, valy.npy

Notebook: Img-to-npy-5.ipynb
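A sketch of the array creation for one split; the same helper would be run for the train and val splits, and the image shape/channels depend on how the images were saved.

```python
import os
import numpy as np
from PIL import Image

def split_to_npy(split_dir, x_name, y_name):
    xs, ys = [], []
    for label in ("0", "1", "2"):
        class_dir = os.path.join(split_dir, label)
        for name in sorted(os.listdir(class_dir)):
            xs.append(np.asarray(Image.open(os.path.join(class_dir, name))))
            ys.append(int(label))
    np.save(x_name, np.stack(xs))   # (N, 300, 300[, 3]) image array
    np.save(y_name, np.array(ys))   # (N,) integer labels

split_to_npy("output/test", "testx.npy", "testy.npy")
```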

Training Experiments

I. Custom CNN

A simple CNN with 3 convolutional layers, followed by a flatten layer and two dense layers, was trained for 50 epochs on the given dataset; we could only achieve 82% accuracy. We used max-pooling layers in between to reduce computational cost without sacrificing performance. The model architecture is shown in the image, and a code sketch follows.
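A sketch matching that description; the filter counts and dense sizes are assumptions, since the exact architecture lives in the notebook.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # Three conv blocks, each followed by max pooling to cut computation
    layers.Conv2D(32, 3, activation="relu", input_shape=(300, 300, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    # Flatten, then two dense layers ending in a 3-way softmax
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```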

II. DenseNet169 Feature Extractor + Custom Classifier (Transfer Learning)

We applied transfer learning with DenseNet169 as the base model, using weights pre-trained on ImageNet.

We added a pooling layer and dense layers and trained them for around 20 epochs, using the Adam optimizer after trying others such as RMSprop.

We achieved 95% accuracy after training for around 20 epochs. A sketch of this setup follows.
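A minimal sketch of this setup (the dense layer width is an assumption); sections III and IV below follow the same pattern with VGG16 or a ResNet variant as the base.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet169

# ImageNet-pretrained feature extractor, frozen; custom head on top
base = DenseNet169(weights="imagenet", include_top=False,
                   input_shape=(300, 300, 3))
base.trainable = False
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```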

III. VGG16 Feature Extractor + Custom Classifier (Transfer Learning)

We tried another transfer-learning setup with VGG16 as the base model, using ImageNet weights and keeping the base model non-trainable. We excluded the top layers, added our own pooling and dense layers, and trained them for around 20 epochs. After fine-tuning, we settled on Adam as the optimizer with a learning rate of 0.001. The results are quite good: we achieved an accuracy of around 96.2%.

IV. ResNet Feature Extractor + Custom Classifier (Transfer Learning)

We repeated the same setup with ResNet as the base model: ImageNet weights, a non-trainable base, top layers excluded, and our own pooling and dense layers added on top. After fine-tuning, we settled on the Adam optimizer with a learning rate of 0.001, chosen after trying other optimizers such as RMSprop.

We achieved 98% accuracy after training for around 20 epochs.

V. Custom Ensemble Model (ResNet + DenseNet169 + VGG16 + Regression Layer)

We built an ensemble by combining the three models above and training it. The models used for ensembling are VGG16, DenseNet169, and ResNet, each first trained and fine-tuned on our dataset. The ensemble is then trained to choose the most trusted outcome from the three models across the three classes (see the sketch in the Novelty section above).

Observations

We made a couple of observations with respect to the given dataset and the different models we tried in order to make the model learn:

  1. Datasets encountered in the real world are dirty, and data cleansing is a must before training. In our case there were many duplicates in the dataset, as well as augmented images, which were effectively poisoning it. Training without removing the duplicates and augmented images causes repeated cases to appear across the train, test, and validation splits, so the model overfits: instead of learning features from the images it learns the images themselves, and it performs very badly on an unseen patient's image. To build a robust model, we need to examine the data carefully and arrange it so that the train, test, and validation sets are completely disjoint, giving a true picture of how the model is performing.

  2. The augmentation parameters need to be carefully examined, and the data clearly understood, before augmenting.

As the example shows, a horizontal flip changes the disease itself while keeping the label the same. This is definitely not what we want: we would be poisoning the data with wrongly labeled samples. Hence augmentation parameters need to be clearly understood and chosen based on the dataset. In our case, we used only horizontal shift, vertical shift, brightness, and zoom.

  3. Augmentation should be done after separating the data into train, test, and validation splits, so that no image is repeated across splits and they remain mutually exclusive. This reduces overfitting and prevents the model from memorizing images instead of learning their features.

  4. Transfer learning with models pre-trained on the ImageNet dataset performed much better than our custom model, since those models are already trained on millions of images to recognize low-level features such as edges and corners. Hence using a fully custom model makes little sense; we instead trained and fine-tuned the top layers on our dataset.