
Brain cancer classification with 98.2% accuracy

Primary language: Python

Brain_cancer_classification

The goal of the project was to classify brain tumors from MRI images.

There were 4 classes: 'glioma_tumor', 'meningioma_tumor', 'no_tumor' and 'pituitary_tumor'.

The train and test datasets were provided, both containing images from all 4 classes. I fetched the data from https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri.
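To check the class balance of each split, a small stdlib-only helper can count the images per class. This is a sketch, not code from the repository. It assumes the downloaded archive is unpacked into one folder per split with one sub-folder per class (for example `data/Training/glioma_tumor/*.jpg`). The paths, the accepted file extensions and the helper name `count_images_per_class` are assumptions.

```python
from pathlib import Path

# Class names as listed in this README.
CLASSES = ["glioma_tumor", "meningioma_tumor", "no_tumor", "pituitary_tumor"]
# Assumed image extensions; adjust to the files actually present.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(split_dir):
    """Return {class_name: number_of_images} for one split folder.

    Assumes the hypothetical layout <split_dir>/<class_name>/<image files>.
    A missing class folder is reported as 0.
    """
    split_dir = Path(split_dir)
    counts = {}
    for name in CLASSES:
        class_dir = split_dir / name
        if class_dir.is_dir():
            counts[name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
        else:
            counts[name] = 0
    return counts


if __name__ == "__main__":
    # Hypothetical paths: replace them with wherever the dataset was extracted.
    for split in ("data/Training", "data/Testing"):
        print(split, count_images_per_class(split))
```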

This repository contains 2 notebooks created with Jupyter Notebook and 2 Python scripts written with Spyder:

  • EfficientNetB2.py, which I used to fit an EfficientNetB2 model to the train dataset and predict on the test dataset.
  • EfficientNetB6.py, in which I trained an EfficientNetB6 model and used it to predict on the test dataset.
  • EfficientNetB3.ipynb, which contains the model with the lowest accuracy. I added 2 hidden layers on top of a pre-trained EfficientNetB6 model and plotted some images from the dataset together with their predicted classes. I also computed several metrics (F1 score, precision, recall and accuracy) and plotted a confusion matrix.
  • CNN_with_convolutional_layers.ipynb, in which I fit a convolutional neural network on an augmented train dataset and used it to predict on the test dataset. I also visualized the convolutional layers of the model by applying their filters to a randomly picked image.
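The README does not say which augmentations were applied to the train dataset. The following pure-Python sketch shows the idea on a toy image stored as a list of rows: each training image is randomly mirrored and rotated, so the network sees new variants of the same MRI slice. The choice of transforms (horizontal flip, 90-degree rotations) and their probabilities are assumptions for illustration only. In practice these operations would run on image arrays inside the training pipeline.

```python
import random


def hflip(image):
    """Mirror a 2D image (list of rows) left-right."""
    return [row[::-1] for row in image]


def rot90(image):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image, rng):
    """Randomly flip and rotate one image, leaving the input untouched.

    The flip probability (0.5) and the 0-3 quarter turns are illustrative
    assumptions, not the settings used in the notebook.
    """
    out = [list(row) for row in image]
    if rng.random() < 0.5:
        out = hflip(out)
    for _ in range(rng.randrange(4)):
        out = rot90(out)
    return out


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    toy_image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for _ in range(3):
        print(augment(toy_image, rng))
```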

The repository also contains the folders plots (all the saved plots), data (the train and test datasets) and models (the saved models; unfortunately, I could upload only one model because the saved models are too large for the repository). Below are the training curves of the notebook with the highest accuracy and lowest loss.

The model reached 98.16% accuracy with a loss of 0.055.

Here is a table with other metrics:


Class              Precision  Recall  F1-score
glioma_tumor       0.99       0.97    0.98
meningioma_tumor   0.97       0.99    0.98
no_tumor           0.99       0.99    0.99
pituitary_tumor    0.99       0.99    0.99
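Per-class values like those in the table can be derived from a confusion matrix. The sketch below is a plain-Python version, not the notebook's code. It assumes the scikit-learn convention that `cm[i][j]` counts images of true class `i` predicted as class `j`. It also returns the support, that is, the number of true images per class.

```python
def per_class_metrics(cm, labels):
    """Precision, recall, F1 and support per class from a square confusion matrix.

    Assumed convention (same as scikit-learn): cm[i][j] counts images whose
    true class is labels[i] and whose predicted class is labels[j].
    """
    n = len(labels)
    report = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, but true class differs
        fn = sum(cm[k][j] for j in range(n)) - tp  # true class k, but predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = tp + fn  # number of images whose true class is k
        report[name] = (precision, recall, f1, support)
    return report
```

With scikit-learn installed, `sklearn.metrics.classification_report(y_true, y_pred, target_names=...)` prints the same columns directly from the true and predicted labels.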


An important metric is the precision of the no_tumor class, i.e. the ratio between the true positives and all the positive predictions (true positives plus false positives).

It is crucial because a false positive for this class is a person who has a tumor but is predicted as no_tumor. A high number of false positives means that the model classifies many people with a tumor as tumor-free.

In this case, only one image out of 75 is a false positive, as the confusion matrix (plotted with Seaborn and Matplotlib) shows.
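As a worked check, assume that the 75 images are those predicted as no_tumor and that exactly one of them actually contains a tumor. The precision of the no_tumor class is then

```latex
\text{precision}_{\text{no\_tumor}}
  = \frac{TP}{TP + FP}
  = \frac{74}{74 + 1}
  = \frac{74}{75}
  \approx 0.987
```

This rounds to the 0.99 reported for no_tumor in the table above.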

I also plotted some images from the dataset together with their predicted classes.


Figure: validation dataset images and their predicted classes
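The predicted class shown above each image is presumably the class with the highest score in the model output. This minimal sketch shows that step. It assumes the model returns one score per class in the alphabetical order below, which a directory-based loader would typically produce. The order and the helper name `predicted_labels` are assumptions. The resulting names could then be passed to Matplotlib's `ax.set_title` when plotting each image.

```python
CLASS_NAMES = ["glioma_tumor", "meningioma_tumor", "no_tumor", "pituitary_tumor"]


def predicted_labels(probabilities, class_names=CLASS_NAMES):
    """Map each row of class scores to the name of its highest-scoring class.

    `probabilities` holds one row per image, with the scores in the same
    (assumed) order as `class_names`. Ties resolve to the first class.
    """
    labels = []
    for row in probabilities:
        if len(row) != len(class_names):
            raise ValueError("each row needs exactly one score per class")
        best = max(range(len(row)), key=row.__getitem__)  # argmax
        labels.append(class_names[best])
    return labels
```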

I also plotted the first, an intermediate and the last convolutional layer of the model in this notebook.


In these plots, the deeper the layer, the less specific the filters become, and the image is also reduced in size.
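The size reduction can be shown with a small pure-Python sketch. A "valid" convolution (no padding) with a k x k kernel removes k - 1 pixels per dimension, and 2x2 max pooling with stride 2 halves each dimension. The sketch assumes the model uses valid padding and 2x2 pooling, which the README does not confirm. The edge-detecting kernel is only an example.

```python
def conv2d_valid(image, kernel):
    """Slide a kernel over a 2D image without padding ('valid' mode).

    Like CNN layers, this computes a cross-correlation (the kernel is not
    flipped). Output size: (H - kh + 1) x (W - kw + 1).
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


def max_pool2x2(feature_map):
    """2x2 max pooling with stride 2; each dimension becomes floor(size / 2)."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


if __name__ == "__main__":
    image = [[(i * j) % 7 for j in range(8)] for i in range(8)]  # toy 8x8 "image"
    vertical_edge = [[1, 0, -1]] * 3  # example 3x3 filter
    fmap = conv2d_valid(image, vertical_edge)  # 8x8 -> 6x6
    pooled = max_pool2x2(fmap)  # 6x6 -> 3x3
    print(len(fmap), len(fmap[0]), len(pooled), len(pooled[0]))
```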