Melanoma Classification

This repository contains the code for a web application that detects melanoma from a given skin image.

You can try the app at: http://3.17.64.68:8501/

Introduction

Skin cancer is the most prevalent type of cancer. Melanoma, specifically, is responsible for 75% of skin cancer deaths, despite being the least common skin cancer. The American Cancer Society estimates that over 100,000 new melanoma cases will be diagnosed in 2020, and that almost 7,000 people will die from the disease. As with other cancers, early and accurate detection, potentially aided by data science, can make treatment more effective. Source: https://www.kaggle.com/c/siim-isic-melanoma-classification

Objective

The objective of this project is to identify melanoma in images of skin lesions. Using patient-level contextual information may help the development of image analysis tools, which could better support clinical dermatologists. In particular, we need to use images from the same patient and determine which are likely to represent a melanoma. In other words, we need a model that predicts the probability that the lesion in an image is malignant. A value of 0 denotes benign, and 1 denotes malignant.

Dataset

The dataset we use comes from the following source:

Kaggle SIIM-ISIC Melanoma Classification Challenge : https://www.kaggle.com/c/siim-isic-melanoma-classification

The dataset consists of images in :

  • DICOM format
  • JPEG format in JPEG directory
  • TFRecord format in tfrecords directory

Additionally, the metadata consists of train, test, and submission files in CSV format.

Exploratory Data Analysis :

The complete EDA of this dataset is available here.

Model Used :

In this project we used ResNeXt50 pretrained on ImageNet.

Training Process:

For training, we resized all images to 224x224.

The script that converts all images to this format is available here.
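The resizing step can be sketched as follows. This is a minimal illustration using Pillow, not the repository's resize_images.py; the function name resize_folder, the "*.jpg" pattern, and the bilinear filter are assumptions made for the example.

```python
from pathlib import Path

from PIL import Image


def resize_folder(src_dir, dst_dir, size=(224, 224)):
    """Resize every .jpg file in src_dir to `size` and save it under dst_dir.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src_dir.glob("*.jpg")):
        with Image.open(path) as img:
            # Convert to RGB so grayscale inputs also come out with 3 channels.
            resized = img.convert("RGB").resize(size, Image.BILINEAR)
        out_path = dst_dir / path.name
        resized.save(out_path)
        written.append(out_path)
    return written
```

Note that resizing straight to a square does not preserve the aspect ratio; whether that matters for this dataset is something to check experimentally.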

We used 10-fold StratifiedKFold and created a new file that records the fold of each sample. The script is available here.
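To show what stratification buys on a dataset with few malignant cases, here is a dependency-free sketch of stratified fold assignment. It is not the repository's create_folds.py (which, judging by the name StratifiedKFold, likely relies on scikit-learn); the function name assign_stratified_folds and the toy labels are assumptions.

```python
import random
from collections import defaultdict


def assign_stratified_folds(labels, n_folds=10, seed=42):
    """Return one fold index per sample, spreading each class evenly over folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of each class out round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[idx] = position % n_folds
    return folds


# Toy data (made up): 20 malignant (1) among 1000 images.
labels = [1] * 20 + [0] * 980
folds = assign_stratified_folds(labels)
positives_per_fold = [
    sum(1 for f, y in zip(folds, labels) if f == k and y == 1) for k in range(10)
]
print(positives_per_fold)  # every fold holds 2 malignant samples
```

With a plain random split, some folds could end up with very few malignant images, which would make the per-fold validation score noisy.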

We used train.py to train this model on our dataset.

Web App :

The Streamlit folder contains a Python script named app.py, a Streamlit app built around the trained model, and prediction.py, which contains a predict function that takes an image and returns a prediction.

Hyperparameters

You can experiment with the following hyperparameters to see different results:

resize_images.py : image size

create_folds.py : number of folds

train.py :

  • Used Model
  • Augmentations
  • Learning Rate
  • Optimizer
  • Use of Metadata

Software Used :

Python 3.7.6
CUDA 10.2.89
cuDNN 7.6.5

Python packages are listed separately in requirements.txt and environment.yml.

System Setup:

You can install all required packages using

pip install -r requirements.txt

or, if you are using conda, you can create a virtual environment named pytorch that has all required libraries:

conda env create -f environment.yml

Data Setup:

This assumes that the Kaggle API client is installed.

cd data
kaggle competitions download -c siim-isic-melanoma-classification

Training

python resize_images.py
python create_folds.py
python train.py

Running the Web App Locally :

cd Streamlit
streamlit run app.py

Evaluation of Model Using AUC:

For this particular problem, we evaluate the model using the area under the ROC curve (AUC). An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters: the true positive rate (TPR) and the false positive rate (FPR).

An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives. The following figure shows a typical ROC curve.

source: https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
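The effect of the threshold on TPR and FPR can be seen on a small example. The labels and scores below are made up for illustration, and roc_point is a hypothetical helper, not part of this repository.

```python
def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) when scores >= threshold are classified as positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return fp / negatives, tp / positives


# Toy data: 1 = malignant, 0 = benign; scores are predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.40, 0.45, 0.60, 0.70, 0.90]

for threshold in (0.8, 0.5, 0.3):
    fpr, tpr = roc_point(y_true, scores, threshold)
    print(f"threshold={threshold}: FPR={fpr:.2f}, TPR={tpr:.2f}")
# threshold=0.8: FPR=0.00, TPR=0.33
# threshold=0.5: FPR=0.20, TPR=0.67
# threshold=0.3: FPR=0.40, TPR=1.00
```

Each threshold gives one point on the ROC curve; sweeping the threshold from high to low traces the curve from the bottom left to the top right.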

Achieved AUC Score :

AUC : 0.8990

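This means that, given one randomly chosen malignant image and one randomly chosen benign image, the model assigns the higher malignancy score to the malignant one about 89.9% of the time. It is not the model's confidence in any single prediction.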
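AUC equals the fraction of (malignant, benign) image pairs in which the malignant image receives the higher score, with ties counted as half. The brute-force sketch below checks this on made-up scores; pairwise_auc is a hypothetical helper written for illustration, not how train.py computes the metric.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = malignant, 0 = benign; scores are predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.40, 0.45, 0.60, 0.70, 0.90]

auc = pairwise_auc(y_true, scores)
print(f"AUC = {auc:.4f}")  # 14 of 15 pairs are ranked correctly -> 0.9333
```

This quadratic loop is only meant to make the definition concrete; for real datasets a library routine based on sorting is the practical choice.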