/cancer-detection-transfer-learning

Use Keras and Transfer Learning to detect Skin Cancer by diagnosing Melanoma

Dermatologist AI

Introduction

The data is pulled from the 2017 ISIC Challenge on Skin Lesion Analysis Towards Melanoma Detection.

Getting Started

  1. Clone the repository and create a data/ folder to hold the dataset of skin images.
git clone https://github.com/udacity/dermatologist-ai.git
mkdir data; cd data
  2. Create folders to hold the training, validation, and test images.
mkdir train; mkdir valid; mkdir test
  3. Download and unzip the training data (5.3 GB).

  4. Download and unzip the validation data (824.5 MB).

  5. Download and unzip the test data (5.1 GB).

  6. Place the training, validation, and test images in the data/ folder, at data/train/, data/valid/, and data/test/, respectively. Each folder should contain three sub-folders (melanoma/, nevus/, seborrheic_keratosis/), each containing representative images from one of the three image classes.
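
Before training, it can help to confirm that the folder layout described above is in place. The following is a minimal sketch using only the Python standard library; the function name count_images is hypothetical and is not part of the repository.

```python
from pathlib import Path

SPLITS = ("train", "valid", "test")
CLASSES = ("melanoma", "nevus", "seborrheic_keratosis")


def count_images(data_dir="data"):
    """Return {split: {class: number of files}} for the expected layout.

    Raises FileNotFoundError if any of the nine sub-folders is missing.
    """
    counts = {}
    for split in SPLITS:
        counts[split] = {}
        for cls in CLASSES:
            folder = Path(data_dir) / split / cls
            if not folder.is_dir():
                raise FileNotFoundError("missing folder: {}".format(folder))
            # Count regular files only; nested directories are ignored.
            counts[split][cls] = sum(1 for p in folder.iterdir() if p.is_file())
    return counts
```

Calling count_images("data") from the repository root returns the per-class file counts, which also gives a quick view of how balanced the three classes are.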

You are free to use any coding environment of your choice to solve this mini project! In order to rank your results, you need only use a pipeline that culminates in a CSV file containing your test predictions.

Create a Model

Use the training and validation data to train a model that can distinguish between the three different image classes. (After training, you will use the test images to gauge the performance of your model.)
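
As one possible starting point, here is a minimal transfer-learning sketch in Keras. It is an illustration under stated assumptions, not the method used by this repository or by the competition teams: the choice of ResNet50, the 224x224 input size, the dropout rate, and the helper names load_split and build_model are all assumptions. It expects a recent TensorFlow 2.x installation, which is separate from the Python 3.5 environment used for scoring.

```python
# Class folders in alphabetical order, so index 0 is melanoma and index 2 is
# seborrheic keratosis in the softmax output.
CLASS_NAMES = ["melanoma", "nevus", "seborrheic_keratosis"]
IMG_SIZE = (224, 224)  # assumed input size (ResNet50's ImageNet default)


def load_split(split_dir, shuffle):
    """Load one of data/train, data/valid, data/test as a batched dataset."""
    import tensorflow as tf  # heavy dependency, imported only when needed

    return tf.keras.utils.image_dataset_from_directory(
        split_dir,
        label_mode="categorical",  # one-hot labels for a softmax output
        class_names=CLASS_NAMES,   # pin the class-to-index mapping
        image_size=IMG_SIZE,
        batch_size=32,
        shuffle=shuffle,
    )


def build_model(weights="imagenet"):
    """Frozen ResNet50 feature extractor with a new 3-class softmax head."""
    import tensorflow as tf

    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights,
        input_shape=IMG_SIZE + (3,), pooling="avg",
    )
    base.trainable = False  # transfer learning: keep pretrained features fixed

    inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
    x = tf.keras.applications.resnet50.preprocess_input(inputs)
    x = base(x, training=False)  # keep batch-norm layers in inference mode
    x = tf.keras.layers.Dropout(0.3)(x)
    outputs = tf.keras.layers.Dense(len(CLASS_NAMES), activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A training run could then look like build_model().fit(load_split("data/train", shuffle=True), validation_data=load_split("data/valid", shuffle=False), epochs=5), where the epoch count is arbitrary. With this class ordering, column 0 of model.predict(...) is the melanoma probability and column 2 is the seborrheic keratosis probability.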

If you would like to read more about some of the algorithms that were successful in this competition, please read this article that discusses some of the best approaches. A few of the corresponding research papers appear below.

While the original challenge provided additional data (such as the gender and age of the patients), we only provide the image data to you. If you would like to download this additional patient data, you may do so at the competition website.

All three of the above teams increased the number of images in the training set with additional data sources. If you'd like to expand your training set, you are encouraged to begin with the ISIC Archive.

Your CSV file of test predictions should have exactly 3 columns:

  • Id - the file names of the test images (in the same order as the sample submission file)
  • task_1 - the model's predicted probability that the image (at the path in Id) depicts melanoma
  • task_2 - the model's predicted probability that the image (at the path in Id) depicts seborrheic keratosis
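
The three columns above can be written with the standard csv module. This is a minimal sketch; the function name write_submission and its argument names are hypothetical, and the caller is responsible for passing the ids in the same order as the sample submission file.

```python
import csv


def write_submission(path, ids, p_melanoma, p_seb_keratosis):
    """Write a predictions CSV with the columns Id, task_1, task_2.

    ids             -- test image file names, in sample-submission order
    p_melanoma      -- predicted probability of melanoma per image
    p_seb_keratosis -- predicted probability of seborrheic keratosis per image
    """
    if not (len(ids) == len(p_melanoma) == len(p_seb_keratosis)):
        raise ValueError("all three inputs must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "task_1", "task_2"])
        for row in zip(ids, p_melanoma, p_seb_keratosis):
            writer.writerow(row)
```

For example, write_submission("my_predictions.csv", ids, probs[:, 0], probs[:, 2]) would fit a model whose output columns are ordered melanoma, nevus, seborrheic keratosis; the file name and the probs array are illustrative.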

Once the CSV file is obtained, you will use the get_results.py file to score your submission. To set up the environment to run this file, you need to create (and activate) an environment with Python 3.5 and a few pip-installable packages:

conda create --name derm-ai python=3.5
source activate derm-ai
pip install -r requirements.txt

Once you have set up the environment, run the following command to see how the sample submission performed:

python get_results.py sample_predictions.csv

Check the terminal output for the scores obtained in the three categories:

Category 1 Score: 0.526
Category 2 Score: 0.606
Category 3 Score: 0.566

The corresponding ROC curves appear in a pop-up window, along with the confusion matrix corresponding to melanoma classification.
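
To make the scores concrete, the sketch below computes an area under the ROC curve in plain Python. It assumes that each category score is a ROC AUC and that Category 3 is the mean of the first two, which is consistent with the sample output above, since (0.526 + 0.606) / 2 = 0.566. It is not the code in get_results.py, and the function names are hypothetical.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Uses the rank interpretation of AUC: the fraction of (positive, negative)
    pairs in which the positive example receives the higher score, counting
    ties as half. Quadratic in the number of samples, so for illustration only.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_score(score_1, score_2):
    """Assumed Category 3 score: the mean of the two per-task scores."""
    return (score_1 + score_2) / 2
```

A score of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, so the sample submission's scores are only slightly better than chance.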

The confusion matrix is generated with a melanoma classification threshold of 0.5 by default. To change the threshold, pass it as an additional command-line argument to get_results.py. For instance, to set the threshold to 0.4, run:

python get_results.py sample_predictions.csv 0.4
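
The effect of the threshold can be illustrated with a small sketch that turns predicted melanoma probabilities into a 2x2 confusion matrix. This is not the code in get_results.py; the function name is hypothetical, and treating a probability equal to the threshold as a positive prediction is an assumption.

```python
def melanoma_confusion_matrix(y_true, p_melanoma, threshold=0.5):
    """Return [[TN, FP], [FN, TP]] for binary melanoma labels (1 = melanoma).

    An image is predicted as melanoma when its probability is at or above
    the threshold (the handling of exact ties is an assumption).
    """
    tn = fp = fn = tp = 0
    for truth, prob in zip(y_true, p_melanoma):
        predicted = 1 if prob >= threshold else 0
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]
```

Lowering the threshold from 0.5 to 0.4 can only move predictions from negative to positive, so it trades missed melanomas (false negatives) for more false alarms (false positives).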