/skin_mnist

Analysis of Skin Cancer MNIST: HAM10000

Skin Cancer MNIST: HAM10000

The Skin Cancer MNIST (HAM10000) dataset poses a classification problem over dermatological images of different types of skin cancer.
Unlike other datasets, it covers several types of skin lesions, which allows a more specific diagnosis and a more detailed study of the lesions a patient might have.

Data

This benchmark consists of 10015 images, the result of an intensive study carried out by several institutions. The samples are RGB images of 600x450 pixels (width x height).
The benchmark covers seven types of skin lesions:

  • Actinic Keratoses;
  • Basal cell carcinoma;
  • Benign Keratosis;
  • Dermatofibroma;
  • Melanocytic nevi;
  • Melanoma;
  • Vascular skin lesions;

Limitations of this dataset

The main limitations of this benchmark are:

  • Highly imbalanced classes (the number of samples per class is very disproportionate);
  • Small number of samples;
  • High problem complexity;
  • High-dimensional samples (large images);

What this project offers

  • A Jupyter notebook with a preliminary analysis of the problem;
  • Several techniques to mitigate the main limitations of the problem: Random Oversampling, Cost-Sensitive Learning and Data Augmentation;
  • Implementations of four convolutional architectures used to solve the problem: AlexNet, VGGNet, ResNet and DenseNet;
  • Use of the PSO (Particle Swarm Optimization) algorithm to optimize the structure and other hyperparameters of the convolutional architectures;
  • An ensemble that improves on the performance of the individual architectures by averaging their predicted probability distributions;
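
The averaging ensemble mentioned in the last item can be sketched as follows. This is only an illustration in plain Python: the function names and the probability values are hypothetical, and three classes are used instead of seven for brevity.

```python
def average_ensemble(prob_lists):
    """Average the per-class probabilities predicted by several models for one sample."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]


def predict_class(prob_lists):
    """Return the index of the class with the highest averaged probability."""
    avg = average_ensemble(prob_lists)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical softmax outputs of three models for a single image.
model_outputs = [
    [0.6, 0.3, 0.1],  # model A prefers class 0
    [0.2, 0.5, 0.3],  # model B prefers class 1
    [0.3, 0.6, 0.1],  # model C prefers class 1
]

print(average_ensemble(model_outputs))
print(predict_class(model_outputs))
```

Because each model outputs a probability distribution, the averaged vector is again a distribution, and a confident model can outvote less confident ones.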

Results

The table below shows the results obtained after optimizing each architecture. The resulting model of each row can be downloaded from the File column.

Model                               | Memory  | Macro Average F1 Score | Macro Average Recall | Accuracy | File
AlexNet                             | 7.8 MB  | 65.4%                  | 63.5%                | 81.1%    | AlexNet h5 File
VGGNet                              | 12.9 MB | 64.8%                  | 62.3%                | 80.8%    | VGGNet h5 File
ResNet                              | 39.8 MB | 66.5%                  | 64.2%                | 81.3%    | ResNet h5 File
DenseNet                            | 4.4 MB  | 67.6%                  | 65.4%                | 81.6%    | DenseNet h5 File
Ensemble Average All Models         | 21.8 MB | 68.5%                  | 65.2%                | 83.0%    | Ensemble All Models h5 File
Ensemble Average Alex + VGG + Dense | 17.5 MB | 69.2%                  | 66.5%                | 83.1%    | Ensemble Best Combination h5 File
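
Accuracy is noticeably higher than the macro-averaged metrics in every row. This is expected with imbalanced classes, because macro averaging gives every class the same weight regardless of its size. The sketch below illustrates the effect on hypothetical toy labels; it is not the project's evaluation code.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro recall, macro F1) computed from label lists."""
    classes = sorted(set(y_true))
    pairs = list(zip(y_true, y_pred))
    recalls, f1s = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return accuracy, sum(recalls) / len(classes), sum(f1s) / len(classes)


# Hypothetical toy data: 8 majority samples, 2 minority samples,
# with one minority sample misclassified as the majority class.
y_true = ["nv"] * 8 + ["mel"] * 2
y_pred = ["nv"] * 8 + ["nv", "mel"]

accuracy, macro_recall, macro_f1 = macro_metrics(y_true, y_pred)
print(accuracy, macro_recall, macro_f1)
```

A single error on the minority class barely moves accuracy but halves that class's recall, which pulls the macro averages down.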

How can I use it

  1. Clone Project: git clone https://github.com/bundasmanu/skin_mnist.git
  2. Install requirements: pip install -r requirements.txt
  3. Check the config.py file and adjust the configuration variables used to read, load and split the data, as well as the variables used to build, train and optimize the architectures.
    • Samples are read from the ../input/images/*.jpg folder; this is only an example path, so check it and adjust it before using the project;
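
As a rough illustration of step 3, the path setting could look like the sketch below. The names INPUT_DIR, IMAGE_PATTERN and list_images are hypothetical and may not match the variables actually defined in config.py; only the ../input/images/*.jpg pattern comes from this README.

```python
import glob
import os

# Hypothetical configuration names; the real config.py may use different ones.
INPUT_DIR = os.path.join("..", "input", "images")
IMAGE_PATTERN = os.path.join(INPUT_DIR, "*.jpg")


def list_images(pattern=IMAGE_PATTERN):
    """Return the sorted image paths matching the pattern, failing early if none exist."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(
            f"No images match {pattern!r}; adjust the input path in the configuration"
        )
    return paths
```

Failing early with a clear message makes a wrong input path visible before any training starts.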

Data Access

https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000

Licence

GPL-3.0 License
I am open to new ideas and improvements to this repository. However, until the defense of my master's thesis, I will not accept pull requests.