Solution for the Kaggle Human Protein Classification competition.

Primary language: Jupyter Notebook

Human-Protein-Classification

This is my submission to the Kaggle competition Zero to GANs - Human Protein Classification.

Description

In this competition, I developed a model that classifies mixed patterns of proteins in microscope images. Images visualizing proteins in cells are commonly used in biomedical research, and these cells could hold the key to the next breakthrough in medicine. However, thanks to advances in high-throughput microscopy, these images are generated far faster than they can be manually evaluated. The need to automate biomedical image analysis, and so accelerate the understanding of human cells and disease, is therefore greater than ever.

This is a multilabel image classification problem, where each image can belong to several classes. The class labels are as follows:

  1. Mitochondria
  2. Nuclear bodies
  3. Nucleoli
  4. Golgi apparatus
  5. Nucleoplasm
  6. Nucleoli fibrillar center
  7. Cytosol
  8. Plasma membrane
  9. Centrosome
  10. Nuclear speckles
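
Because an image can carry several of these labels at once, a common way to represent the target is a multi-hot vector with one 0/1 entry per class. The following is a minimal sketch, not the notebook's actual code: the helper names `encode_labels` and `decode_labels` are hypothetical, and it assumes 0-based class indices in the order of the list above (which is numbered from 1).

```python
# Class names in the order of the list above.
# Assumption: classes are indexed from 0 in this order.
CLASS_NAMES = [
    "Mitochondria",
    "Nuclear bodies",
    "Nucleoli",
    "Golgi apparatus",
    "Nucleoplasm",
    "Nucleoli fibrillar center",
    "Cytosol",
    "Plasma membrane",
    "Centrosome",
    "Nuclear speckles",
]


def encode_labels(label_indices, num_classes=len(CLASS_NAMES)):
    """Turn a collection of class indices into a multi-hot 0/1 vector."""
    target = [0] * num_classes
    for idx in label_indices:
        target[idx] = 1
    return target


def decode_labels(target):
    """Turn a multi-hot vector back into a list of class names."""
    return [CLASS_NAMES[i] for i, flag in enumerate(target) if flag]


example = encode_labels([0, 6])
print(example)                 # [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(decode_labels(example))  # ['Mitochondria', 'Cytosol']
```

A vector of this shape pairs naturally with a per-class binary loss, since each entry is an independent yes/no decision.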

Dataset

The dataset contains 512x512 RGB (3-channel) images in PNG format. The training set contains 19,236 images and the test set contains 8,243 images.
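
Only the training images come with labels, so part of the training set is usually held out for validation. This README does not say how (or whether) that was done here; the sketch below simply assumes a 10% hold-out with a fixed seed. `VAL_FRACTION`, `SEED`, and `split_indices` are illustrative names, not taken from the notebook.

```python
import random

NUM_TRAIN_IMAGES = 19236  # training-set size stated above
VAL_FRACTION = 0.1        # assumption: not specified in this README
SEED = 42                 # assumption: any fixed seed gives a reproducible split


def split_indices(num_items, val_fraction, seed):
    """Shuffle indices reproducibly and split them into (train, validation)."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    num_val = int(num_items * val_fraction)
    return indices[num_val:], indices[:num_val]


train_idx, val_idx = split_indices(NUM_TRAIN_IMAGES, VAL_FRACTION, SEED)
print(len(train_idx), len(val_idx))  # 17313 1923
```

Splitting by index rather than by file keeps the sketch independent of how the images and labels are stored on disk.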

Prediction

We predict protein organelle localization labels for each sample. The dataset contains 10 different labels in total. It was acquired in a highly standardized way using one imaging modality (confocal microscopy). However, it comprises 10 different cell types of highly different morphology, which affects the protein patterns of the different organelles. Each image can have one or more labels associated with it.
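
One way to turn model outputs into a set of labels is to apply a sigmoid to each class score and keep every class above a threshold. This is a minimal sketch under stated assumptions, not the notebook's actual method: it assumes the model emits one raw score (logit) per class, uses a threshold of 0.5, and the function name `predict_labels` is hypothetical. Because every image has at least one label, it falls back to the single most likely class when nothing clears the threshold.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(logits, threshold=0.5):
    """Return the class indices whose probability reaches the threshold.

    Assumption: `logits` holds one raw score per class.
    Falls back to the top-scoring class so that at least one label is returned.
    """
    probs = [sigmoid(z) for z in logits]
    chosen = [i for i, p in enumerate(probs) if p >= threshold]
    if not chosen:
        chosen = [max(range(len(probs)), key=probs.__getitem__)]
    return chosen


# Hypothetical scores for the 10 classes of one image.
logits = [2.0, -1.0, 0.3, -4.0, -2.5, -3.0, -0.7, -1.2, -5.0, -2.0]
print(predict_labels(logits))              # [0, 2]
print(predict_labels([-3.0, -1.0, -2.0]))  # [1]
```

The threshold is a tunable choice; a value other than 0.5, or a separate threshold per class, may work better depending on the evaluation metric.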