small-image-classifier

An image classifier that assigns an image to one of 3 categories (dog, cat, or human).

Deep Learning-Based Image Classifier

Dataset Overview

The dataset used is a subset of the ImageNet dataset. It contains images for 3 classes: dog, cat, and human. The number of images per class is as follows:

  • Class-1 (Dog): 894
  • Class-2 (Cat): 1132
  • Class-3 (Human): 874

The total number of images is 2,900.

I have used the following data augmentation techniques to increase the size of the dataset (see the sketch after this list):

  • Flip the image about the vertical axis.
  • Add Gaussian noise.
  • Increase the contrast of the image.
  • Crop the image by 10% from all 4 edges.
  • Rotate the image by 20 degrees.
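
Below is a minimal sketch of these augmentations using Pillow and NumPy. The noise level (sigma = 10) and contrast factor (1.5) are assumptions for illustration, since the exact values are not stated above.

```python
import numpy as np
from PIL import Image, ImageEnhance, ImageOps

def augment(img: Image.Image) -> list:
    """Return the five augmented variants of an RGB image described above."""
    w, h = img.size
    variants = []
    variants.append(ImageOps.mirror(img))                     # flip about the vertical axis
    arr = np.asarray(img).astype(np.float32)                  # add Gaussian noise (sigma assumed = 10)
    noisy = np.clip(arr + np.random.normal(0.0, 10.0, arr.shape), 0, 255)
    variants.append(Image.fromarray(noisy.astype(np.uint8)))
    variants.append(ImageEnhance.Contrast(img).enhance(1.5))  # increase contrast (factor assumed = 1.5)
    box = (int(0.1 * w), int(0.1 * h), int(0.9 * w), int(0.9 * h))
    variants.append(img.crop(box).resize((w, h)))             # crop 10% from all 4 edges
    variants.append(img.rotate(20))                           # rotate by 20 degrees
    return variants
```

Each original image thus yields 5 extra images, which matches the 6x growth in the class counts below.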

The new dataset has the following number of images:

  • Class-1 (Dog): 5364
  • Class-2 (Cat): 6792
  • Class-3 (Human): 5244

The total number of images is 17,400.

The number of images required to build a robust classifier is a function of the complexity of the model we are trying to train: the more complex the model (i.e., the more parameters it learns), the more data it requires. For example, a simple classifier like an SVM will need less data to perform reasonably well than a multi-layered CNN.

Models

I have experimented with the following models:

  1. Basic CNN: This model consists of the following layers (a Keras sketch follows this list):

    • Layer 1: A convolution layer with 32 filters of size 5 x 5.
    • Layer 2: A max-pooling layer with a downscale factor of 2.
    • Layer 3: A convolution layer with 64 filters of size 5 x 5.
    • Layer 4: A max-pooling layer with a downscale factor of 2.
    • Layer 5: A dense layer with 1024 hidden units.
    • Layer 6: A softmax layer with 3 nodes (one per class).
  2. Wide Residual Network: The exact details of this architecture are described in the paper (Link).
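
The following is a minimal sketch of the basic CNN in Keras. The layer sizes, dropout, learning rate, and batch size come from the description and results here; padding, activations, and the optimizer are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_basic_cnn(num_classes: int = 3) -> tf.keras.Model:
    """Basic CNN: conv(5x5, 32) -> pool -> conv(5x5, 64) -> pool -> dense(1024) -> softmax."""
    return models.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, kernel_size=5, padding="same", activation="relu"),  # Layer 1
        layers.MaxPooling2D(pool_size=2),                                     # Layer 2
        layers.Conv2D(64, kernel_size=5, padding="same", activation="relu"),  # Layer 3
        layers.MaxPooling2D(pool_size=2),                                     # Layer 4
        layers.Flatten(),
        layers.Dense(1024, activation="relu"),                                # Layer 5
        layers.Dropout(0.5),  # active during training only, matching the 0.5/1.0 keep-probabilities below
        layers.Dense(num_classes, activation="softmax"),                      # Layer 6
    ])

model = build_basic_cnn()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # optimizer choice assumed
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# 20,000 iterations at batch size 100 is roughly 115 epochs over the 17,400-image set.
```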

Results

Image size used: 32 x 32 x 3

  • Model: Basic CNN
    Batch size: 100
    Learning rate: 0.01
    Iterations: 20,000
    Dropout keep-probability: 0.5 (training), 1.0 (testing)
    Test accuracy: 80%

  • Model: Wide Residual Network

| S.R. No. | Network width (k) | Units per block (n) | Learning rate | Batch size | Iterations | Dropout prob. | Test acc. (%) |
|----------|-------------------|---------------------|---------------|------------|------------|---------------|---------------|
| 1        | 1                 | 2                   | 0.01          | 100        | 50,000     | 0.5           | 90            |
| 2        | 1                 | 3                   | 0.01          | 100        | 50,000     | 0.5           | 93            |
| 3        | 2                 | 2                   | 0.01          | 100        | 50,000     | 0.5           | 94            |
| 4        | 2                 | 3                   | 0.01          | 100        | 50,000     | 0.5           | 93.8          |
| 5        | 2                 | 2                   | 0.01          | 100        | 50,000     | 0.3           | 94.2          |
| 6        | 2                 | 2                   | 0.01          | 200        | 50,000     | 0.3           | 94.8          |
| 7        | 2                 | 3                   | 0.01          | 200        | 50,000     | 0.3           | 95.36         |
| 8        | 3                 | 3                   | 0.01          | 200        | 100,000    | 0.3           | 95.90         |
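
For reference, here is a sketch of a single pre-activation wide residual block in Keras, following the design in the Wide ResNet paper. In the full network, the filter count is scaled by the width factor k (16k, 32k, 64k across the three groups) and each group stacks n such blocks; details beyond that are approximations on my part.

```python
import tensorflow as tf
from tensorflow.keras import layers

def wide_residual_block(x, filters: int, stride: int = 1, dropout: float = 0.3):
    """BN-ReLU-Conv(3x3) -> dropout -> BN-ReLU-Conv(3x3), added to a (possibly projected) shortcut."""
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    # Project the shortcut with a 1x1 convolution whenever the shape changes.
    shortcut = x
    if stride != 1 or x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride)(y)
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Dropout(dropout)(y)  # dropout between the convolutions, as in the paper
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Add()([shortcut, y])
```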

Further Improvements

We can improve the performance further by using transfer learning in the following two ways (sketched after the list):

  • Cut off the final layer of a model pre-trained on a similar set of classes, attach a new softmax layer for our 3 classes, and train this new model on our data. This is called fine-tuning.

  • Pass the images through a pre-trained model and extract features from the second-to-last layer. Use these features with another classifier such as an SVM or XGBoost (gradient boosting with decision trees). With XGBoost, we would train 3 one-vs-all classifiers and take a weighted combination of their outputs at test time.
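
Here is a minimal fine-tuning sketch in Keras; the backbone (MobileNetV2) and the 96 x 96 input size are illustrative choices, not something fixed by the text above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Frozen ImageNet-pre-trained backbone with its classification head removed.
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights="imagenet", pooling="avg"
)
base.trainable = False  # optionally unfreeze later for deeper fine-tuning

model = models.Sequential([
    base,
    layers.Dense(3, activation="softmax"),  # new head for dog / cat / human
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

And a sketch of the feature-extraction variant, reusing the same frozen backbone as the feature extractor and fitting an SVM on its outputs (`train_images`, `train_labels`, and `test_images` are assumed to be preprocessed arrays):

```python
from sklearn.svm import SVC

features = base.predict(train_images)  # pooled backbone output as penultimate-layer features
svm = SVC(kernel="rbf")                # kernel choice is an assumption
svm.fit(features, train_labels)
test_preds = svm.predict(base.predict(test_images))
```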

With both of the above methods, we can train a high-performing classifier with relatively little data, and much faster as well.

We can also experiment with ensembles of these networks to see if the performance can be improved further.