Very-Deep-Convolutional-Networks-for-Large-Scale-Image-Recognition

This project is an implementation of Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG) in PyTorch. However, the dataset used for this project comes from the Tiny ImageNet Visual Recognition Challenge rather than the full ImageNet dataset. As in the original paper, model performance is improved by using small 3x3 convolutional filters while increasing the depth of the network. In this implementation, an adaptive max-pooling layer is placed between the final ConvNet layer and the fully connected layers, and no dropout layers are used.
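
As a rough illustration of that head, here is a minimal sketch of an adaptive max-pooling bridge into the fully connected classifier (the class name, channel count and pooled size are assumptions, not this repo's exact code):

    import torch.nn as nn

    # Hypothetical sketch: adaptive max pooling between the last conv block
    # and the fully connected layers, with no dropout layers.
    class VGGHead(nn.Module):
        def __init__(self, in_channels=512, num_classes=200):
            super().__init__()
            # Pool the final feature maps to a fixed 2x2 spatial size, so the
            # fully connected input size does not depend on the image size.
            self.pool = nn.AdaptiveMaxPool2d((2, 2))
            self.classifier = nn.Sequential(
                nn.Linear(in_channels * 2 * 2, 4096),
                nn.ReLU(inplace=True),
                nn.Linear(4096, num_classes),
            )

        def forward(self, x):
            x = self.pool(x)
            x = x.flatten(1)
            return self.classifier(x)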

Model Architecture

The models were implemented following the description in the original paper, as shown in the following figure.
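
In the paper, configurations A, B, D and E correspond to 11, 13, 16 and 19 weight layers, and configuration C is the 16-layer variant with 1x1 convolutions. A minimal sketch of building the convolutional stack from such a configuration (the dictionary layout and helper name are assumptions, not this repo's exact code):

    import torch.nn as nn

    # VGG-style configurations: each number is the output channel count of a
    # 3x3 convolution, and 'M' marks a 2x2 max-pooling layer.
    CFGS = {
        11: [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
        13: [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
        16: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M'],
        19: [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M',
             512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
    }

    def make_features(cfg):
        layers, in_ch = [], 3
        for v in cfg:
            if v == 'M':
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            else:
                layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                           nn.ReLU(inplace=True)]
                in_ch = v
        return nn.Sequential(*layers)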

Dataset

The dataset is divided into 3 sets (a minimal loading sketch follows the list):

  • The training set contains 200 classes, each with 200 samples of 64x64 pixels.
  • The validation set contains 50 samples per class.
  • The test set contains 10,000 samples in total.
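
A minimal loading sketch, assuming the train and validation splits are laid out as class-labelled folders under a data/ directory (the paths, transforms and batch size here are assumptions):

    import torch
    from torchvision import datasets, transforms

    # Assumed layout: data/train/<class>/*.JPEG and data/val/<class>/*.JPEG
    transform = transforms.Compose([
        transforms.Resize(64),   # Tiny ImageNet images are 64x64 pixels
        transforms.ToTensor(),
    ])

    train_set = datasets.ImageFolder('data/train', transform=transform)
    val_set = datasets.ImageFolder('data/val', transform=transform)

    train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=64, shuffle=False)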

Sample images from trainset:

trainsample

Sample images from validationset:

valsample

Computing resources

The models were trained on an Nvidia GTX 1070 GPU with 8 GB of memory. Each model took approximately 14 to 20 hours to train.

Results

Model performance plots:

  • 11-layer model
  • 13-layer model
  • 16-layer model
  • 16-layer (Conv 1x1) model
  • 19-layer model

Accuracy:

Layers \ Dataset      Trainset    Valset    Testset
11 layers             0.4800      0.3591    -
13 layers             0.5133      0.4001    -
16 layers             0.48322     0.3692    -
16 layers, Cf: 1x1    0.4932      0.3845    -
19 layers             0.4890      0.3563    -

Loss:

Layers \ Dataset      Trainset    Valset    Testset
11 layers             2.03        2.75      -
13 layers             1.85        2.57      -
16 layers             1.95        2.65      -
16 layers, Cf: 1x1    1.91        2.63      -
19 layers             1.92        2.79      -
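
For reference, figures like the ones above come from a standard evaluation pass over a dataloader; a sketch of such a loop (not this repo's exact evaluation code):

    import torch
    import torch.nn as nn

    def evaluate(model, loader, device='cuda'):
        # Return the average cross-entropy loss and top-1 accuracy over a loader.
        criterion = nn.CrossEntropyLoss()
        model.eval()
        total_loss, correct, total = 0.0, 0, 0
        with torch.no_grad():
            for images, labels in loader:
                images, labels = images.to(device), labels.to(device)
                outputs = model(images)
                total_loss += criterion(outputs, labels).item() * labels.size(0)
                correct += (outputs.argmax(dim=1) == labels).sum().item()
                total += labels.size(0)
        return total_loss / total, correct / total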

Sample test set images with their predicted classes:

testsample

Prediction results are available in the test predictions file. The columns named 11, 13, 16, 16_C11 and 19 contain the predictions made by each model. Because the models often disagree, the final prediction for each test image was chosen by majority vote, i.e. the class predicted most frequently across the models. For test images where every model predicted a different class, the prediction of the 11-layer model was used (a sketch of this rule follows).
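
A sketch of that majority-vote rule, assuming the predictions file is a CSV with one row per test image and per-model columns named as above (the file name and exact format are assumptions):

    import pandas as pd

    preds = pd.read_csv('test_predictions.csv')
    model_cols = ['11', '13', '16', '16_C11', '19']

    def vote(row):
        counts = row[model_cols].value_counts()
        # If one class is predicted more often than the rest, take it;
        # if every model predicted a different class, fall back to the
        # 11-layer model's prediction.
        if counts.iloc[0] > 1:
            return counts.index[0]
        return row['11']

    preds['final'] = preds.apply(vote, axis=1)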

Usage

Very Deep Convolutional Networks for Large Scale Image Recognition

[-h] [-t TRAIN] [-v VAL] [-b {64,128,256}] [-e {50,100,150}]
[-d {11,13,16,19}] [-c11] [-es EARLY_STOPPING] [-i IMAGESIZE] [-lr LR]

optional arguments:

  -h, --help            show this help message and exit
  -t TRAIN, --train TRAIN
                        required image dataset for training a model. It must
                        be in the data directory
  -v VAL, --val VAL     required image dataset for validating a model. It must
                        be in the data directory
  -b {64,128,256}, --batchsize {64,128,256}
                        select number of samples to load from dataset
  -e {50,100,150}, --epochs {50,100,150}
  -d {11,13,16,19}, --depth {11,13,16,19}
                        depth of the deep learning model
  -c11, --conv1_1       setting it True will replace some of the 3x3 Conv
                        layers with 1x1 Conv layers in the 16 layer network
  -es EARLY_STOPPING, --early_stopping EARLY_STOPPING
                        early stopping is used to stop training the network
                        if the validation loss does not improve
  -i IMAGESIZE, --imagesize IMAGESIZE
                        size (in pixels) to which the input images are resized
  -lr LR, --lr LR       learning rate for an Adam optimizer

Example usage:

To train a model with a depth of 11 layers:

python train.py -d 11 -e 50 -es 5 -b 64

To train a 16-layer model that replaces some 3x3 Conv layers with the smaller 1x1 Conv filters:

python train.py -d 16 -c11 -e 50 -es 5 -b 64