Model

This is an implementation of the deep residual network used for CIFAR-10 as described in He et al., "Deep Residual Learning for Image Recognition". The model is a very deep network with skip connections, so that the convolutional layers learn a residual correction to their input activations rather than a full transformation. The training protocol uses minimal preprocessing (mean subtraction) and very simple data augmentation (shuffling, flipping, and cropping). All model parameters, including the batch norm parameters, are updated using plain stochastic gradient descent with weight decay. The learning rate is dropped only twice, at epochs 90 and 135.
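The learning rate schedule described above can be sketched as a simple step function. This is only an illustration, not the code in cifar10_msra.py: the function name, the base rate of 0.1, and the drop factor of 0.1 are assumptions (this README only states the two drop epochs), and epochs are assumed to be counted from zero.

```python
def step_learning_rate(epoch, base_lr=0.1, drop_epochs=(90, 135), factor=0.1):
    """Return the learning rate in effect during a zero-based epoch.

    base_lr and factor are assumed values for illustration; only the
    drop epochs (90 and 135) come from this README.
    """
    # Count how many drop boundaries have already been reached.
    drops = sum(1 for boundary in drop_epochs if epoch >= boundary)
    return base_lr * factor ** drops


if __name__ == "__main__":
    for epoch in (0, 89, 90, 134, 135, 179):
        print(epoch, step_learning_rate(epoch))
```

With these assumed values the rate is constant for the first 90 epochs, ten times smaller for the next 45, and ten times smaller again for the final 45 of a 180-epoch run.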

Acknowledgments

Many thanks to Dr. He and his team at MSRA for their helpful input in replicating the model as described in their paper.

Model script

The model training script, cifar10_msra.py, is included.

Trained weights

The trained weights files can be downloaded from AWS:

Model Depth	Model File
20	cifar10_msra_020_e180.p
32	cifar10_msra_032_e180.p
56	cifar10_msra_056_e180.p
110	cifar10_msra_110_e180.p

Performance

Trained with the options described below, the deeper configurations (56 and 110 layers) should reach above 93.6% top-1 accuracy using only mean subtraction, random cropping, and random flips. See the Benchmarks table for the validation error at each depth.

Instructions

This script was tested with neon version 1.2.1 (commit SHA c460e6c12cc4ea6e7453c0335afadf1f5110a4f7). Make sure that your local repo is synced to that commit and run the installation procedure before proceeding.

In addition, we use the branch that implements the merge sum layer type.

This example uses the ImageLoader module to load the images while applying random cropping, flipping, and shuffling. Before training, you need to write out the padded CIFAR-10 images into a macrobatch repository. From your top-level neon directory, run:

neon/data/batch_writer.py \
    --set_type cifar10 \
    --data_dir <path-to-save-batches> \
    --macro_size 10000 \
    --target_size 40

Note that it is good practice to choose a data_dir that is local to your machine, so that the ImageLoader module does not have to perform reads over the network.
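To make the augmentation concrete, the following is a minimal pure-Python sketch of padding a 32x32 image to 40x40 and then taking a random 32x32 crop with a random horizontal flip. It is not how batch_writer.py or ImageLoader are implemented: the helper names are hypothetical, a single-channel image is used for brevity, and symmetric zero padding of 4 pixels per side is an assumption (this README only states that the images are padded to a target size of 40).

```python
import random


def pad_image(image, pad):
    """Zero-pad a 2-D image (list of rows) by `pad` pixels on every side."""
    width = len(image[0])
    blank = [0] * (width + 2 * pad)
    padded = [blank[:] for _ in range(pad)]
    padded += [[0] * pad + list(row) + [0] * pad for row in image]
    padded += [blank[:] for _ in range(pad)]
    return padded


def random_crop_and_flip(image, crop_size, rng=random):
    """Take a random square crop, then mirror it left-right half the time."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_size)    # inclusive bounds
    left = rng.randint(0, width - crop_size)
    crop = [row[left:left + crop_size] for row in image[top:top + crop_size]]
    if rng.random() < 0.5:
        crop = [row[::-1] for row in crop]
    return crop


rng = random.Random(0)
# Stand-in for one 32x32 image channel.
source = [[rng.randint(0, 255) for _ in range(32)] for _ in range(32)]
padded = pad_image(source, pad=4)                     # 40x40, as with --target_size 40
sample = random_crop_and_flip(padded, crop_size=32, rng=rng)
print(len(padded), len(padded[0]), len(sample), len(sample[0]))  # 40 40 32 32
```

Writing the padded images out once means the padding cost is paid a single time, while the crop offset and flip are re-drawn every time an image is loaded.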

Once the batches have been written out, you may initiate training:

cifar10_msra.py -r 0 -vv \
    --log <logfile> \
    --epochs 180 \
    --save_path <model-save-path> \
    --eval_freq 1 \
    --backend gpu \
    --data_dir <path-to-saved-batches> \
    --depth <n>

The depth argument is the n value discussed in the paper, which is the number of repeated residual modules at each filter depth. Since there are 3 filter depths (stages), each with n residual modules, and each residual module consists of 2 convolutional layers, there are 6n convolutional layers in the residual part of the network. Adding the 2 remaining layers (the input convolution and the output linear layer) makes the total network 6n+2 layers deep. For depth arguments of 3, 5, 9, and 18, we get network depths of 20, 32, 56, and 110.
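The relationship between the depth argument and the total layer count can be checked with a few lines of Python. The function names here are illustrative and are not part of cifar10_msra.py.

```python
def network_depth(n, stages=3, convs_per_module=2):
    """Total layer count for a given --depth argument n (6n + 2 by default)."""
    residual_layers = stages * n * convs_per_module
    # Plus the input convolution and the output linear layer.
    return residual_layers + 2


def depth_argument(total_layers):
    """Inverse mapping: the --depth value that yields a given layer count."""
    n, remainder = divmod(total_layers - 2, 6)
    if n < 1 or remainder != 0:
        raise ValueError("layer count must be of the form 6n + 2 with n >= 1")
    return n


for n in (3, 5, 9, 18):
    print(n, network_depth(n))  # 20, 32, 56, 110
```

For example, `depth_argument(56)` returns 9, which is the value to pass as `--depth` to train the 56-layer network.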

If you just want to run evaluation, you can use the much simpler script that loads the serialized model and evaluates it on the validation set:

cifar10_eval.py -vv --model_file <model-save-path>

Benchmarks

Machine and GPU specs:

Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Ubuntu 14.04.2 LTS
GPU: GeForce GTX TITAN X
CUDA Driver Version 7.0

The memory usage and per-epoch training time of each network configuration, along with the final validation error, are shown in the table below. We observed that the error rates were consistently lower than those cited in the original paper. Our hypothesis is that this may be due to our inclusion of a final batch norm transformation at the output affine layer.

Model Depth	GPU Memory Footprint	Seconds per Epoch	Validation Error %
20	521 MiB	11	8.29
32	636 MiB	18	7.26
56	860 MiB	30	6.31
110	1277 MiB	60	6.00

The total time to train the 56-layer network for 180 epochs was about 90 minutes with the machine and GPU specifications described above.
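As a rough cross-check, the per-epoch times in the table can be scaled to a full 180-epoch run. This sketch assumes the per-epoch time stays constant and ignores evaluation and checkpointing overhead, so the results are estimates rather than measurements. Only the 56-layer total was reported above.

```python
# Values copied from the benchmark table: depth -> (seconds per epoch, validation error %).
benchmarks = {
    20: (11, 8.29),
    32: (18, 7.26),
    56: (30, 6.31),
    110: (60, 6.00),
}
EPOCHS = 180


def estimated_minutes(seconds_per_epoch, epochs=EPOCHS):
    """Estimated wall-clock training time in minutes, excluding overhead."""
    return seconds_per_epoch * epochs / 60


for depth, (seconds, error) in sorted(benchmarks.items()):
    minutes = estimated_minutes(seconds)
    accuracy = 100 - error  # top-1 accuracy implied by the validation error
    print(f"depth {depth}: ~{minutes:.0f} min, {accuracy:.2f}% accuracy")
```

The 56-layer estimate of 90 minutes matches the measured total reported above.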

The evolution of validation misclassification error for the various layer depths can be seen in the figures below.