Convolutional Neural-Network for MNIST

An implementation of convolutional neural-network (CNN) for MNIST with various techniques such as data augmentation, dropout, batchnormalization, etc.

Network architecture

CNN with 4 layers has following architecture.

input layer : 784 nodes (MNIST images size)
first convolution layer : 5x5x32
first max-pooling layer
second convolution layer : 5x5x64
second max-pooling layer
third fully-connected layer : 1024 nodes
output layer : 10 nodes (number of class for MNIST)

Tools for improving CNN performance

The following techniques are employed to imporve performance of CNN.

Train

1. Data augmentation

The number of train-data is increased to 5 times by means of

Random rotation : each image is rotated by random degree in ranging [-15°, +15°].
Random shift : each image is randomly shifted by a value ranging [-2pix, +2pix] at both axises.
Zero-centered normalization : a pixel value is subtracted by (PIXEL_DEPTH/2) and divided by PIXEL_DEPTH.

2. Parameter initializers

Weight initializer : xaiver initializer
Bias initializer : constant (zero) initializer

3. Batch normalization

All convolution/fully-connected layers use batch normalization.

4. Dropout

The third fully-connected layer employes dropout technique.

5. Exponentially decayed learning rate

A learning rate is decayed every after one-epoch.

Test

1. Ensemble prediction

Every model makes a prediction (votes) for each test instance and the final output prediction is the one that receives the highest number of votes.

Usage

Train

python mnist_cnn_train.py

Training logs are saved in "logs/train". Trained model is saved as "model/model.ckpt".

Test a single model

python mnist_cnn_test.py --model-dir <model_directory> --batch-size <batch_size> --use-ensemble False

<model_directory> is the location where a model to be testes is saved. Please do not specify filename of "model.ckpt".
<batch_size> is employed to reduce burden of memory of machine. The number of test data is 10,000 for MNIST. Different batch_size gives the same result, but requiring different memory size.

You may command like python mnist_cnn_test.py --model-dir model/model01_99.61 --batch-size 5000 --use-ensemble False.

Test ensemble prediction

python mnist_cnn_test.py --model-dir <model_directory> --batch-size <batch_size> --use-ensemble True

<model_directory> is the location of root directory. The root directory contains the sub-directories containg each model.

You may command like python mnist_cnn_test.py --model-dir model --batch-size 5000 --use-ensemble True.

Simulation results

CNN with the same hyper-parameters has been trained 30 times, and gives the following results.

A single model : 99.61% of accuracy.
(the model is saved in "model/model01_99.61".)
Ensemble prediction : 99.72% of accuracy.
(All 5 models under "model/" are used. I found the collection of 5 models by try and error.)

99.72% of accuracy is the 5th rank according to Here.

Acknowledgement

This implementation has been tested on Tensorflow r0.12.

hwalsuklee/tensorflow-mnist-cnn

Convolutional Neural-Network for MNIST

Network architecture

Tools for improving CNN performance

Train

1. Data augmentation

2. Parameter initializers

3. Batch normalization

4. Dropout

5. Exponentially decayed learning rate

Test

1. Ensemble prediction

Usage

Train

Test a single model

Test ensemble prediction

Simulation results

Acknowledgement