
Experiments on the paper <Bag of Tricks for Image Classification with Convolutional Neural Networks> and other useful tricks to improve CNN accuracy


Bag of Tricks for Image Classification with Convolutional Neural Networks

This repo was inspired by the paper Bag of Tricks for Image Classification with Convolutional Neural Networks.

I will test as many popular training tricks for improving image classification accuracy as I can. Feel free to leave a comment about the tricks you want me to test (please include the referenced paper along with the trick).

hardware

I use 4 Tesla P40 GPUs to run the experiments.

dataset

I will use the CUB_200_2011 dataset instead of ImageNet, just for simplicity. It is a fine-grained image classification dataset containing 200 bird categories, 5K+ training images and 5K+ test images. The state-of-the-art accuracy with VGG16 is around 73% (please correct me if I'm wrong). You can easily switch to another dataset you like, such as Stanford Dogs, Stanford Cars, or even ImageNet.

network

I use a VGG16 network to test the tricks, again for simplicity, since VGG16 is easy to implement. I'm considering switching to AlexNet to see how powerful these tricks are.
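To illustrate why VGG16 is simple to implement: its whole convolutional part is one list of channel widths and pooling markers. The pure-Python sketch below (hypothetical helper names; it assumes 224x224 inputs, no batch norm, and the standard 4096-4096 classifier ending in a 200-way output for CUB) walks that list and counts the network's parameters:

```python
# VGG16 ("configuration D"): numbers are 3x3 conv output channels, "M" is a 2x2 max-pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_param_count(num_classes: int = 200, in_channels: int = 3,
                      input_size: int = 224) -> int:
    """Count weights + biases of a plain VGG16 (no batch norm)."""
    total = 0
    channels = in_channels
    spatial = input_size
    for v in VGG16_CFG:
        if v == "M":
            spatial //= 2  # each max-pool halves height and width
        else:
            total += channels * v * 3 * 3 + v  # 3x3 kernel weights + one bias per filter
            channels = v
    # Classifier: flatten (512 x 7 x 7 for 224 inputs) -> 4096 -> 4096 -> num_classes
    fc_dims = [channels * spatial * spatial, 4096, 4096, num_classes]
    for fan_in, fan_out in zip(fc_dims, fc_dims[1:]):
        total += fan_in * fan_out + fan_out
    return total


num_conv = sum(1 for v in VGG16_CFG if v != "M")
print(num_conv + 3)             # 16 = 13 conv + 3 fully connected layers
print(vgg16_param_count(1000))  # 138357544 (ImageNet head)
print(vgg16_param_count(200))   # 135079944 (CUB_200_2011 head)
```

Most of the parameters sit in the first fully connected layer (25,088 x 4,096 weights), not in the conv stack.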

tricks

Tricks I've tested (some of them come from the paper Bag of Tricks for Image Classification with Convolutional Neural Networks):

| trick | referenced paper |
|---|---|
| Xavier init | Understanding the difficulty of training deep feedforward neural networks |
| warmup training | Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour |
| no bias decay | Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes |
| label smoothing | Rethinking the Inception Architecture for Computer Vision |
| random erasing | Random Erasing Data Augmentation |
| cutout | Improved Regularization of Convolutional Neural Networks with Cutout |
| linear scaling learning rate | Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour |
| cosine learning rate decay | SGDR: Stochastic Gradient Descent with Warm Restarts |

And more to come...
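Three of the tricks above change the initialization, the optimizer setup and the loss. Below is a minimal pure-Python sketch of each, following the referenced papers rather than this repo's code (all function names are hypothetical): the Glorot/Xavier uniform bound, a no-bias-decay split that decays only conv/FC weights (biases and BN gamma/beta are left undecayed, as in the Bag of Tricks paper), and cross-entropy with the Inception-v3 form of label smoothing, q = (1 - eps) * one_hot + eps / K:

```python
import math
from typing import Dict, List, Sequence, Tuple


def xavier_uniform_bound(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Glorot & Bengio: W ~ U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out))."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def split_no_bias_decay(params: Sequence[Tuple[str, int]],
                        weight_decay: float) -> List[Dict]:
    """Group (name, ndim) pairs so only conv/FC weights (ndim > 1) get weight decay.

    1-D parameters (biases, BN gamma/beta) go into a group with weight_decay = 0.
    """
    decay = [name for name, ndim in params if ndim > 1]
    no_decay = [name for name, ndim in params if ndim <= 1]
    return [{"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0}]


def label_smoothing_ce(logits: Sequence[float], target: int, eps: float = 0.1) -> float:
    """Cross-entropy against the smoothed target q = (1 - eps) * one_hot + eps / K."""
    k = len(logits)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
    log_probs = [x - log_z for x in logits]
    q = [eps / k] * k
    q[target] += 1.0 - eps
    return -sum(qi * lp for qi, lp in zip(q, log_probs))


# First VGG16 conv layer: 3 -> 64 channels with 3x3 kernels.
print(xavier_uniform_bound(3 * 3 * 3, 64 * 3 * 3))
print(split_no_bias_decay([("conv1.weight", 4), ("conv1.bias", 1)], 5e-4))
print(label_smoothing_ce([2.0, 1.0, 0.0], target=0))
```

In PyTorch, `torch.nn.init.xavier_uniform_` applies the same bound to a tensor, and the two groups would hold parameter tensors instead of names so they can be passed to `torch.optim.SGD` as parameter groups. `eps = 0.1` is only this sketch's default (it matches the Inception-v3 paper).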

result

Baseline (trained from scratch; no ImageNet pretrained weights are used):

VGG16: 64.60% on the CUB_200_2011 dataset, lr=0.01, batch size=64
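Starting from the baseline setting (lr = 0.01 at batch size 64), the linear scaling rule gives lr = 0.01 x 256 / 64 = 0.04 for batch size 256, which is the setting listed in the results table. Below is a sketch of a per-step schedule combining that rule with gradual warmup and optional cosine decay (the Bag of Tricks form, without SGDR's warm restarts). The warmup length, step counts and the constant-lr fallback are assumptions for illustration, not this repo's settings:

```python
import math


def scaled_lr(base_lr: float, batch_size: int, base_batch_size: int = 64) -> float:
    """Linear scaling rule: scale the lr by the same factor as the batch size."""
    return base_lr * batch_size / base_batch_size


def lr_at(step: int, total_steps: int, peak_lr: float, warmup_steps: int,
          cosine: bool = True) -> float:
    """Learning rate for a 0-based training step."""
    if step < warmup_steps:
        # Gradual warmup: ramp linearly from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if not cosine:
        # Hypothetical fallback: hold the peak lr (the real non-cosine schedule is not stated).
        return peak_lr
    # Cosine decay over the remaining steps: 0.5 * (1 + cos(pi * t / T)) * peak_lr
    t = step - warmup_steps
    T = total_steps - warmup_steps
    return 0.5 * (1.0 + math.cos(math.pi * t / T)) * peak_lr


peak = scaled_lr(0.01, 256)  # 0.01 * 256 / 64 = 0.04
schedule = [lr_at(s, total_steps=100, peak_lr=peak, warmup_steps=5) for s in range(100)]
```

To drive a PyTorch optimizer whose initial lr is `peak`, one option is `torch.optim.lr_scheduler.LambdaLR` with a lambda returning `lr_at(step, ...) / peak`, since `LambdaLR` multiplies the initial lr by that factor.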

effects of stacking tricks

| trick | accuracy |
|---|---|
| baseline | 64.60% |
| + Xavier init and warmup training | 66.07% |
| + no bias decay | 70.14% |
| + label smoothing | 71.20% |
| + random erasing | does not work, drops about 4 points |
| + linear scaling learning rate (batch size 256, lr 0.04) | 71.21% |
| + cutout | does not work, drops about 1 point |
| + cosine learning rate decay | does not work, drops about 1 point |
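Random erasing and cutout both lowered accuracy in these runs. For anyone who wants to re-check them, here is a minimal sketch of what each augmentation does, written on a single-channel image stored as nested lists with values in [0, 1] (an assumption for illustration; real pipelines apply them per channel to tensors, and the function names are hypothetical). The random erasing defaults follow its paper (p = 0.5, erased area 2% to 40% of the image, aspect ratio 0.3 to 1/0.3); the cutout square size `length` has no universal default:

```python
import math
import random
from typing import List

Image = List[List[float]]  # single-channel H x W image, illustration only


def cutout(img: Image, length: int, rng: random.Random) -> Image:
    """Zero a length x length square centred on a random pixel, clipped at the borders."""
    h, w = len(img), len(img[0])
    cy, cx = rng.randrange(h), rng.randrange(w)
    y0, y1 = max(0, cy - length // 2), min(h, cy + length // 2)
    x0, x1 = max(0, cx - length // 2), min(w, cx + length // 2)
    out = [row[:] for row in img]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 0.0
    return out


def random_erasing(img: Image, rng: random.Random, p: float = 0.5,
                   sl: float = 0.02, sh: float = 0.4, r1: float = 0.3,
                   max_attempts: int = 100) -> Image:
    """With probability p, fill a random rectangle that fits inside the image with noise."""
    if rng.random() > p:
        return img  # not applied: input returned unchanged
    h, w = len(img), len(img[0])
    for _ in range(max_attempts):
        target_area = rng.uniform(sl, sh) * h * w
        aspect = rng.uniform(r1, 1.0 / r1)
        eh = int(round(math.sqrt(target_area * aspect)))
        ew = int(round(math.sqrt(target_area / aspect)))
        if 0 < eh < h and 0 < ew < w:
            y0 = rng.randint(0, h - eh)
            x0 = rng.randint(0, w - ew)
            out = [row[:] for row in img]
            for y in range(y0, y0 + eh):
                for x in range(x0, x0 + ew):
                    out[y][x] = rng.random()  # random fill value in [0, 1)
            return out
    return img  # no rectangle fit within max_attempts


rng = random.Random(0)
image = [[1.0] * 32 for _ in range(32)]
masked = cutout(image, length=16, rng=rng)
erased = random_erasing(image, rng, p=1.0)
```

One design difference shows up in the code: cutout lets the square hang off the image border (so the erased area varies), while random erasing only accepts rectangles that fit entirely inside the image and fills them with noise instead of zeros.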