This project explores the performance of several classic convolutional neural network architectures and training methods on CIFAR10.
Usage
Experiments
- Training Different Network Architectures
- Deeper or Wider?
- Activation Function and Layer Normalization
- Augmentations
Step 1. Install PyTorch.
Step 2. Install dependencies:
pip install -r requirements [-i https://pypi.tuna.tsinghua.edu.cn/simple]
Train resnet18 with the cosine annealing scheduler:
python train.py --name <savedir> --arch resnet18 --sche cosine_annealing --lr 0.1 -e 250
Train densenet121 with the multi-step scheduler, dropping the learning rate at epochs 20, 40, 60, and 80:
python train.py --name <savedir> --arch densenet121 --lr 0.1 -m 20 -m 40 -m 60 -m 80
Train resnet18 with random erasing augmentation:
python train.py --name <savedir> --arch resnet18 --erase-aug
Train resnet18 with mixup augmentation:
python train.py --name <savedir> --arch resnet18 --mixup-aug --mixup-alpha 0.3
This experiment compares several network architectures to find out which leads to better performance on CIFAR10.
- ResNet: ResNets are a classic and strong baseline in computer vision. They introduce residual (skip) connections into the convolutional network to mitigate vanishing gradients and allow much deeper models. Our results show that ResNets are stable and perform well; a minimal residual block sketch follows the table below.

  *Residual block*
- VGG: Compared with earlier work such as AlexNet, VGG stacks convolutional and pooling layers with smaller kernels; stacked small kernels need fewer parameters for the same receptive field and improve classification accuracy.
- DenseNet: Like residual networks, DenseNets connect different layers, but more densely: each layer receives the feature maps of all preceding layers. As shown in the table below, DenseNet achieves the highest accuracy on CIFAR10 with the fewest parameters. In practice, however, DenseNets use more memory at inference because the outputs of all preceding layers are kept for concatenation.
- ResNeXt: ResNeXts use aggregated residual transformations, i.e. grouped convolutions that aggregate several parallel convolution paths within one layer, thereby improving classification accuracy.
- ConvMixer: ConvMixer was proposed to answer the question: is the performance of ViTs due to the inherently more powerful Transformer architecture, or is it at least partly due to using patches as the input representation? ConvMixer starts with a patch embedding layer followed by several depthwise/pointwise convolution blocks (see the sketch after the table below). The authors' training code reaches over 95% accuracy, but it apparently relies on additional tricks: under our training configuration it only reaches about 93%.
| Model | Params (M) | Train Acc (%) | Valid Acc (%) |
|---|---|---|---|
| resnet18 | 11.17 | 98.957 | 96.015 |
| vgg16 | 14.72 | 98.661 | 95.214 |
| densenet121 | 6.95 | 99.257 | 96.301 |
| resnext18 | 10.02 | 99.025 | 95.827 |
| convmixer256d16 | 1.28 | 98.649 | 93.701 |
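For reference, here is a minimal sketch of a basic residual block in plain PyTorch. It illustrates the skip connection described above and is not necessarily identical to the block used in this repo.

```python
# A minimal basic residual block -- an illustrative sketch of the skip
# connection, not necessarily the exact block used in this repo.
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection so the shortcut matches the residual branch's shape
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))  # add the identity/projected input back
```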
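Similarly, a ConvMixer sketch following the "Patches Are All You Need?" formulation: a patch-embedding convolution, then repeated blocks of depthwise "token mixing" and pointwise "channel mixing". The dim=256 and depth=16 defaults match the convmixer256d16 entry above, while the kernel and patch sizes here are illustrative assumptions.

```python
# ConvMixer sketch: patch embedding, then depthwise + pointwise blocks.
# kernel_size and patch_size are illustrative guesses, not the repo's settings.
import torch.nn as nn

class Residual(nn.Module):
    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x) + x

def conv_mixer(dim=256, depth=16, kernel_size=9, patch_size=2, n_classes=10):
    return nn.Sequential(
        nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size),  # patch embedding
        nn.GELU(),
        nn.BatchNorm2d(dim),
        *[nn.Sequential(
            Residual(nn.Sequential(                       # depthwise spatial mixing
                nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
                nn.GELU(),
                nn.BatchNorm2d(dim),
            )),
            nn.Conv2d(dim, dim, kernel_size=1),           # pointwise channel mixing
            nn.GELU(),
            nn.BatchNorm2d(dim),
        ) for _ in range(depth)],
        nn.AdaptiveAvgPool2d((1, 1)),
        nn.Flatten(),
        nn.Linear(dim, n_classes),
    )
```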
Intuitively, it is easy to improve the performance of a deep neural network by adding parameters. Zagoruyko et al. showed that parameters can be added not only by deepening the network but also by widening it. We compare resnet34 with a modified ResNet that doubles the number of channels and has roughly the same parameter count as resnet34; the wider network performs better, as shown in the table and the parameter sketch below.
| Model | Params (M) | Train Acc (%) | Valid Acc (%) |
|---|---|---|---|
| resnet18 | 11.17 | 98.957 | 96.015 |
| resnet34 | 21.28 | 98.454 | 95.659 |
| wider resnet14 | 21.06 | 99.365 | 96.539 |
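The trade-off is easy to see from the parameter count of a 3x3 convolution: doubling the channel width roughly quadruples the weights, while stacking a second identical convolution (going deeper) only doubles them. A back-of-the-envelope check (hypothetical helper, not code from this repo):

```python
# Parameter count of a 3x3 conv with c_in input and c_out output channels
def conv3x3_params(c_in, c_out):
    return 3 * 3 * c_in * c_out

base   = conv3x3_params(64, 64)     # one 3x3 conv at width 64  -> 36,864 weights
deeper = 2 * base                   # two such convs stacked    -> 73,728 weights
wider  = conv3x3_params(128, 128)   # one conv at double width  -> 147,456 weights
print(base, deeper, wider)
```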
We trained resnet18 with four different activation functions and four different normalization layers. The combination of ReLU activation and batch normalization gives the best accuracy on CIFAR10. A sketch of how these layers can be swapped follows the table below.
| Model | Activation | Normalization | Train Acc (%) | Valid Acc (%) |
|---|---|---|---|---|
| resnet18 | ReLU | Batch Norm | 99.090 | 96.193 |
| resnet18 | Leaky ReLU | Batch Norm | 99.992 | 95.372 |
| resnet18 | CELU | Batch Norm | 98.143 | 91.475 |
| resnet18 | GELU | Batch Norm | 99.996 | 94.709 |
| - | - | - | - | - |
| resnet18 | ReLU | Instance Norm | 99.996 | 95.085 |
| resnet18 | ReLU | Layer Norm | 99.994 | 95.115 |
| resnet18 | ReLU | Group Norm | 99.996 | 95.045 |
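Below is a minimal sketch of how the normalization layer or the activation can be swapped, using torchvision's ResNet-18 constructor as a stand-in for this repo's CIFAR variant; the `group_norm` helper and the GELU-replacement loop are illustrative assumptions, not the repo's code.

```python
# Swap normalization and activation in a torchvision ResNet-18 (illustrative).
import torch.nn as nn
from torchvision.models import resnet18

def group_norm(num_channels):
    # GroupNorm with 8 groups as a drop-in replacement for BatchNorm2d
    return nn.GroupNorm(8, num_channels)

model = resnet18(num_classes=10, norm_layer=group_norm)

def replace_relu_with_gelu(module):
    # Recursively swap every nn.ReLU child for nn.GELU
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.GELU())
        else:
            replace_relu_with_gelu(child)

replace_relu_with_gelu(model)
```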
- Random Erasing: randomly erases a rectangular region of the input and fills it with a constant value. As shown in the table below, random erasing lowers training accuracy while raising validation accuracy, suggesting that it helps prevent overfitting (see the transform sketch after the table).
- Mixup: mixup trains the network on convex combinations of pairs of examples and their labels. As shown below, mixup improves generalization (a minimal implementation sketch follows the table).
| Model | Aug | Params (M) | Train Acc (%) | Valid Acc (%) |
|---|---|---|---|---|
| resnet18 | - | 11.17 | 99.090 | 96.193 |
| resnet18 | Erasing | 11.17 | 98.957 | 96.015 |
| convmixer256d16 | - | 1.28 | 99.954 | 92.692 |
| convmixer256d16 | Erasing | 1.28 | 98.649 | 93.701 |
| vgg16 | - | 14.72 | 99.986 | 94.165 |
| vgg16 | Erasing | 14.72 | 98.661 | 95.214 |
| - | - | - | - | - |
| resnet18 | Mixup 0.2 | 11.17 | 91.378 | 95.926 |
| resnet18 | Mixup 0.3 | 11.17 | 85.472 | 95.767 |
| resnet18 | Mixup 0.4 | 11.17 | 83.565 | 96.005 |
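A minimal sketch of a random-erasing training transform, assuming torchvision's `RandomErasing`; the repo's `--erase-aug` flag may configure this differently, and the normalization statistics below are the commonly used CIFAR-10 values.

```python
# Random-erasing training pipeline (illustrative sketch)
import torchvision.transforms as T

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
    # Erase a random rectangle and fill it with a constant value (0 here)
    T.RandomErasing(p=0.5, scale=(0.02, 0.33), value=0),
])
```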
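And a minimal mixup sketch, not necessarily the repo's exact implementation: two shuffled views of the batch are mixed with a Beta(alpha, alpha)-sampled weight, and the loss is the same convex combination of the two targets (alpha is presumably what `--mixup-alpha` controls).

```python
# Mixup (illustrative sketch)
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x, y, alpha=0.3):
    lam = float(np.random.beta(alpha, alpha))        # mixing coefficient
    perm = torch.randperm(x.size(0), device=x.device)
    return lam * x + (1 - lam) * x[perm], y, y[perm], lam

# Inside the training loop:
# mixed_x, y_a, y_b, lam = mixup_batch(images, labels, alpha=0.3)
# logits = model(mixed_x)
# loss = lam * F.cross_entropy(logits, y_a) + (1 - lam) * F.cross_entropy(logits, y_b)
```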