ReZero-Cifar100

Unofficial verification of ReZero ResNet on the CIFAR-100 dataset

ReZero

ReZero is All You Need: Fast Convergence at Large Depth
https://arxiv.org/abs/2003.04887

Unofficial PyTorch implementation of ReZero in ResNet
https://github.com/fabio-deep/ReZero-ResNet

I trained PreAct-ResNet on CIFAR-100 and compared accuracy and convergence with and without ReZero.

Data
- CIFAR-100
Model
- Base model: PreAct ResNet-18 and PreAct ResNet-50
  Identity Mappings in Deep Residual Networks
  https://arxiv.org/abs/1603.05027
- Model with ReZero: all residual connections in the base model are replaced with ReZero connections. The residual weight α is initialized to 0.
- Model with ReZero (my own modified version): all residual connections in the base model are replaced with ReZero connections, but tanh(α) is used in place of α (α is still initialized to 0). Because tanh limits the value range, this keeps the residual weight from becoming abnormally large.
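
As a rough illustration of the two ReZero variants listed above, here is a framework-free sketch of a single connection, x + α·F(x) versus x + tanh(α)·F(x). It is not the code used for training: `rezero_gate`, `rezero_connection`, and `toy_branch` are hypothetical names, and `toy_branch` is only a stand-in for the real residual branch of a PreAct block.

```python
import math


def rezero_gate(alpha, use_tanh=False):
    """Return the scale applied to the residual branch."""
    # Plain ReZero uses alpha directly; the modified version squashes it with tanh.
    return math.tanh(alpha) if use_tanh else alpha


def rezero_connection(x, branch, alpha, use_tanh=False):
    """Compute x + gate(alpha) * branch(x) element-wise on a list of floats."""
    gate = rezero_gate(alpha, use_tanh)
    fx = branch(x)
    return [xi + gate * fi for xi, fi in zip(x, fx)]


def toy_branch(x):
    # Hypothetical stand-in for the residual branch F(x).
    return [2.0 * xi + 1.0 for xi in x]


x = [1.0, -2.0, 0.5]

# With alpha = 0, both variants start out as the identity mapping.
at_init = rezero_connection(x, toy_branch, alpha=0.0)
at_init_tanh = rezero_connection(x, toy_branch, alpha=0.0, use_tanh=True)

# For a large alpha, the plain gate grows freely while the tanh gate stays below 1.
large_plain = rezero_gate(10.0)
large_tanh = rezero_gate(10.0, use_tanh=True)
```

In an actual model, α would be a learnable scalar per residual block, and the same gate would multiply the output of the block's convolutional branch.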
Learning method
- Cross entropy loss
- SGD, initial learning rate 0.1 (multiplied by 0.2 at epochs 60, 120, and 160), 200 epochs, batch size 128
- Data augmentation (random flip, random shift-scale-rotate)
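
The step schedule above can be sketched in plain Python. This assumes the learning rate is multiplied by 0.2 each time one of epochs 60, 120, and 160 is reached, and that epochs are counted from 0. The function name `learning_rate` and its parameters are hypothetical and chosen only for illustration.

```python
def learning_rate(epoch, base_lr=0.1, milestones=(60, 120, 160), factor=0.2):
    """Return the learning rate for a given 0-indexed epoch (assumed schedule)."""
    # Count how many milestones have already been reached.
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** passed


# Print the rate just before and at each milestone, and at the last epoch.
for epoch in (0, 59, 60, 120, 160, 199):
    print(epoch, learning_rate(epoch))
```

In PyTorch, `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[60, 120, 160]` and `gamma=0.2` expresses the same kind of schedule.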

Neither accuracy nor convergence improved with ReZero.