Unofficial verification of ReZero ResNet on the CIFAR-100 dataset
ReZero is All You Need: Fast Convergence at Large Depth
https://arxiv.org/abs/2003.04887
Unofficial PyTorch implementation of ReZero in ResNet
https://github.com/fabio-deep/ReZero-ResNet
I trained PreAct ResNet on CIFAR-100 and checked how accuracy and convergence change with and without ReZero.
- Data
- CIFAR-100
- Model
- Base model: PreAct ResNet 18, 50 (https://arxiv.org/abs/1603.05027)
- Model with ReZero: every residual connection in the base model is replaced with a ReZero connection. The residual weight α is initialized to 0.
- Model with ReZero (my own modified version): every residual connection in the base model is replaced with a ReZero connection, but tanh(α) is used in place of α (α is initialized to 0). The tanh limits the residual weight to the range (-1, 1), which keeps it from becoming abnormally large.
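The two ReZero variants described above can be sketched as a single pre-activation basic block. This is only a minimal illustration, not the code used in the experiment: the class name `ReZeroPreActBlock`, the `use_tanh` flag, the choice to keep BatchNorm, and the 1x1 projection shortcut are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReZeroPreActBlock(nn.Module):
    """Pre-activation basic block whose residual branch is scaled by a learned gate.

    Hypothetical sketch: names and layer choices are assumptions, not the
    original implementation.
    """

    def __init__(self, in_planes, planes, stride=1, use_tanh=False):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1, padding=1, bias=False)
        self.use_tanh = use_tanh
        # Residual weight alpha, initialized to 0 so the block starts as its shortcut.
        self.alpha = nn.Parameter(torch.zeros(1))
        # Projection shortcut only when the output shape differs from the input.
        self.shortcut = None
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Conv2d(in_planes, planes, 1, stride=stride, bias=False)

    def forward(self, x):
        out = F.relu(self.bn1(x))
        # As in PreAct ResNet, a projection shortcut takes the pre-activated input.
        shortcut = x if self.shortcut is None else self.shortcut(out)
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        # Plain ReZero uses alpha directly; the modified version bounds it with tanh.
        gate = torch.tanh(self.alpha) if self.use_tanh else self.alpha
        return shortcut + gate * out
```

With α = 0 the gate is 0 in both variants, so a block with an identity shortcut returns its input unchanged at initialization. The only difference is that tanh(α) can never leave (-1, 1), however large α grows during training.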
- Learning method
- Cross entropy loss
- SGD, initial learning rate 0.1 (multiplied by 0.2 at epochs 60, 120, and 160), 200 epochs, batch size 128
- Data augmentation (random flip, random shift/scale/rotate)
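The learning rate schedule above can be written out as a small function. This sketch assumes that the reduction means multiplying the learning rate by 0.2 at each milestone and that epochs are counted from 0; the function name `lr_at_epoch` and its parameters are hypothetical.

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(60, 120, 160), gamma=0.2):
    """Step schedule: multiply base_lr by gamma once for every milestone reached.

    Assumes 0-based epoch counting and a multiplicative factor of 0.2.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


# Learning rate at the start of training and just after each milestone.
for epoch in (0, 60, 120, 160):
    print(epoch, lr_at_epoch(epoch))
```

In PyTorch the same schedule would normally be set up with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 120, 160], gamma=0.2)` on top of `torch.optim.SGD`.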
With ReZero, neither accuracy nor convergence improved over the base model.