Verification of ReZero ResNet on the CIFAR-100 dataset

Primary language: Python. License: MIT.

ReZero-Cifar100

Unofficial verification of ReZero ResNet on the CIFAR-100 dataset

ReZero

ReZero is All You Need: Fast Convergence at Large Depth
https://arxiv.org/abs/2003.04887

Unofficial PyTorch implementation of ReZero in ResNet
https://github.com/fabio-deep/ReZero-ResNet

Verification

We trained PreAct ResNet on CIFAR-100 and compared accuracy and convergence with and without ReZero.

Condition

  • Data
    • CIFAR-100
  • Model
    • Base model: PreAct ResNet 18 and 50
      https://arxiv.org/abs/1603.05027
    • Model with ReZero: All residual connections in the base model are replaced with ReZero connections. The residual weight α is initialized to 0.
    • Model with ReZero (personally improved version): All residual connections in the base model are replaced with ReZero connections that use tanh(α) instead of α (α is initialized to 0). Because tanh(α) stays within (-1, 1), this keeps the residual weight from becoming abnormally large.
  • Learning method
    • Cross entropy loss
    • SGD, initial learning rate 0.1 (multiplied by 0.2 at epochs 60, 120, and 160), 200 epochs, batch size 128
    • Data augmentation (random flip, random shift-scale-rotate)
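
The three residual variants and the learning rate schedule listed above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the code used in this repository: the class name `PreActBlock`, the `mode` argument, and the helper `make_optimizer` are hypothetical, keeping batch normalization inside the ReZero blocks is an assumption, and momentum and weight decay are left at their defaults because the conditions above do not state them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PreActBlock(nn.Module):
    """Pre-activation basic block with a selectable residual connection.

    mode="base":   out = shortcut + F(x)
    mode="rezero": out = shortcut + alpha * F(x)
    mode="tanh":   out = shortcut + tanh(alpha) * F(x)
    alpha is a learnable scalar initialised to 0 in both ReZero modes.
    """

    def __init__(self, in_planes, planes, stride=1, mode="base"):
        super().__init__()
        if mode not in ("base", "rezero", "tanh"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        # Assumption: batch normalization is kept in every mode.
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1,
                               padding=1, bias=False)
        # 1x1 projection when the shape changes, identity otherwise.
        self.shortcut = None
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Conv2d(in_planes, planes, 1, stride=stride,
                                      bias=False)
        if mode != "base":
            self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        out = F.relu(self.bn1(x))
        shortcut = self.shortcut(out) if self.shortcut is not None else x
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        if self.mode == "rezero":
            out = self.alpha * out
        elif self.mode == "tanh":
            # tanh keeps the effective residual weight inside (-1, 1).
            out = torch.tanh(self.alpha) * out
        return shortcut + out


def make_optimizer(model):
    """SGD with lr 0.1, multiplied by 0.2 at epochs 60, 120 and 160."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[60, 120, 160], gamma=0.2)
    return optimizer, scheduler
```

With α initialized to 0, a block whose shortcut is the identity passes its input through unchanged at the start of training in both ReZero modes. In this sketch, `scheduler.step()` is meant to be called once per epoch, after that epoch's optimizer updates.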

Result

With ReZero, neither accuracy nor convergence improved.

PreAct ResNet 18

[Three plots for PreAct ResNet 18; images not preserved]

PreAct ResNet 50

[Three plots for PreAct ResNet 50; images not preserved]