Differences from the original implementation
Jayis opened this issue · 4 comments
Hi,
there seem to be two differences from the original implementation in your implementation at
https://github.com/hysts/pytorch_shake_shake/blob/master/shake_shake.py
- an extra ReLU at line 177
- a missing ReLU between lines 180 and 181

Could you please check these for me? Thanks a lot. (See the sketch after the references below for the expected branch ordering.)
ref:
lines 138-142 of
https://github.com/xgastaldi/shake-shake/blob/master/models/shakeshake.lua
lines 135-142 of
https://github.com/tensorflow/models/blob/master/research/autoaugment/shake_shake.py
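For reference, both of those implementations order each branch as ReLU → 3x3 conv → BN → ReLU → 3x3 conv → BN. A minimal PyTorch sketch of that ordering (the class and argument names are mine, not taken from your repo):

```python
import torch.nn as nn

class ResidualBranch(nn.Sequential):
    """One branch of a shake-shake block in the original ordering:
    ReLU -> 3x3 conv -> BN -> ReLU -> 3x3 conv -> BN.
    Illustrative sketch only; names are not from the repo under discussion."""

    def __init__(self, in_channels: int, out_channels: int, stride: int):
        super().__init__(
            nn.ReLU(inplace=False),  # leading ReLU before the first conv
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=False),  # the ReLU between the two convs
            nn.Conv2d(out_channels, out_channels, kernel_size=3,
                      stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
```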
Hi, @Jayis
Thank you for letting me know. You're right. I also found that the number of channels of the first conv differs from the original implementation.
I'm going to fix these, but it will take some time to verify that the code trains correctly.
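Concretely, the original implementation's stem conv outputs 16 channels regardless of the width multiplier; a minimal sketch of that stem (layer layout assumed from the original, variable names are mine):

```python
import torch.nn as nn

# In the original implementation the stem always outputs 16 channels;
# the widened widths (32/64/96 for 2x32d/2x64d/2x96d) only start at the
# first shake-shake stage. Sketch only; not code from either repo.
stem = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(16),
)
```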
Hi, @Jayis
I fixed them and ran the training again.
The test errors are
- shake-shake-26 2x32d: 3.29
- shake-shake-26 2x64d: 3.11
- shake-shake-26 2x96d: 2.97
(run only once each)
The 2x64d and 2x96d results seem slightly worse than the reported ones, but according to the logs the best test errors during training are no worse than those, so I think it's just random fluctuation.
Thanks, again! :)
Hi, @hysts
Thanks for your fast reply.
I've also got some experiment results to share with you.
I trained a 2x96d model on CIFAR-10 four times after fixing the first-conv problem you mentioned.
(I've checked your commit; my modification was roughly the same as yours.)
I used https://github.com/facebookresearch/pycls as the training framework, with the following hyper-parameters:
- batch size: 128
- learning rate: 0.2
- weight decay: 1e-4
- cosine learning rate decay
- 1800 epochs
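A rough PyTorch equivalent of that setup, in case it's useful (momentum and Nesterov are my assumptions; the actual pycls config may differ):

```python
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Conv2d(3, 16, 3)  # stand-in for the shake-shake-26 2x96d model

# lr, weight decay, and epoch count are the values listed above;
# momentum=0.9 and nesterov=True are assumptions on my part.
optimizer = SGD(model.parameters(), lr=0.2,
                momentum=0.9, weight_decay=1e-4, nesterov=True)
scheduler = CosineAnnealingLR(optimizer, T_max=1800)

for epoch in range(1800):
    # ... one training epoch over CIFAR-10 with batch size 128 ...
    scheduler.step()
```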
The test errors I got (last epoch) were
3.00%, 3.01%, 2.87%, 2.94%
Hope this helps a little.
Thanks again!