Differences from the original implementation
Jayis opened this issue · 4 comments
Hi,
there seem to be two differences from the original implementation in your implementation at
https://github.com/hysts/pytorch_shake_shake/blob/master/shake_shake.py
- an extra ReLU at line 177
- a missing ReLU between lines 180 and 181

Could you please check these for me? Thanks a lot. (See the sketch after the references below for the expected branch ordering.)
ref:
lines 138-142 of
https://github.com/xgastaldi/shake-shake/blob/master/models/shakeshake.lua
lines 135-142 of
https://github.com/tensorflow/models/blob/master/research/autoaugment/shake_shake.py
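For reference, both of those implementations order each branch as ReLU → 3x3 conv → BN → ReLU → 3x3 conv → BN. A minimal PyTorch sketch of that ordering (the class and argument names are mine, not taken from your repo):

```python
import torch.nn as nn

class ResidualBranch(nn.Sequential):
    """One branch of a shake-shake block in the original ordering:
    ReLU -> 3x3 conv -> BN -> ReLU -> 3x3 conv -> BN.
    Illustrative sketch only; names are not from the repo under discussion."""

    def __init__(self, in_channels: int, out_channels: int, stride: int):
        super().__init__(
            nn.ReLU(inplace=False),  # leading ReLU before the first conv
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=False),  # the ReLU between the two convs
            nn.Conv2d(out_channels, out_channels, kernel_size=3,
                      stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
```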
Hi, @Jayis
Thank you for letting me know. You're right. I also found that the number of channels of the first conv differs from the original implementation.
I'm going to fix these, but it will take some time to verify that the code trains correctly.
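Concretely, the original implementation's stem conv outputs 16 channels regardless of the width multiplier; a minimal sketch of that stem (layer layout assumed from the original, variable names are mine):

```python
import torch.nn as nn

# In the original implementation the stem always outputs 16 channels;
# the widened widths (32/64/96 for 2x32d/2x64d/2x96d) only start at the
# first shake-shake stage. Sketch only; not code from either repo.
stem = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(16),
)
```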
Hi, @Jayis
I fixed them and ran the training again.
The test errors are
- shake-shake-26 2x32d: 3.29
- shake-shake-26 2x64d: 3.11
- shake-shake-26 2x96d: 2.97
(run only once each)
The 2x64d and 2x96d results seem slightly worse than the reported ones, but according to the logs the best test errors during training are no worse than those, so I think it's just random fluctuation.
Thanks, again! :)
Hi, @hysts
Thanks for your fast reply.
I've also got some experiment results to share with you.
I trained a 2x96d model on CIFAR-10 four times after fixing the first-conv problem you mentioned.
(I've checked your commit; my modification was roughly the same as yours.)
I used https://github.com/facebookresearch/pycls as the training framework, with the following hyper-parameters:
- batch size: 128
- learning rate: 0.2
- weight decay: 1e-4
- cosine learning rate decay
- 1800 epochs
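A rough PyTorch equivalent of that setup, in case it's useful (momentum and Nesterov are my assumptions; the actual pycls config may differ):

```python
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Conv2d(3, 16, 3)  # stand-in for the shake-shake-26 2x96d model

# lr, weight decay, and epoch count are the values listed above;
# momentum=0.9 and nesterov=True are assumptions on my part.
optimizer = SGD(model.parameters(), lr=0.2,
                momentum=0.9, weight_decay=1e-4, nesterov=True)
scheduler = CosineAnnealingLR(optimizer, T_max=1800)

for epoch in range(1800):
    # ... one training epoch over CIFAR-10 with batch size 128 ...
    scheduler.step()
```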
The test errors I got (last epoch) were
3.00%, 3.01%, 2.87%, 2.94%
Hope this helps a little.
Thanks again!