SGD in Max-Norm
melgor opened this issue · 3 comments
Hi,
I was analyzing your results in this notebook and I found some bugs/incorrect approaches:
- When training with SGD you start from LR: 0.001. I started with LR: 0.01 and the final accuracy was ~82% (so 6% better)
- Then you are trying to plot `validation acc and loss`, but in fact you plot `training` data. This is why the `momentum` method looks so low.
Check my runs; I think I trained the models correctly and displayed the charts correctly as well.
https://gist.github.com/melgor/e106ff0e712534d267a2a1851b6fc299
Also, I've run some other experiments on gradient normalization with ResNet-18 on CIFAR-10, and currently I'm not able to match the results of SGD + momentum (I use my own PyTorch implementation; I tried L2, L1, and max normalization, with std normalization still on my list). A sketch of the max-norm variant is below.
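To make it concrete, here is a minimal sketch of the max-norm variant (per-tensor normalization, no momentum; the class name and the `eps` default are illustrative simplifications, not my exact implementation):

```python
import torch
from torch.optim import Optimizer

class MaxNormSGD(Optimizer):
    """SGD variant that normalizes each parameter's gradient by its
    max-norm (L-infinity norm) before the update. A rough sketch of
    the idea, not a tuned implementation."""

    def __init__(self, params, lr=0.01, eps=1e-8):
        defaults = dict(lr=lr, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                # Max-norm: divide by the largest absolute entry,
                # so every gradient component lands in [-1, 1].
                denom = g.abs().max().clamp(min=group["eps"])
                p.add_(g / denom, alpha=-group["lr"])
        return loss
```

Swapping the `denom` line gives the L1 variant (`g.abs().sum()`) or the L2 variant (`g.norm()`).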
Hi, thank you for the feedback. Of course I should use `val_loss`; that's a huge typo on my side. I will rerun the notebooks with a corrected setup when I have free time. I will also change the initial learning rate for SGD + momentum to make the benchmarks fairer.
Regarding your last comment, I had to use std normalization for CIFAR to get the best score; see the sketch below. However, I have tried gradient normalization on a detection problem with success.
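By std normalization I mean dividing each gradient by its standard deviation between `loss.backward()` and `optimizer.step()`. A minimal sketch (the helper name and `eps` are illustrative, not the exact code from my notebooks):

```python
import torch

def std_normalize_grads(model, eps=1e-8):
    """Divide each parameter's gradient by its standard deviation.
    Call between loss.backward() and optimizer.step()."""
    with torch.no_grad():
        for p in model.parameters():
            # Skip missing gradients and single-element tensors,
            # where the standard deviation is undefined.
            if p.grad is not None and p.grad.numel() > 1:
                p.grad.div_(p.grad.std().clamp(min=eps))
```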
To conclude: I think it is worth giving it a try; it is easy to implement and may provide benefits in some setups. I hope the other notebooks were fine and reproducible (of course, I will probably have to change the metric to `val_acc`). Thanks again.
- fix notebooks and use a better learning rate for cifar-sgd+momentum
I have updated the results in the notebooks.