SGD in Max-Norm
melgor opened this issue · 3 comments
Hi,
I was analyzing your results in this notebook and I found some bugs/incorrect approaches:
- When training with SGD you start from LR: 0.001. I started with LR: 0.01 and the final accuracy was ~82% (so 6% better)
- Then you are trying to plot `validation acc and loss`, but in fact you plot `training` data. This is why the `momentum` method looks so low.
Check my runs; I think I trained the models correctly and displayed the charts correctly as well.
https://gist.github.com/melgor/e106ff0e712534d267a2a1851b6fc299
Also, I've run some other experiments on gradient normalization with ResNet-18 on CIFAR-10, and currently I'm not able to match the results of SGD + momentum (I use my own PyTorch implementation; I tried L2, L1, and max normalization, with std normalization still on my list). A sketch of the max-norm variant is below.
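To make it concrete, here is a minimal sketch of the max-norm variant (per-tensor normalization, no momentum; the class name and the `eps` default are illustrative simplifications, not my exact implementation):

```python
import torch
from torch.optim import Optimizer

class MaxNormSGD(Optimizer):
    """SGD variant that normalizes each parameter's gradient by its
    max-norm (L-infinity norm) before the update. A rough sketch of
    the idea, not a tuned implementation."""

    def __init__(self, params, lr=0.01, eps=1e-8):
        defaults = dict(lr=lr, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                # Max-norm: divide by the largest absolute entry,
                # so every gradient component lands in [-1, 1].
                denom = g.abs().max().clamp(min=group["eps"])
                p.add_(g / denom, alpha=-group["lr"])
        return loss
```

Swapping the `denom` line gives the L1 variant (`g.abs().sum()`) or the L2 variant (`g.norm()`).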
Hi, thank you for the feedback. Of course I should use `val_loss`; that's a huge typo on my side. I will rerun the notebooks with a corrected setup when I have free time. I will also change the initial learning rate for SGD + momentum to make the benchmarks fairer.
Regarding your last comment, I had to use std normalization for CIFAR to get the best score; see the sketch below. However, I have tried gradient normalization on a detection problem with success.
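By std normalization I mean dividing each gradient by its standard deviation between `loss.backward()` and `optimizer.step()`. A minimal sketch (the helper name and `eps` are illustrative, not the exact code from my notebooks):

```python
import torch

def std_normalize_grads(model, eps=1e-8):
    """Divide each parameter's gradient by its standard deviation.
    Call between loss.backward() and optimizer.step()."""
    with torch.no_grad():
        for p in model.parameters():
            # Skip missing gradients and single-element tensors,
            # where the standard deviation is undefined.
            if p.grad is not None and p.grad.numel() > 1:
                p.grad.div_(p.grad.std().clamp(min=eps))
```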
To conclude: I think it is worth giving it a try; it is easy to implement and may provide benefits in some setups. I hope the other notebooks were fine and reproducible (of course, I will probably have to change the metric to `val_acc`). Thanks again.
- fix notebooks and use a better learning rate for cifar-sgd+momentum
I have updated the results in the notebooks.