google-deepmind/optax

Math pseudocode in description of SGD with Nesterov is incorrect


The math pseudocode in the description of SGD with Nesterov is currently given as:
[Screenshot: the pseudocode as currently shown in the documentation]

I believe this is incorrect. Apart from the circular definition of m_t in the case when nesterov = False, the definition of m_t itself should be corrected. The correct set of equations should be:
[Screenshot: proposed corrected equations]

Or alternatively,
[Screenshot: alternative form of the corrected equations]

This can be verified from equations (3) and (4) in Sutskever et al., "On the importance of initialization and momentum in deep learning" (2013), with the change of variables m_t = -v_t/epsilon and alpha_t = epsilon.
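For anyone who wants to check the change of variables numerically, here is a minimal sketch in plain Python. It is not the optax implementation, and the screenshots are not reproduced here, so the momentum form below is my assumption of what the corrected update looks like: m_t = g_t + mu * m_{t-1}, followed by the step -alpha * (g_t + mu * m_t). The toy objective f(x) = 0.5 * x**2 and the function names are made up for illustration. Note that the momentum form evaluates the gradient at the lookahead point theta_t + mu * v_t from the paper's form, so those are the iterates being compared.

```python
def grad(x):
    # Gradient of the toy objective f(x) = 0.5 * x**2 (illustrative assumption).
    return x


def nag_velocity_form(theta0, eps, mu, steps):
    """Velocity form: v <- mu*v - eps*grad(theta + mu*v); theta <- theta + v.

    Returns the lookahead points theta_t + mu*v_t at which gradients are taken.
    """
    theta, v = theta0, 0.0
    lookaheads = []
    for _ in range(steps):
        lookahead = theta + mu * v
        lookaheads.append(lookahead)
        v = mu * v - eps * grad(lookahead)
        theta = theta + v
    return lookaheads


def nag_momentum_form(x0, alpha, mu, steps):
    """Momentum form (assumed): m <- g + mu*m; x <- x - alpha*(g + mu*m)."""
    x, m = x0, 0.0
    iterates = []
    for _ in range(steps):
        iterates.append(x)
        g = grad(x)
        m = g + mu * m                  # corresponds to m_t = -v_t / eps
        x = x - alpha * (g + mu * m)    # step uses the already-updated m
    return iterates


if __name__ == "__main__":
    a = nag_velocity_form(1.0, eps=0.1, mu=0.9, steps=50)
    b = nag_momentum_form(1.0, alpha=0.1, mu=0.9, steps=50)
    # Largest gap between the two sequences; should be at rounding-error level.
    print(max(abs(p - q) for p, q in zip(a, b)))
```

With zero initial velocity the two sequences start at the same point, and substituting m_t = -v_t/epsilon into the velocity update shows they stay equal at every step, up to floating-point rounding.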

Oh right, thank you very much for catching that @satyenkale! Could you make the correction with a quick PR?

Thanks! I created a PR (#901) but I am not sure if all the checks went through.

Thank you again! It should go through. The check failures in #901 seem to be related to some changes in JAX that broke some of our code. We'll investigate that.

Solved in #901. Thanks again!