facebookresearch/bitsandbytes

bnb.optim.AdamW

patrickvonplaten opened this issue · 2 comments

Hey @TimDettmers,

Awesome library! bnb.optim.Adam saved me from having to use model parallelism 😄

Do you think it would be easy to also add a bnb.optim.AdamW version for https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html#torch.optim.AdamW ?

Happy to give it a try if you think it's easily feasible :-)

Currently, AdamW behavior is applied automatically whenever you use bnb.optim.Adam with a nonzero weight decay. Since this is unclear, I will add an explicit bnb.optim.AdamW alias (a copy of the Adam class) in the next release.
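The Adam-vs-AdamW distinction can be sketched in plain Python (an illustrative single-parameter update, not the bitsandbytes implementation): classic Adam folds weight decay into the gradient as an L2 term, while AdamW decouples it and decays the weights directly. The defaults below mirror torch.optim.AdamW (lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2).

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=1e-2, decoupled=False):
    """One Adam/AdamW update on a scalar parameter (illustrative only)."""
    if not decoupled:
        grad = grad + weight_decay * w          # Adam: L2 penalty enters the gradient
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment EMA
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        w = w - lr * weight_decay * w           # AdamW: decay applied to the weights
    return w, m, v

w_adam, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1)
w_adamw, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1, decoupled=True)
print(w_adam, w_adamw)  # similar, but not identical, update rules
```

With weight_decay=0 the two branches coincide, which is why a single class can serve both roles; the separate name just makes the behavior explicit.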

This has been added in the newest release. It was also important to get the default hyperparameters for AdamW correct so that the default behavior matches expectations. As such, this was an important correction. Thank you, @patrickvonplaten, for the suggestion!