facebookresearch/bitsandbytes

bnb.optim.AdamW

patrickvonplaten opened this issue · 2 comments

Hey @TimDettmers,

Awesome library! bnb.optim.Adam saved me from having to use model parallelism 😄

Do you think it would be easy to also add a bnb.optim.AdamW version for https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html#torch.optim.AdamW ?

Happy to give it a try if you think it's easily feasible :-)

Currently, AdamW behavior is applied automatically whenever you use bnb.optim.Adam with a nonzero weight decay. Since this is unclear, I will add an explicit bnb.optim.AdamW alias (a copy of the Adam class) in the next release.
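The Adam-vs-AdamW distinction can be sketched in plain Python (an illustrative single-parameter update, not the bitsandbytes implementation): classic Adam folds weight decay into the gradient as an L2 term, while AdamW decouples it and decays the weights directly. The defaults below mirror torch.optim.AdamW (lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2).

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=1e-2, decoupled=False):
    """One Adam/AdamW update on a scalar parameter (illustrative only)."""
    if not decoupled:
        grad = grad + weight_decay * w          # Adam: L2 penalty enters the gradient
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment EMA
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        w = w - lr * weight_decay * w           # AdamW: decay applied to the weights
    return w, m, v

w_adam, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1)
w_adamw, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1, decoupled=True)
print(w_adam, w_adamw)  # similar, but not identical, update rules
```

With weight_decay=0 the two branches coincide, which is why a single class can serve both roles; the separate name just makes the behavior explicit.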

This has been added in the newest release. It was also important to get the default hyperparameters for AdamW correct so that the default behavior matches expectations. As such, this was an important correction. Thank you, @patrickvonplaten, for the suggestion!