bnb.optim.AdamW
patrickvonplaten opened this issue · 2 comments
Hey @TimDettmers,
Awesome library! bnb.optim.Adam
saved me from having to use model parallelism.
Do you think it would be easy to also add a bnb.optim.AdamW
version matching https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html#torch.optim.AdamW ?
Happy to give it a try if you think it's easily feasible :-)
Currently, AdamW behavior is used automatically whenever you use bnb.optim.Adam with a nonzero weight decay. Since this is unclear, I will add an explicit bnb.optim.AdamW alias (a copy of the Adam class) in the next release.
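The distinction matters because AdamW applies weight decay directly to the parameters (decoupled), while classic Adam folds it into the gradient as L2 regularization, where it then gets rescaled by the adaptive moments. A minimal pure-Python sketch of one scalar update step, written from the standard Adam/AdamW formulas (not taken from the bitsandbytes source), shows the two variants diverging:

```python
import math

def adam_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One scalar Adam/AdamW step (illustrative sketch).

    decoupled=False: classic Adam, weight decay folded into the
                     gradient (L2 regularization).
    decoupled=True:  AdamW, weight decay applied directly to the
                     parameter, outside the adaptive scaling.
    """
    if not decoupled:
        g = g + weight_decay * p              # L2-style decay on the gradient
    m = b1 * m + (1 - b1) * g                 # first-moment estimate
    v = b2 * v + (1 - b2) * g * g             # second-moment estimate
    m_hat = m / (1 - b1 ** t)                 # bias correction
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        p = p - lr * weight_decay * p         # AdamW's decoupled decay
    return p, m, v

# With weight_decay > 0, the two variants produce different parameters
# after a single step from the same starting point:
p_adam, _, _  = adam_step(1.0, 0.5, 0.0, 0.0, 1, weight_decay=0.01)
p_adamw, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, 1, weight_decay=0.01,
                          decoupled=True)
```

This is also why the defaults matter: torch.optim.AdamW defaults to weight_decay=0.01, whereas torch.optim.Adam defaults to 0, so an alias that merely renames Adam without changing the default would silently behave differently from PyTorch's AdamW.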
This has been added in the newest release. It was also important to get the default hyperparameters for AdamW correct so that the default behavior matches expectations; as such, this was an important correction! Thank you, @patrickvonplaten, for making the suggestion!