Do I need to tune learning rates?
xuzhang5788 opened this issue · 4 comments
Thank you so much for your great implementation.
Do I need to add a callback like ReduceLROnPlateau? Can I combine RAdam with AdamW (Adam with weight decay)? How about using RAdam with the one-cycle policy?
@xuzhang5788 About the callback, I believe so. Check the ~/tests/test_optimizers.py file: ReduceLROnPlateau is passed to model.fit() there.
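For reference, here's a minimal sketch of that setup with toy data, assuming the RAdam optimizer exported by this package (the exact import path and backend setup may differ, so check the README):

```python
import numpy as np
from tensorflow import keras
from keras_radam import RAdam  # assumed import; adjust to your install

# Toy data and model, just for illustration.
x = np.random.rand(256, 10)
y = np.random.rand(256, 1)

model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(10,)),
    keras.layers.Dense(1),
])
model.compile(optimizer=RAdam(), loss='mse')

# ReduceLROnPlateau is optional: it lowers the learning rate when the
# monitored metric stops improving, on top of whatever RAdam does internally.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.5, patience=3)
model.fit(x, y, epochs=10, batch_size=32, callbacks=[reduce_lr], verbose=0)
```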
@pedromlsreis Thank you for your reply.
The paper says that RAdam dynamically adjusts the adaptive learning rate. Why, then, would we still need to schedule learning rate decay?
@xuzhang5788 oh okay. I can't answer you as now I've got the same doubt :)
See the guide in the official repo:
"Directly replace the vanilla Adam with RAdam without changing any settings."
The callback in the tests is just a feature that should be supported; it isn't required.
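For what it's worth, a minimal sketch of that drop-in replacement, again assuming the RAdam class from this package (import path may differ):

```python
from tensorflow import keras
from keras_radam import RAdam  # assumed import; adjust to your install

model = keras.Sequential([
    keras.layers.Dense(32, activation='relu', input_shape=(10,)),
    keras.layers.Dense(1),
])

# Before: model.compile(optimizer=keras.optimizers.Adam(), loss='mse')
# After: swap in RAdam with its default settings; nothing else changes.
model.compile(optimizer=RAdam(), loss='mse')
```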