CyberZHG/keras-radam

Do I need to tune learning rates?

xuzhang5788 opened this issue · 4 comments

Thank you so much for your great implementation.
Do I need to add a callback like ReduceLROnPlateau? Can I combine RAdam and AdamW (Adam with weight decay)? And how about using RAdam with the one-cycle policy?

@xuzhang5788 About the callback, I believe so. Check the ~/tests/test_optimizers.py file: ReduceLROnPlateau is passed as a callback to model.fit() there.
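
For reference, here is a minimal sketch of that usage, not copied from the tests verbatim: it assumes standalone Keras, the `from keras_radam import RAdam` import from this package, and made-up toy data, and just shows RAdam compiled into a model with ReduceLROnPlateau handed to model.fit().

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ReduceLROnPlateau
from keras_radam import RAdam  # assumes the keras-radam package is installed

# Tiny toy model and data, only to illustrate the call pattern.
model = Sequential([
    Dense(8, activation='relu', input_shape=(4,)),
    Dense(1),
])
model.compile(optimizer=RAdam(), loss='mse')

x = np.random.random((64, 4))
y = np.random.random((64, 1))

# ReduceLROnPlateau is passed via the callbacks argument of model.fit().
model.fit(
    x, y,
    epochs=10,
    callbacks=[ReduceLROnPlateau(monitor='loss', factor=0.5, patience=2)],
)
```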

@pedromlsreis Thank you for your reply.
The paper says that RAdam dynamically adjusts the adaptive learning rate, so why should we still schedule learning rate decay?

@xuzhang5788 Oh, okay. I can't answer that, as I now have the same doubt :)

See the guide in the official repo:

Directly replace the vanilla Adam with RAdam without changing any settings

The callback in the tests is just there because it's a feature that should be supported, not something you have to use.
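
In other words, the drop-in advice amounts to swapping only the optimizer in an existing Adam setup. A hedged sketch, again assuming the `keras_radam` import and standalone Keras:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras_radam import RAdam  # assumed import from the keras-radam package

model = Sequential([Dense(8, activation='relu', input_shape=(4,)), Dense(1)])

# Before: model.compile(optimizer='adam', loss='mse')
# After: only the optimizer changes; no other settings are touched.
model.compile(optimizer=RAdam(), loss='mse')
```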