BMIRDS/deepslide

Reduce LR on Plateau

Closed this issue · 1 comment

Any chance we can implement this? https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.ReduceLROnPlateau

Unless the current learning rate decay method works really well in your opinion...

I have also seen people start with learning rates of 0.1 or 0.01, although 0.1 might be too high. Any thoughts? Just a suggestion.

Implementing that scheduler should be quite easy, though I haven't seen much practical advantage over others, e.g. exponential decay, since they usually just work. Still, it'd always be nice to have the option. Do you want to implement this, @jasonwei20?
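
For reference, a minimal sketch of how `ReduceLROnPlateau` could slot into a training loop (this is not deepslide's actual training code; `model`, the loop body, and `val_loss` are placeholders):

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Placeholders standing in for the real model / data pipeline.
model = nn.Linear(512, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Cut the LR by 10x whenever the monitored metric stops improving for 3 epochs.
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=3)

for epoch in range(20):
    # ... training and validation passes would go here ...
    val_loss = torch.rand(1).item()  # stand-in for a real validation loss

    # Unlike ExponentialLR, this scheduler needs the metric it monitors.
    scheduler.step(val_loss)
```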

Regarding the initial learning rate, 0.1 is generally considered fine for Adagrad or SGD but high for Adam, which usually works well with 0.001. However, I'm guessing the model is not that picky and will probably work fine with a higher initial learning rate, especially when combined with an exponential decay scheduler and enough training epochs.
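
As a rough illustration of that combination (a higher starting LR tamed by exponential decay; the numbers here are arbitrary placeholders, not tuned values for deepslide):

```python
from torch import nn, optim
from torch.optim.lr_scheduler import ExponentialLR

model = nn.Linear(512, 2)  # placeholder model

# Start higher than Adam's usual 1e-3 and let the decay bring it down.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = ExponentialLR(optimizer, gamma=0.9)  # lr *= 0.9 each epoch

for epoch in range(50):
    # ... training pass would go here ...
    scheduler.step()
    # After ~22 epochs the LR has already dropped below 0.01.
```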