OneCycle annealing
Closed this issue · 5 comments
Describe the potential feature
One-cycle annealing (original paper here) is a really strong scheduler and, in my experience, the most effective one for deep learning. A PyTorch implementation is here: https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.OneCycleLR.html
Possible Implementation
```julia
"Cosine anneal from `start` to `stop` as `pct` goes from 0.0 to 1.0."
function annealing_cos(start, stop, pct)
    cos_out = cos(pi * pct) + 1
    return stop + (start - stop) / 2 * cos_out
end
```
You would typically anneal upward until pct = 0.3, then anneal back down, reaching pct = 1.0 at the end of training. (You could probably just build off of CosAnneal, with period = nsteps, and have it start 30% of the way through the cycle.)
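For concreteness, here is a small sketch of that full schedule in Python (names are hypothetical, not ParameterSchedulers.jl API; the warmup fraction and the div-factor defaults of 25 and 1e4 are borrowed from PyTorch's OneCycleLR): cosine-anneal up to the peak rate over the first 30% of steps, then cosine-anneal down over the remainder.

```python
import math

def annealing_cos(start, stop, pct):
    """Cosine anneal from `start` to `stop` as `pct` goes from 0.0 to 1.0."""
    cos_out = math.cos(math.pi * pct) + 1
    return stop + (start - stop) / 2 * cos_out

def one_cycle_lr(step, nsteps, max_lr, pct_start=0.3,
                 div_factor=25, final_div_factor=1e4):
    """One-cycle schedule (hypothetical helper): warm up from
    max_lr/div_factor to max_lr over the first pct_start of training,
    then anneal down to max_lr/final_div_factor."""
    start_lr = max_lr / div_factor
    final_lr = max_lr / final_div_factor
    warm = pct_start * nsteps
    if step < warm:
        return annealing_cos(start_lr, max_lr, step / warm)
    return annealing_cos(max_lr, final_lr, (step - warm) / (nsteps - warm))
```

With max_lr = 0.1 over 100 steps this starts at 0.004, peaks at 0.1 at step 30, and decays toward 1e-5 by the final step.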
There is an implementation here. We can adapt it into a convenience constructor in ParameterSchedulers.jl.
Cool! That sounds perfect.
Just pinging this, let me know if this is possible
Yes, definitely doable. Just need to do it. A PR doing it before I do would be appreciated and accepted.