FluxML/ParameterSchedulers.jl

OneCycle annealing

Describe the potential feature

One-cycle annealing (from Leslie Smith's "Super-Convergence" paper, arXiv:1708.07120) is a very strong scheduler and, in my experience, the most effective one for deep learning. PyTorch provides an implementation: https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.OneCycleLR.html

Possible Implementation

"Cosine anneal from `start` to `end` as pct goes from 0.0 to 1.0."
function annealing_cos(start, stop, pct)
    cos_out = cos(pi * pct) + 1
    return stop + (start - stop) / 2.0 * cos_out
end
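A quick endpoint check (using the function above) confirms the docstring's contract:

```julia
annealing_cos(0.1, 1e-4, 0.0)  # ≈ 0.1   (pct = 0.0 gives `start`)
annealing_cos(0.1, 1e-4, 1.0)  # ≈ 1e-4  (pct = 1.0 gives `stop`)
annealing_cos(0.1, 1e-4, 0.5)  # ≈ 0.05  (halfway down the cosine)
```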

You would typically start the annealing phase at pct = 0.3 (i.e., 30% of the way through training) and reach pct = 1.0 at the end of training.
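To make that concrete, here is a minimal sketch of a complete one-cycle schedule built from `annealing_cos` above. The name `one_cycle_lr` and its keyword arguments are hypothetical, with defaults borrowed from PyTorch's OneCycleLR (`pct_start = 0.3`, `div_factor = 25`, `final_div_factor = 1e4`):

```julia
# Hypothetical helper: learning rate at step `t` of `nsteps` under a
# one-cycle policy. Cosine warm-up from lr_max / div_factor to lr_max
# over the first `pct_start` fraction of training, then cosine
# annealing down toward a small final value.
function one_cycle_lr(t, nsteps, lr_max;
                      pct_start = 0.3, div_factor = 25.0, final_div_factor = 1e4)
    warmup = pct_start * nsteps
    if t <= warmup
        annealing_cos(lr_max / div_factor, lr_max, t / warmup)
    else
        annealing_cos(lr_max, lr_max / final_div_factor,
                      (t - warmup) / (nsteps - warmup))
    end
end

# e.g. a 1000-step run peaking at lr_max = 0.1:
lrs = [one_cycle_lr(t, 1000, 0.1) for t in 1:1000]
```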

(You could probably just build off of CosAnneal, with period = nsteps, and have it start 30% of the way through the cycle.)
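Something like the following might work by splicing existing schedules together. This is a minimal sketch: the `λ0`/`λ1`/`period` keywords, the `Sequence` pair syntax, and the assumption that `CosAnneal` sweeps from its larger value down to its smaller one over a period should all be checked against the current docs.

```julia
using ParameterSchedulers

# Hypothetical one-cycle composition; keyword names and the exact
# behavior of Triangle/CosAnneal are assumptions to verify.
nsteps = 1000
warmup = round(Int, 0.3 * nsteps)   # annealing starts 30% in
lr_max = 0.1

sched = Sequence(
    # Rising half of a triangle wave, i.e. a linear warm-up to lr_max.
    Triangle(λ0 = lr_max / 25, λ1 = lr_max, period = 2 * warmup) => warmup,
    # Then cosine-anneal from lr_max down to a small final value.
    CosAnneal(λ0 = lr_max, λ1 = lr_max / 1e4, period = nsteps - warmup) => nsteps - warmup,
)

# Assuming schedules can be evaluated by calling them at a step t:
lrs = [sched(t) for t in 1:nsteps]
```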

There is an implementation here; we can adapt it into a convenience constructor in ParameterSchedulers.jl.

Cool! That sounds perfect.

Just pinging this; let me know if this is possible.

Yes, definitely doable; it just needs doing. A PR that does it before I get to it would be appreciated and accepted.