FluxML/ParameterSchedulers.jl

OneCycle annealing

Describe the potential feature

One-cycle annealing (from Leslie Smith's "Super-Convergence" paper, arXiv:1708.07120) is a very strong scheduler and, in my experience, the most effective one for deep learning. PyTorch provides an implementation: https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.OneCycleLR.html

Possible Implementation

"Cosine anneal from `start` to `end` as pct goes from 0.0 to 1.0."
function annealing_cos(start, stop, pct)
    cos_out = cos(pi * pct) + 1
    return stop + (start - stop) / 2.0 * cos_out
end
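A quick endpoint check (using the function above) confirms the docstring's contract:

```julia
annealing_cos(0.1, 1e-4, 0.0)  # ≈ 0.1   (pct = 0.0 gives `start`)
annealing_cos(0.1, 1e-4, 1.0)  # ≈ 1e-4  (pct = 1.0 gives `stop`)
annealing_cos(0.1, 1e-4, 0.5)  # ≈ 0.05  (halfway down the cosine)
```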

You would typically start the annealing phase at pct = 0.3 (i.e., 30% of the way through training) and reach pct = 1.0 at the end of training.
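To make that concrete, here is a minimal sketch of a complete one-cycle schedule built from `annealing_cos` above. The name `one_cycle_lr` and its keyword arguments are hypothetical, with defaults borrowed from PyTorch's OneCycleLR (`pct_start = 0.3`, `div_factor = 25`, `final_div_factor = 1e4`):

```julia
# Hypothetical helper: learning rate at step `t` of `nsteps` under a
# one-cycle policy. Cosine warm-up from lr_max / div_factor to lr_max
# over the first `pct_start` fraction of training, then cosine
# annealing down toward a small final value.
function one_cycle_lr(t, nsteps, lr_max;
                      pct_start = 0.3, div_factor = 25.0, final_div_factor = 1e4)
    warmup = pct_start * nsteps
    if t <= warmup
        annealing_cos(lr_max / div_factor, lr_max, t / warmup)
    else
        annealing_cos(lr_max, lr_max / final_div_factor,
                      (t - warmup) / (nsteps - warmup))
    end
end

# e.g. a 1000-step run peaking at lr_max = 0.1:
lrs = [one_cycle_lr(t, 1000, 0.1) for t in 1:1000]
```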

(You could probably just build off of CosAnneal, with period = nsteps, and have it start 30% of the way through the cycle.)
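Something like the following might work by splicing existing schedules together. This is a minimal sketch: the `λ0`/`λ1`/`period` keywords, the `Sequence` pair syntax, and the assumption that `CosAnneal` sweeps from its larger value down to its smaller one over a period should all be checked against the current docs.

```julia
using ParameterSchedulers

# Hypothetical one-cycle composition; keyword names and the exact
# behavior of Triangle/CosAnneal are assumptions to verify.
nsteps = 1000
warmup = round(Int, 0.3 * nsteps)   # annealing starts 30% in
lr_max = 0.1

sched = Sequence(
    # Rising half of a triangle wave, i.e. a linear warm-up to lr_max.
    Triangle(λ0 = lr_max / 25, λ1 = lr_max, period = 2 * warmup) => warmup,
    # Then cosine-anneal from lr_max down to a small final value.
    CosAnneal(λ0 = lr_max, λ1 = lr_max / 1e4, period = nsteps - warmup) => nsteps - warmup,
)

# Assuming schedules can be evaluated by calling them at a step t:
lrs = [sched(t) for t in 1:nsteps]
```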

There is an implementation here; we can adapt it into a convenience constructor in ParameterSchedulers.jl.

Cool! That sounds perfect.

Just pinging this; let me know if this is possible.

Yes, definitely doable; it just needs doing. A PR that does it before I get to it would be appreciated and accepted.