/Warmup_Exponential_decay_LS

Warmup-Exponential Decay Learning Rate Scheduler

Primary language: Python. License: MIT.

This is a learning rate scheduler inspired by the Transformer paper, "Attention Is All You Need" (Vaswani et al., 2017). A warm-up phase raises the learning rate linearly to a maximum value so that training can adapt quickly with large updates. The rate then decays along a smooth (differentiable) exponential curve toward a desired target learning rate. How quickly the rate falls from the maximum to the target is controlled by the exponent coefficient a (alpha).

Variables (Hyper-parameters)

| Variable | Description |
| --- | --- |
| max_lr | Maximum learning rate, reached at the end of warm-up |
| min_lr | Target (minimum) learning rate |
| num_warmup | Number of warm-up steps |
| a (alpha) | Rate of convergence (curvature of the decay). Larger values converge faster; requires a > 0 |

Graph

Figure: learning rate curve with max_lr: 0.01, min_lr: 0.001, num_warmup: 50, a: 0.1

Function

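$step_{now} \leq step_{warmup}$

$$ lr = \frac{lr_{max}}{step_{warmup}} \cdot step_{now} $$

$step_{warmup} < step_{now}$

$$ lr = (lr_{max} - lr_{min}) \, e^{-\alpha (step_{now} - step_{warmup})} + lr_{min} $$

Here $lr_{max}$ is max_lr, $lr_{min}$ is min_lr, and $step_{warmup}$ is num_warmup. At $step_{now} = step_{warmup}$ both expressions give $lr_{max}$, and as $step_{now}$ grows the exponential term vanishes, so $lr$ approaches $lr_{min}$.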
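
The two-phase schedule can be sketched in plain Python as follows. This is a minimal illustration, not the code in TF_warmup_exponential.py: the function name `warmup_exponential_lr` is hypothetical, and the decay branch assumes the exponential term is scaled by (max_lr - min_lr) with a negative exponent, so that the rate starts at max_lr when warm-up ends and approaches min_lr afterwards.

```python
import math


def warmup_exponential_lr(step, max_lr=0.01, min_lr=0.001, num_warmup=50, a=0.1):
    """Learning rate at `step` (hypothetical helper, not part of the repository)."""
    if num_warmup > 0 and step <= num_warmup:
        # Linear warm-up: 0 at step 0, max_lr at step == num_warmup.
        return max_lr / num_warmup * step
    # Exponential decay (assumed form): equals max_lr at step == num_warmup
    # and approaches min_lr as step grows; larger `a` converges faster.
    return (max_lr - min_lr) * math.exp(-a * (step - num_warmup)) + min_lr


if __name__ == "__main__":
    # Print a few points on the curve, using the same settings as the graph.
    for step in (0, 25, 50, 60, 100, 200):
        print(step, warmup_exponential_lr(step))
```

With the default arguments, which mirror the settings of the graph above, the rate climbs linearly to 0.01 at step 50 and then falls toward 0.001. Increasing `a` shortens the decay, since the gap to min_lr shrinks by a factor of e^(-a) per step.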

Others

TF_warmup_exponential.py is the TensorFlow implementation.

A PyTorch version is planned. Pull requests are welcome.

Contributor: Thanks to Gyeonghun Kim