Learning Rate Suddenly Drops to min_lr after Warm-up Steps
Describe the bug
Hi, I am observing that the learning rate suddenly drops to model.optim.sched.min_lr after model.optim.sched.warmup_steps. I am using CosineAnnealing, where I expect the learning rate to decay gradually to min_lr after the warmup steps instead of dropping abruptly.
Steps/Code to reproduce bug
optim:
  name: distributed_fused_adam
  lr: 5e-6
  weight_decay: 0.01
  betas:
    - 0.9
    - 0.98
  sched:
    name: CosineAnnealing
    warmup_steps: 250
    constant_steps: 2500
    min_lr: 1e-7
Expected behavior
I expect the learning rate to decay gradually to min_lr after the warmup (and constant) steps rather than dropping suddenly. If I am configuring this the wrong way, what is the correct way to get a gradual decay? A sketch of the shape I have in mind follows.
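To make the expected shape concrete, here is a minimal standalone sketch using plain PyTorch's LambdaLR (not NeMo's scheduler; max_steps here is just an illustrative total step count I picked, and the cosine formula is written out by hand):

import math
import torch

lr, min_lr = 5e-6, 1e-7
warmup_steps, constant_steps, max_steps = 250, 2500, 10000  # max_steps chosen for illustration

def schedule(step):
    # linear warmup, constant hold, then a gradual cosine decay toward min_lr
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    if step < warmup_steps + constant_steps:
        return 1.0
    decay_steps = max_steps - warmup_steps - constant_steps
    progress = min(1.0, (step - warmup_steps - constant_steps) / decay_steps)
    return (min_lr + (lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))) / lr

optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))],
                              lr=lr, weight_decay=0.01, betas=(0.9, 0.98))
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, schedule)

for step in range(max_steps):
    optimizer.step()
    scheduler.step()
    if step % 2000 == 0:
        print(step, scheduler.get_last_lr()[0])  # expect a smooth curve, not a jump at step 2750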
Environment overview (please complete the following information)
- PyTorch version 2.3
- Python version 3.10
Looking here https://github.com/NVIDIA/NeMo/blob/main/nemo/core/optim/lr_scheduler.py#L353 you may need to set decay_steps (if I'm looking in the correct place). It looks like during warmup_steps the learning rate linearly ramps up to max_lr, then decays to min_lr over decay_steps. Curious if that works for you!
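As a rough sanity check (hand-rolled math, not NeMo's actual implementation; the exact parameter name, decay_steps vs. max_steps, should be confirmed against the linked lr_scheduler.py): if the configured step budget does not extend beyond warmup_steps + constant_steps, the decay window is empty and the learning rate falls straight to min_lr.

import math

def cosine_lr(step, lr=5e-6, min_lr=1e-7,
              warmup_steps=250, constant_steps=2500, max_steps=0):
    # Hand-rolled warmup + hold + cosine decay mirroring the config above.
    # max_steps (or decay_steps) is the knob in question; 0 means "not set".
    if step < warmup_steps:
        return lr * step / max(1, warmup_steps)         # linear warmup
    if step < warmup_steps + constant_steps:
        return lr                                       # constant hold
    decay_steps = max_steps - warmup_steps - constant_steps
    if decay_steps <= 0:
        return min_lr                                   # empty decay window: sudden drop
    progress = min(1.0, (step - warmup_steps - constant_steps) / decay_steps)
    return min_lr + (lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))

print(cosine_lr(3000))                    # 1e-07: the sudden drop reported above
print(cosine_lr(3000, max_steps=20000))   # ~5e-06: still near peak, decaying gradually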