NVIDIA/NeMo

Learning Rate Suddenly Drops to min_lr after Warm-up Steps


Describe the bug

Hi, I am observing that the learning rate suddenly drops to model.optim.sched.min_lr right after model.optim.sched.warmup_steps. I am using CosineAnnealing, where I expect the learning rate to decay gradually to min_lr after the warmup steps rather than dropping all at once.
[Screenshot: learning-rate curve showing the sudden drop to min_lr after the warmup steps]

Steps/Code to reproduce bug

  optim:
    name: distributed_fused_adam
    lr: 5e-6
    weight_decay: 0.01
    betas:
    - 0.9
    - 0.98
    sched:
      name: CosineAnnealing
      warmup_steps: 250
      constant_steps: 2500
      min_lr: 1e-7

Expected behavior

I expect the learning rate to drop gradually to min_lr after the warmup steps instead of falling to it immediately. If I am configuring this the wrong way, what is the correct way to get a gradual decay?
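
To make the expectation concrete, here is a minimal hand-rolled sketch (not NeMo's scheduler) of the schedule shape I expect from the config above. total_steps is a made-up value, since it is not part of the snippet, and placing the constant hold directly after warmup is an assumption for illustration only.

import math

lr, min_lr = 5e-6, 1e-7
warmup_steps, constant_steps = 250, 2500
total_steps = 10_000  # hypothetical; in practice this comes from the trainer
decay_steps = total_steps - warmup_steps - constant_steps

def expected_lr(step: int) -> float:
    # Linear warmup from ~0 up to the peak learning rate.
    if step < warmup_steps:
        return lr * (step + 1) / warmup_steps
    # Assumed constant hold at the peak (placement is a guess, see note above).
    if step < warmup_steps + constant_steps:
        return lr
    # Gradual cosine decay from the peak down to min_lr over decay_steps.
    progress = min((step - warmup_steps - constant_steps) / max(decay_steps, 1), 1.0)
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * progress))

for step in (0, 249, 250, 2750, 5000, 7500, 9999):
    print(step, f"{expected_lr(step):.2e}")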

Environment overview (please complete the following information)

  • PyTorch version 2.3
  • Python version 3.10

Looking here https://github.com/NVIDIA/NeMo/blob/main/nemo/core/optim/lr_scheduler.py#L353, you may need to set decay_steps (if I'm looking in the correct place). It looks like during warmup_steps the learning rate linearly ramps up to max_lr, and then decays to min_lr over decay_steps. Curious if that works for you!
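
To illustrate the idea with a hand-rolled sketch (not NeMo's actual implementation): if the decay window works out to zero steps, the very first post-warmup step already evaluates to min_lr, which would look exactly like the sudden drop in the plot, whereas a positive decay_steps gives a smooth cosine decay. The numbers below reuse lr, min_lr, and warmup_steps from the config; the decay_steps values are made up.

import math

max_lr, min_lr, warmup_steps = 5e-6, 1e-7, 250  # values from the config above

def lr_at(step: int, decay_steps: int) -> float:
    # Linear warmup up to max_lr.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # With no decay window, everything after warmup sits at the floor.
    if decay_steps <= 0:
        return min_lr
    # Otherwise: cosine decay from max_lr down to min_lr over decay_steps.
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at(251, decay_steps=0))      # 1e-07 -> the sudden drop from the report
print(lr_at(251, decay_steps=7250))   # ~5e-06 -> smooth decay when a window exists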