huggingface/pytorch-image-models

[BUG] Scheduler implicitly does not change lr

fzyzcjy opened this issue · 1 comment

Hi, thanks for the library! The docstring in https://github.com/huggingface/pytorch-image-models/blob/main/timm/scheduler/scheduler.py says:

Unlike the builtin PyTorch schedulers, this is intended to be consistently called

  • At the END of each epoch, before incrementing the epoch count, to calculate next epoch's value
  • At the END of each optimizer update, after incrementing the update count, to calculate next update's value
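In other words, the documented convention corresponds roughly to a training loop like the following (a sketch based on my reading of the docstring; start_epoch, num_epochs, and updates_per_epoch are bookkeeping variables I am assuming, not timm names):

num_updates = start_epoch * updates_per_epoch
for epoch in range(start_epoch, num_epochs):
    for batch in dataloader:
        ...  # forward, loss, backward
        optimizer.step()
        num_updates += 1
        scheduler.step_update(num_updates=num_updates)  # after incrementing the update count
    scheduler.step(epoch + 1)  # at the END of the epoch, computes the NEXT epoch's value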

However, from my naive understanding (please correct me if I am wrong!), this has a problem. Suppose our training loop is:

num_updates = 0
for batch in dataloader:
    ...  # forward, loss, backward
    optimizer.step()
    num_updates += 1
    scheduler.step_update(num_updates=num_updates)

And suppose our scheduler does warmup or something like that. Then, for the first batch, we would expect a tiny lr (because of warmup), but in reality it uses the original lr, because the scheduler has not had a chance to update anything yet! This looks unintentional. The example above is step-based, and a similar thing may also happen for epoch-based schedulers.
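One way to check what the first batch actually trains with is to print the lr right before the first optimizer.step(); a minimal probe sketch (variable names follow the loop above):

first_lr = optimizer.param_groups[0]["lr"]
print(f"lr used by the first batch: {first_lr}")
# Under the semantics described in the docstring, one might expect this to be
# the base lr rather than the warmup lr, since no scheduler call has run yet.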

Therefore, I propose to change the semantics as follows (a sketch of the resulting loop follows the list):

  • At the BEGINNING of each epoch, BEFORE calling optimizer step or incrementing the epoch count, to calculate CURRENT epoch's value
  • At the BEGINNING of each optimizer update, BEFORE calling optimizer step or incrementing the update count, to calculate CURRENT update's value
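Under the proposed semantics, the step-based loop would look roughly like this (a sketch; the scheduler API is assumed unchanged apart from when it is called):

num_updates = 0
for batch in dataloader:
    scheduler.step_update(num_updates=num_updates)  # set lr for the CURRENT update
    ...  # forward, loss, backward
    optimizer.step()
    num_updates += 1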

I have checked the implementations, and they do not seem to have any special requirements that would prevent this.

I see the issue now: the subclasses initialize the optimizer in their constructors via update_groups! (which I did not realize when implementing my own scheduler...)
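For example (a minimal sketch; the model, optimizer, and parameter values are illustrative), constructing a timm scheduler with warmup already pushes the warmup start lr into the optimizer, so the first batch does see the tiny lr after all:

import torch
from timm.scheduler import CosineLRScheduler

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # base lr

# The scheduler constructor calls update_groups with the warmup start value,
# so the optimizer's lr is already tiny before the first batch runs.
scheduler = CosineLRScheduler(optimizer, t_initial=100, warmup_t=5, warmup_lr_init=1e-5)

print(optimizer.param_groups[0]["lr"])  # ~1e-5, not 0.1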