Linear learning rate schedule is not consistent with max_steps
When using the default linear schedule, the LR (learning rate) immediately after `warmup_steps` should decay linearly to zero, reaching zero at whichever comes first: `args.max_steps`, or `args.num_train_epochs` for the objective with the largest number of samples. Currently, the LR seems to be driven solely by `args.max_steps`; if it is not set, the LR after `warmup_steps` is not decayed at all.
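For reference, a minimal sketch of the intended decay behaviour (not the library's actual scheduler; `total_steps` here stands for whichever termination condition is reached first):

```python
def linear_lr_lambda(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier on the base LR: linear warmup, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

This is essentially what `transformers.get_linear_schedule_with_warmup` does; the problem is only which value ends up as `total_steps`.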
This is how the learning rate currently looks for a training run that reaches the termination condition `ALL_OBJECTIVES_NUM_EPOCHS`:
Note that to stay consistent with HuggingFace Transformers, `args.max_steps` should override the number of steps given by `args.num_epochs` (see here).
It will probably suffice to change the logic that computes the training dataset length (refer to `lengths_combined` in `Schedule`).
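A possible direction, sketched against an assumed `Schedule` interface; only `lengths_combined` comes from the code base, the attribute names and its assumed mapping semantics below are hypothetical:

```python
def total_training_steps(self) -> int:
    """Hypothetical Schedule helper: number of steps the linear LR schedule should decay over."""
    # Keep HF Transformers' convention: a positive max_steps overrides the epoch-based count.
    if self.args.max_steps is not None and self.args.max_steps > 0:
        return self.args.max_steps
    # Otherwise derive the count from the per-objective dataset lengths; under
    # ALL_OBJECTIVES_NUM_EPOCHS, the objective with the most samples is assumed to
    # determine how long one epoch of the combined training takes.
    steps_per_epoch = max(self.lengths_combined.values())
    return steps_per_epoch * self.args.num_train_epochs
```

The resulting value would then be passed to the scheduler as `num_training_steps`, so the LR reaches zero exactly at the effective termination point rather than only when `max_steps` is set.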