
Linear learning rate schedule is not consistent with max_steps


With the default linear schedule, the learning rate (LR) should decay linearly after warmup_steps and reach zero at whichever of these comes first: args.max_steps, or args.num_train_epochs for the objective with the largest number of samples. Currently, the LR seems to be derived solely from args.max_steps: if max_steps is not set, the LR is not decayed after warmup_steps.
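To make the expected behaviour concrete, here is a minimal sketch of the LR multiplier that a linear schedule with warmup would produce. The function name and its `total_steps` parameter are hypothetical and not taken from this repository; `total_steps` stands for the step at which training is expected to terminate.

```python
def linear_lr_factor(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier applied to the base LR at a given optimizer step.

    Hypothetical helper for illustration: ramps linearly from 0 to 1 during
    warmup, then decays linearly from 1 to 0 at total_steps.
    """
    if step < warmup_steps:
        # Linear warmup; max(1, ...) guards against warmup_steps == 0.
        return step / max(1, warmup_steps)
    # Linear decay, clamped at zero once total_steps is reached or exceeded.
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))
```

With `warmup_steps=10` and `total_steps=110`, the factor is 1.0 at step 10, 0.5 at step 60, and 0.0 from step 110 onward. The reported bug corresponds to `total_steps` not reflecting the real end of training when only the number of epochs is given.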

This is how the learning rate currently looks for a training run that ends on the ALL_OBJECTIVES_NUM_EPOCHS termination condition:
[Image: learning_rate (1)]

Note that, for consistency with HuggingFace Transformers, args.max_steps should override the number of steps given by args.num_train_epochs (see here).
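The override rule could be sketched roughly as follows. The helper name and the `steps_per_epoch` parameter are assumptions made for illustration (here, `steps_per_epoch` would be the number of optimizer steps in one epoch of the objective with the most samples); the sketch also ignores gradient accumulation.

```python
import math


def resolve_total_steps(max_steps: int, num_train_epochs: float,
                        steps_per_epoch: int) -> int:
    """Number of steps the LR schedule should span (hypothetical helper).

    Mirrors the HuggingFace convention that a positive max_steps takes
    precedence over num_train_epochs; otherwise the total is derived
    from the epoch count.
    """
    if max_steps > 0:
        # Explicit step budget overrides the epoch-based estimate.
        return max_steps
    # No step budget: fall back to epochs times steps per epoch.
    return math.ceil(num_train_epochs * steps_per_epoch)
```

For example, `resolve_total_steps(-1, 3, 100)` gives 300, while `resolve_total_steps(50, 3, 100)` gives 50. Passing the result to the scheduler as its total number of training steps would let the LR reach zero at the actual end of training.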

It will probably suffice to change how the training dataset length is computed (see lengths_combined in Schedule).