Inconsistencies in schedules API
fabianp opened this issue · 4 comments
fabianp commented
1. For most schedules, the end value is set with the parameter `end_value`, but for `cosine_decay` it's called `alpha`. (#870)
2. For most schedules, the total number of steps is specified through the `transition_steps` parameter, but in some cases (e.g., `optax.cosine_decay_schedule` and `optax.warmup_cosine_decay_schedule`, but confusingly not `optax.cosine_onecycle_schedule`) it's called `decay_steps` instead.
3. The name `sgdr_schedule` is not descriptive of what the schedule actually does.
4. Most warm-up schedules, such as `linear_onecycle_schedule` and `cosine_onecycle_schedule`, specify the length of the warm-up phase using the parameter `pct_start`, but `warmup_cosine_decay_schedule` instead specifies it through a parameter `warmup_steps`.
In the documentation:
5. In the API reference (https://optax.readthedocs.io/en/latest/api/optimizer_schedules.html) there's a section "Schedules with warm-up". I would consider `optax.cosine_onecycle_schedule` to have warm-up, yet it's not in this section. My recommendation would be to remove the "Schedules with warm-up" section and put `optax.warmup_cosine_decay_schedule` in the cosine decay schedule section and `optax.warmup_exponential_decay_schedule` in the exponential decay section.