google-deepmind/optax

Inconsistencies in schedules API

fabianp opened this issue · 4 comments

  • For most schedules, the end value is set with the parameter end_value, but for cosine_decay_schedule it's called alpha, and it is given as a fraction of init_value rather than as an absolute value (see #870).
  • For most schedules, the total number of steps is specified through the transition_steps parameter, but in some cases (e.g., optax.cosine_decay_schedule and optax.warmup_cosine_decay_schedule, but confusingly not optax.cosine_onecycle_schedule) it's called decay_steps instead.
  • The name sgdr_schedule is not descriptive of what the schedule actually does.
  • Most schedules with warm-up, such as linear_onecycle_schedule and cosine_onecycle_schedule, specify the length of the warm-up phase as a fraction of the total steps through the parameter pct_start, but warmup_cosine_decay_schedule instead specifies it as a number of steps through the parameter warmup_steps.

In the documentation:
  • In the API reference https://optax.readthedocs.io/en/latest/api/optimizer_schedules.html there's a section "Schedules with warm-up". I would consider optax.cosine_onecycle_schedule to have warm-up, yet it's not in this section. My recommendation would be to remove the "Schedules with warm-up" section, and to put optax.warmup_cosine_decay_schedule in the Cosine decay schedule section and optax.warmup_exponential_decay_schedule in the exponential decay section.