srush/annotated-s4

Reconcile S4 Optimizer w/ Original Implementation

siddk opened this issue · 0 comments

siddk commented

The original S4 repository indicates that the S4 parameters need special handling in the optimizer (https://github.com/HazyResearch/state-spaces/blob/feeab742e9c737c8e2b8b0e44d3efff4049f5847/example.py#L235).

Specifically:

  • Fixed, small learning rates for the state space matrices, with no weight decay (our current AdamW setup does not respect this).
  • Larger learning rates and weight decay for all other parameters.
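
One way to get this split is to label every parameter as belonging to one of two groups and then give each group its own optimizer. The snippet below is a minimal sketch of the labeling step only, in plain Python. The parameter names in `SSM_PARAM_NAMES`, the helper `label_params`, and the toy parameter tree are all hypothetical placeholders, not names taken from this repository or from the original S4 code.

```python
from collections.abc import Mapping

# Hypothetical names of the state space parameters (assumption, not taken
# from either codebase); replace with the names the model actually uses.
SSM_PARAM_NAMES = frozenset({"A", "B", "log_step"})


def label_params(params, ssm_names=SSM_PARAM_NAMES):
    """Return a nested dict mirroring `params`, with each leaf replaced
    by the label "ssm" or "regular" according to the leaf's own name."""
    labels = {}
    for name, value in params.items():
        if isinstance(value, Mapping):
            # Sub-module: recurse so the label tree keeps the same nesting.
            labels[name] = label_params(value, ssm_names)
        else:
            labels[name] = "ssm" if name in ssm_names else "regular"
    return labels


# Toy parameter tree standing in for a real model's parameters.
example_params = {
    "layer_0": {
        "s4": {"A": 0.0, "B": 0.0, "log_step": 0.0, "D": 0.0},
        "dense": {"kernel": 0.0, "bias": 0.0},
    }
}
example_labels = label_params(example_params)
```

A label tree of this shape is the kind of input `optax.multi_transform` accepts through its `param_labels` argument, so the "ssm" group could be mapped to `optax.adam` with a small fixed learning rate and the "regular" group to `optax.adamw` with a larger learning rate and weight decay. The actual learning rates and schedule should be taken from the linked `example.py` rather than guessed here.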