Reconcile S4 Optimizer w/ Original Implementation
siddk opened this issue · 0 comments
The original S4 repository indicates that optimization for S4 needs to be handled specially (https://github.com/HazyResearch/state-spaces/blob/feeab742e9c737c8e2b8b0e44d3efff4049f5847/example.py#L235).
Specifically:
- State space matrices: a small, fixed learning rate with no weight decay (our current AdamW setup does not respect this; it applies the same learning rate and weight decay to every parameter).
- All other parameters: a larger learning rate, with weight decay.
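
A rough sketch of how the split above might look, written as a framework-agnostic helper that sorts named parameters into two groups. The parameter names in `SSM_PARAM_NAMES`, the function name `build_param_groups`, and the default learning rates and weight decay are all placeholders I made up for illustration, not values taken from either repository; they would need to be matched to our model and to the linked `example.py`.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical leaf names for the state space parameters; adjust to the real model.
SSM_PARAM_NAMES = ("Lambda", "P", "B", "log_step")


def build_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    ssm_lr: float = 1e-3,        # placeholder: small, fixed LR for state space matrices
    base_lr: float = 1e-2,       # placeholder: larger LR for everything else
    weight_decay: float = 1e-2,  # placeholder: applied only to non-SSM parameters
    ssm_names: Tuple[str, ...] = SSM_PARAM_NAMES,
) -> List[Dict[str, Any]]:
    """Split (name, param) pairs into an SSM group and a default group."""
    ssm_params, other_params = [], []
    for name, param in named_params:
        # Compare only the last path component, e.g. "layers.0.Lambda" -> "Lambda".
        leaf = name.rsplit(".", 1)[-1]
        if leaf in ssm_names:
            ssm_params.append(param)
        else:
            other_params.append(param)
    return [
        # State space matrices: small learning rate, no weight decay.
        {"params": ssm_params, "lr": ssm_lr, "weight_decay": 0.0},
        # All other parameters: larger learning rate, with weight decay.
        {"params": other_params, "lr": base_lr, "weight_decay": weight_decay},
    ]
```

In a PyTorch setting, the returned list has the shape that `torch.optim.AdamW` accepts for per-group options, e.g. `torch.optim.AdamW(build_param_groups(model.named_parameters()))`. In a JAX/optax setting the same partition would instead have to be expressed as parameter labels, so treat this only as an outline of the grouping logic.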