Code, diagrams, and explainers for the Mamba architecture.
- Like RNNs, Mamba's memory and computational complexity scale linearly with sequence length (see the sketch of the recurrence after this list).
- Like Transformers, Mamba can be efficiently parallelized during training (with careful engineering).
- Because Mamba is RNN-like, it generalizes well to sequence lengths longer than those seen during training.
- On language modeling tasks, Mamba scales as well as or better than Transformers up to 3B parameters.
- Mamba has been successfully applied to many modalities (text, audio, images, video) with minimal modification.
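The linear scaling above comes from Mamba's selective state-space (SSM) recurrence: each token updates a fixed-size hidden state, so compute grows with sequence length while per-step memory stays constant. Below is a minimal NumPy sketch of that recurrence as a sequential (inference-style) scan; it is illustrative only, not this repository's implementation, and the function name `selective_ssm` and tensor shapes are assumptions made for the example.

```python
import numpy as np

def selective_ssm(x, A, B, C, delta):
    """Sequential scan over a length-L input.

    x:     (L, d) input sequence
    A:     (d, n) state-transition parameters (diagonal, per channel)
    B:     (L, n) input-dependent ("selective") input projection
    C:     (L, n) input-dependent output projection
    delta: (L, d) input-dependent step sizes used for discretization
    """
    L, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                  # fixed-size hidden state, reused every step
    y = np.empty((L, d))
    for t in range(L):
        # Zero-order-hold discretization of the continuous-time SSM
        A_bar = np.exp(delta[t][:, None] * A)        # (d, n)
        B_bar = delta[t][:, None] * B[t][None, :]    # (d, n)
        h = A_bar * h + B_bar * x[t][:, None]        # recurrent state update
        y[t] = (h * C[t][None, :]).sum(axis=-1)      # per-channel readout
    return y

# Toy usage: compute cost is O(L), state memory is O(d * n) regardless of L.
L, d, n = 16, 4, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((L, d))
A = -np.exp(rng.standard_normal((d, n)))           # negative entries keep the state stable
B = rng.standard_normal((L, n))
C = rng.standard_normal((L, n))
delta = 0.1 * np.exp(rng.standard_normal((L, d)))  # positive step sizes
print(selective_ssm(x, A, B, C, delta).shape)      # (16, 4)
```

During training, the same recurrence is evaluated with a parallel scan and hardware-aware kernels rather than a Python loop, which is what makes Mamba efficient to parallelize despite being RNN-like.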
Coming soon