Code, diagrams, and explainers for the Mamba architecture.
- Like RNNs, Mamba's memory and computational complexity scale linearly with sequence length (see the sketch of the recurrence after this list).
- Like Transformers, Mamba can be efficiently parallelized during training (with careful engineering).
- Because Mamba is RNN-like, it generalizes well to sequence lengths longer than those seen during training.
- On language modeling tasks, Mamba scales as well as or better than Transformers up to 3B parameters.
- Mamba has been successfully applied to many modalities (text, audio, images, video) with minimal modification.
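The linear scaling above comes from Mamba's selective state-space (SSM) recurrence: each token updates a fixed-size hidden state, so compute grows with sequence length while per-step memory stays constant. Below is a minimal NumPy sketch of that recurrence as a sequential (inference-style) scan; it is illustrative only, not this repository's implementation, and the function name `selective_ssm` and tensor shapes are assumptions made for the example.

```python
import numpy as np

def selective_ssm(x, A, B, C, delta):
    """Sequential scan over a length-L input.

    x:     (L, d) input sequence
    A:     (d, n) state-transition parameters (diagonal, per channel)
    B:     (L, n) input-dependent ("selective") input projection
    C:     (L, n) input-dependent output projection
    delta: (L, d) input-dependent step sizes used for discretization
    """
    L, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                  # fixed-size hidden state, reused every step
    y = np.empty((L, d))
    for t in range(L):
        # Zero-order-hold discretization of the continuous-time SSM
        A_bar = np.exp(delta[t][:, None] * A)        # (d, n)
        B_bar = delta[t][:, None] * B[t][None, :]    # (d, n)
        h = A_bar * h + B_bar * x[t][:, None]        # recurrent state update
        y[t] = (h * C[t][None, :]).sum(axis=-1)      # per-channel readout
    return y

# Toy usage: compute cost is O(L), state memory is O(d * n) regardless of L.
L, d, n = 16, 4, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((L, d))
A = -np.exp(rng.standard_normal((d, n)))           # negative entries keep the state stable
B = rng.standard_normal((L, n))
C = rng.standard_normal((L, n))
delta = 0.1 * np.exp(rng.standard_normal((L, d)))  # positive step sizes
print(selective_ssm(x, A, B, C, delta).shape)      # (16, 4)
```

During training, the same recurrence is evaluated with a parallel scan and hardware-aware kernels rather than a Python loop, which is what makes Mamba efficient to parallelize despite being RNN-like.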
Coming soon