
Mamba Explained

Code, diagrams, and explainers for the Mamba architecture.

Why should you care about Mamba?

  • Like RNNs, Mamba's memory and computational cost scale linearly with sequence length (see the sketch after this list).
  • Like Transformers, Mamba can be efficiently parallelized during training (with careful engineering).
  • Because Mamba is RNN-like, it generalizes well to sequence lengths longer than those seen during training.
  • On language modeling tasks, Mamba scales as well as or better than Transformers up to 3B parameters.
  • Mamba has been successfully applied to many modalities (text, audio, images, video) with minimal modification.
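To make the first bullet concrete, here is a minimal sketch (illustrative only, not the real Mamba implementation; the shapes, names, and toy parameters are invented) of the linear recurrence underlying a state space model. A fixed-size hidden state is why memory stays constant and compute grows linearly with sequence length. The real Mamba additionally makes the SSM parameters input-dependent ("selective") and evaluates the recurrence with a hardware-aware parallel scan during training.

```python
import numpy as np

def ssm_recurrence(x, A, B, C):
    """Illustrative linear recurrence behind an SSM:
        h_t = A @ h_{t-1} + B * x_t,    y_t = C @ h_t
    The hidden state h has a fixed size regardless of sequence
    length, so memory is O(d_state) and the loop is O(L) in the
    number of tokens -- the RNN-like property noted above.
    """
    d_state = A.shape[0]
    h = np.zeros(d_state)           # fixed-size hidden state
    ys = []
    for x_t in x:                   # one step per token: linear in L
        h = A @ h + B * x_t         # state update
        ys.append(C @ h)            # readout
    return np.array(ys)

# Toy usage: a scalar input sequence of length 8, state size 4.
# All parameters are random placeholders, not trained weights.
rng = np.random.default_rng(0)
L, d = 8, 4
A = 0.9 * np.eye(d)                 # stable decaying dynamics
B = rng.normal(size=d)
C = rng.normal(size=d)
y = ssm_recurrence(rng.normal(size=L), A, B, C)
print(y.shape)  # (8,)
```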

Overview:

Coming soon

Diagrams:

Coming soon

Code:

Coming soon

Links:

Blog Posts:

Videos:

Implementations: