fast_mamba.np
is a pure and efficient NumPy implementation of Mamba, featuring cache support. This code preserves Mamba's native caching capability while maintaining a simple and clear implementation. Caching avoids recomputing previous tokens, yielding a speedup of up to 4x for 100 generated tokens compared to mamba.np.
$ python fast_mamba.py "I have a dream that"
"""
I have a dream that I will be able to see the sunrise in the morning.
Token count: 18, elapsed: 9.65s, 1.9 tokens/s
"""
If you use or discuss fast_mamba.np in your academic research, please cite the project to help spread awareness:
@misc{fast_mamba.np,
  title = {fast_mamba.np: pure NumPy implementation for Mamba},
  author = {Ido Hakimi},
  howpublished = {\url{https://github.com/idoh/fast_mamba.np}},
  year = {2024},
  note = {fast_mamba.np, MIT License}
}
Thanks to the creators and contributors of the following libraries and tools:
- mamba-minimal - @johnma2006
- llama3.np - @likejazz
- The Mamba architecture was introduced in Mamba: Linear-Time Sequence Modeling with Selective State Spaces by Albert Gu and Tri Dao
- The official implementation is here: https://github.com/state-spaces/mamba
- Title image was generated by Microsoft Designer and edited with Canva Photo Editor