microsoft/Samba

Mamba2

Joelx opened this issue · 2 comments

First, thanks for this great contribution and the well written paper!

Since Mamba2 was also released very recently, do you have any thoughts on integrating it into the Samba architecture, or on how it might affect the results?

Would be much appreciated.

I tested this a while ago and it was basically swappable. From memory, I just replaced:
from .mamba_simple import Mamba

with:
from mamba_ssm import Mamba2 as Mamba
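
A minimal sketch of that swap, written so it falls back to the original layer when Mamba2 is not available. The helper name `resolve_mixer` and its parameters are hypothetical (they are not part of Samba or mamba_ssm); the only assumption about the library is that the installed `mamba_ssm` package exposes `Mamba2` and/or `Mamba` at the top level.

```python
import importlib


def resolve_mixer(module_name="mamba_ssm", names=("Mamba2", "Mamba")):
    """Return (cls, name) for the first attribute in `names` found in `module_name`.

    Hypothetical helper: returns (None, None) if the module cannot be
    imported or none of the requested names exist in it.
    """
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):
        # Package missing, or its compiled extensions failed to load.
        return None, None
    for name in names:
        cls = getattr(module, name, None)
        if cls is not None:
            return cls, name
    return None, None


# Prefer Mamba2, fall back to Mamba from the same package if it is absent.
Mamba, mixer_name = resolve_mixer()
if Mamba is None:
    print("mamba_ssm not available; keep the local `.mamba_simple` import")
else:
    print(f"Using {mixer_name} as the Mamba layer")
```

The two classes may not accept exactly the same constructor arguments, so any keyword arguments the Samba code passes to `Mamba(...)` are worth checking against the installed version before training.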

I'm currently training a 270M Samba model, and will try again with Mamba2 to compare the results.