microsoft/Samba

Typo on paper?

devzzzero opened this issue · 2 comments

Hi, I was just wondering if there is a typo in the arXiv preprint https://arxiv.org/pdf/2406.07522 ?

The MMLU result for Mamba-2.8b seems a bit low (26, vs. 45.28 for the Mamba-1.8b model)?

Is it a typo, or is there some other reason why the Mamba-1.8b model is outperforming its larger cousin on MMLU?
(Or am I misreading something?)

Please advise.
Thank you!

Hi, the Mamba-1.8b model was trained on the high-quality Phi-2 dataset, while Mamba-2.8b was trained on the Pile dataset, so the two models are not directly comparable.

Thank you!