microsoft/Samba

Typo on paper?

devzzzero opened this issue · 2 comments

Hi, I was just wondering if there is a typo in the arXiv preprint https://arxiv.org/pdf/2406.07522 ?

The MMLU result for Mamba-2.8b seems a bit low (26, vs. 45.28 for the Mamba-1.8b model)?

Is it a typo, or is there some other reason why the Mamba-1.8b model is outperforming its larger cousin on MMLU?
(Or am I misreading something?)

Please advise.
Thank you!

Hi, the Mamba-1.8b model was trained on the high-quality Phi-2 dataset, while Mamba-2.8b was trained on the Pile dataset, so the two models are not directly comparable.

Thank you!