hsiehjackson/RULER

A Mistral long-context model: MegaBeam-Mistral-512K

chenwuperth opened this issue · 2 comments

Hi, thanks for the project! Could you please evaluate https://huggingface.co/aws-prototyping/MegaBeam-Mistral-7B-512k on the latest RULER benchmark? Thanks!

Sure! I added the results to the leaderboard (under our evaluation), although I saw that you had already tested it yourselves. This is a pretty good long-context model. It would be great to also have numbers showing its short-context performance (MMLU, MT-Bench, or tasks from the Open LLM Leaderboard).

Thank you for testing it! Yes, I just wanted to confirm whether our evaluation is consistent with yours (which appears to be the case). I will take a look at short-context benchmarks, although we focused solely on long context when training this model.