A Mistral long-context model - MegaBeam-Mistral-512K
chenwuperth opened this issue · 2 comments
chenwuperth commented
Hi, thanks for the project! Could you please evaluate https://huggingface.co/aws-prototyping/MegaBeam-Mistral-7B-512k on the latest RULER benchmark? Thanks!
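For context, the flavor of RULER's synthetic retrieval tasks can be sketched with a toy needle-in-a-haystack prompt builder like the one below. This is a minimal illustration, not RULER's actual harness; `build_haystack_prompt` and its parameters are hypothetical names chosen for this sketch.

```python
def build_haystack_prompt(needle: str, filler: str, n_filler: int, depth: float) -> str:
    """Bury `needle` at a relative `depth` (0.0 to 1.0) inside repeated filler text,
    then append a question that can only be answered by retrieving the needle."""
    lines = [filler] * n_filler
    pos = int(depth * n_filler)  # where in the context the needle lands
    lines.insert(pos, needle)
    context = "\n".join(lines)
    return f"{context}\n\nQuestion: What is the secret number mentioned above?"

# Example: hide the needle halfway through ~1000 lines of filler.
prompt = build_haystack_prompt(
    needle="The secret number is 7421.",
    filler="The grass is green and the sky is blue.",
    n_filler=1000,
    depth=0.5,
)
```

Varying `n_filler` and `depth` is what lets a benchmark of this kind probe both context length and the position sensitivity of retrieval.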
hsiehjackson commented
Sure! I've put the results on the leaderboard (under our evaluation), although I saw you have already tested it on your own. This is a pretty good long-context model. It would be great if we could have numbers showing its short-context performance (MMLU, MT-Bench, or something from the Open LLM Leaderboard).
chenwuperth commented
Thank you for testing it! Yes, I just wanted to confirm that our eval is consistent with yours (which appears to be the case). I will take a look at the short-context benchmarks, although we focused solely on the "long" context when training this model.