hsiehjackson/RULER

Test results for the sneaky June update of the Phi 3 models?

Thank you for this most excellent project!
In June 2024, Microsoft sneakily updated their Phi 3 model, which greatly improved its use of long contexts:

RULER: a retrieval-based benchmark for long context understanding

| Model | 4K | 8K | 16K | 32K | 64K | 128K | Average |
|---|---|---|---|---|---|---|---|
| Original | 86.7 | 78.1 | 75.6 | 70.3 | 58.9 | 43.3 | 68.8 |
| June 2024 Update | 92.4 | 91.1 | 90.8 | 87.9 | 79.8 | 65.6 | 84.6 |
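
As a quick sanity check on the numbers above, the Average column appears to be the unweighted mean of the six context-length scores. Here is a minimal Python sketch assuming that definition; the names `scores` and `average` are illustrative and not taken from the RULER codebase:

```python
# Scores copied from the table above, ordered 4K, 8K, 16K, 32K, 64K, 128K.
scores = {
    "Original": [86.7, 78.1, 75.6, 70.3, 58.9, 43.3],
    "June 2024 Update": [92.4, 91.1, 90.8, 87.9, 79.8, 65.6],
}


def average(values):
    """Unweighted mean rounded to one decimal (assumed definition of Average)."""
    return round(sum(values) / len(values), 1)


# Recompute the Average column and the gap between the two versions.
averages = {name: average(vals) for name, vals in scores.items()}
gain = round(averages["June 2024 Update"] - averages["Original"], 1)

for name, avg in averages.items():
    print(f"{name}: {avg}")
print(f"Gain: {gain}")
```

Under that assumption the recomputed averages match the reported 68.8 and 84.6, so the update is worth about 15.8 points on average.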

Would you mind adding this version to your table?

Thx.

Thanks for the information! I re-evaluated phi3-mini and put the results on our leaderboard.