Test results for the sneaky June update of the Phi-3 models?

Question

Opened this issue a month ago · 1 comment

Thank you for this most excellent project!
In June 2024, Microsoft quietly updated their Phi-3 model, which greatly improved its long-context performance:

RULER: a retrieval-based benchmark for long context understanding

Model	4K	8K	16K	32K	64K	128K	Average
Original	86.7	78.1	75.6	70.3	58.9	43.3	68.8
June 2024 Update	92.4	91.1	90.8	87.9	79.8	65.6	84.6
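
For anyone who wants to sanity-check the numbers, here is a small Python sketch that recomputes the Average column and the per-length gain from the scores in the table. It is not part of any RULER tooling; the names `SCORES`, `average` and `gains` are made up for illustration.

```python
# RULER scores copied from the table above, one value per context length.
CONTEXT_LENGTHS = ["4K", "8K", "16K", "32K", "64K", "128K"]
SCORES = {
    "Original": [86.7, 78.1, 75.6, 70.3, 58.9, 43.3],
    "June 2024 Update": [92.4, 91.1, 90.8, 87.9, 79.8, 65.6],
}


def average(scores):
    """Mean score across all context lengths, rounded to one decimal."""
    return round(sum(scores) / len(scores), 1)


def gains(before, after):
    """Per-length improvement (after minus before), rounded to one decimal."""
    return [round(b - a, 1) for a, b in zip(before, after)]


averages = {name: average(values) for name, values in SCORES.items()}
deltas = dict(
    zip(CONTEXT_LENGTHS, gains(SCORES["Original"], SCORES["June 2024 Update"]))
)

print(averages)
print(deltas)
```

The recomputed averages match the table (68.8 and 84.6), and the gain grows with context length, from 5.7 points at 4K to 22.3 points at 128K.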

Would you mind adding this version to your table?

Thx.

Answer 1 · 2024-08-08T17:29:57.000Z

Thanks for the information! I re-evaluated phi3-mini and put the results on our leaderboard.