Test results for the sneaky June update of the Phi 3 models?
bhugueney commented
Thank you for this most excellent project!
In June 2024, Microsoft sneakily updated their Phi 3 model, which greatly improved its long-context performance:
RULER: a retrieval-based benchmark for long context understanding
| Model | 4K | 8K | 16K | 32K | 64K | 128K | Average |
|---|---|---|---|---|---|---|---|
| Original | 86.7 | 78.1 | 75.6 | 70.3 | 58.9 | 43.3 | 68.8 |
| June 2024 Update | 92.4 | 91.1 | 90.8 | 87.9 | 79.8 | 65.6 | 84.6 |
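For what it's worth, here is a quick sanity check of the averages and the per-length gains, using only the numbers in the table above. The variable names are mine and purely illustrative; this is not part of the RULER tooling.

```python
# Scores copied from the table above (RULER, context lengths 4K to 128K).
lengths = ["4K", "8K", "16K", "32K", "64K", "128K"]
original = [86.7, 78.1, 75.6, 70.3, 58.9, 43.3]
june_update = [92.4, 91.1, 90.8, 87.9, 79.8, 65.6]


def mean(scores):
    """Plain arithmetic mean over the six context lengths."""
    return sum(scores) / len(scores)


# Averages rounded to one decimal, as in the table's last column.
avg_original = round(mean(original), 1)
avg_june = round(mean(june_update), 1)

# Absolute gain (in points) of the June update at each context length.
gains = {
    length: round(new - old, 1)
    for length, old, new in zip(lengths, original, june_update)
}

print("average:", avg_original, "->", avg_june)
for length, gain in gains.items():
    print(f"{length}: +{gain}")
```

The gain grows with context length (from +5.7 points at 4K to +22.3 at 128K), so the update seems to help most where the original model degraded most.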
Would you mind adding this version to your table?
Thx.
hsiehjackson commented
Thanks for the information! I re-evaluated phi3-mini and put the results on our leaderboard.