hsiehjackson/RULER

gpt-4o results?

the21st opened this issue · 2 comments

Would love to see results for gpt-4o. There was some claimed improvement in its abilities: http://nian.llmonpy.ai/

We also plan to run the evaluation for gpt-4o! It looks like gpt-4o shows a large improvement on the lost-in-the-middle issue.

So, when?