hsiehjackson/RULER

hope add qwen2-7b-chat result

Chandler-Bing opened this issue · 2 comments

Thanks for the great project. I hope you can add official Qwen2-7B-Chat results, to compare with GLM-4-9B-Chat.
According to the Qwen tech report, Qwen2 is much better than GLM4 on long-context evaluation, but I doubt it...

According to Table 12 in the Qwen2 tech report, if we just evaluate vanilla Qwen2-7B-128K and GLM4-9B-1M, then GLM4 may get better results. However, Qwen2 also proposes YaRN + DCA (a training-free context extension) to boost performance. I didn't find the code to run inference with these techniques, so unfortunately I don't have results for comparison.
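For reference, the Qwen2 model cards describe enabling YaRN by adding a `rope_scaling` entry to the model's `config.json`; a sketch along those lines (the exact `factor` value depends on the target context length, and this alone does not enable DCA) would look like:

```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```

Whether serving frameworks pick this up for long-context inference varies, so results obtained this way may still not match the tech report's YaRN + DCA numbers.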

thanks,