open-compass/T-Eval

Is Qwen1.5 supported? My reproduced results differ significantly from the reported ones

Results I obtained for Qwen1.5-72B-Chat:
| Instruct | Plan | Reason | Retrieve | Understand | Review | Overall |
|---|---|---|---|---|---|---|
| 96.86 | 73.45 | 57.98 | 74.41 | 50.41 | 53.18 | 67.71 |
Qwen1.5 report: https://qwenlm.github.io/zh/blog/qwen1.5/
The results above differ significantly from those given in this report.