open-compass/T-Eval

Question about qwen-14b evaluation results


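I evaluated the qwen-14b-chat model with a CustomAPI class that I implemented myself from the template provided by the authors, and got the following results: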
Instruct  Plan  Review  Reason  Retrieve  Understand  overall
97.0      78.0  41.9    60.0    86.6      61.8        70.9
The Retrieve and Understand scores differ considerably from the ones on the ZH Leaderboard.
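As a quick sanity check on the numbers above, here is a minimal sketch that assumes the overall column is the unweighted mean of the six per-dimension scores. That aggregation rule is an assumption and is not confirmed against the T-Eval code. The variable names are illustrative only.

```python
# Scores copied from the results reported in this issue (qwen-14b-chat).
scores = {
    "Instruct": 97.0,
    "Plan": 78.0,
    "Review": 41.9,
    "Reason": 60.0,
    "Retrieve": 86.6,
    "Understand": 61.8,
}
reported_overall = 70.9

# Assumption: overall is the plain (unweighted) mean of the six dimensions.
mean_score = sum(scores.values()) / len(scores)

# Compare at one decimal place, matching the precision of the reported table.
consistent = round(mean_score, 1) == reported_overall
print(f"mean = {mean_score:.2f}, matches reported overall: {consistent}")
```

If the check holds, the overall score is at least internally consistent with the per-dimension scores, so the gap to the leaderboard would come from the Retrieve and Understand numbers themselves and not from how they are aggregated.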