Issues
Request: add evaluation of Qwen2
#69 opened by liduang - 1
Request to verify DiMind results
#72 opened by lingbaishun - 1
Request: add evaluation of Grok-1
#63 opened by XYCode-Kerman - 1
What is going on with the dataset?
#62 opened by houxiang676 - 1
Are there test results for ChatGLM3?
#57 opened by ScienGU - 1
Is yi-34b-chat supported?
#55 opened by xxm1668 - 1
If a model is trained on the evaluation set, can it get a perfect score? How is cheating prevented?
#48 opened by xealml - 6
Input/output format and email address for the external API interface
#46 opened by jru001 - 1
Which category does each CSV file belong to?
#40 opened by rattlesnakey - 2
Calculation logic for per-category and overall average scores
#39 opened by XinyuGuan01 - 3
Update of CMMLU test set results
#30 opened by leoymr - 1
SyntaxError: unmatched ')'
#33 opened by bwin90 - 3
How do I submit model results to be added to the leaderboard?
#26 opened by chuxin1457 - 3
CMMLU testing
#27 opened by huayicong23 - 1
Is llama2 supported?
#24 opened by xxm1668 - 1
How was the MILM evaluation conducted?
#19 opened by ztxz16 - 2
Support Qwen-7b
#16 opened by mMrBun - 1
New to AI: is the "five-shot" in the documentation the same as few-shot?
#18 opened by KarnaughK - 1
ChatGLM2-6b: accuracy with eval is lower than with eval_chat; is this normal?
#14 opened by ztxz16 - 1
Broken link in the prompt evaluation section
#13 opened by cobraheleah - 3
Baichuan-13B-Chat
#9 opened by xianghuisun - 1
[BUG maybe in few-shot setting] When computing the model's chosen answer, for many models the code actually compares the probabilities of the four tokens ['_A', '_B', '_C', '_D'] rather than those of ['A', 'B', 'C', 'D']
#11 opened by Heepo - 1
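The distinction raised in #11 matters because SentencePiece-style tokenizers encode a word boundary into the token itself, so "A" preceded by a space maps to a different vocabulary id than bare "A", and the two ids can carry very different logits. A minimal sketch with a toy vocabulary and hypothetical logits (not the repository's actual evaluation code):

```python
# Toy vocabulary: SentencePiece-style tokenizers prefix tokens that follow
# whitespace with a word-boundary marker ("▁"), giving them distinct ids.
vocab = {"▁A": 0, "▁B": 1, "▁C": 2, "▁D": 3, "A": 4, "B": 5, "C": 6, "D": 7}

# Hypothetical next-token logits from a model, indexed by vocabulary id.
logits = [0.1, 0.2, 0.3, 0.4, 2.0, 1.0, 0.5, 0.25]

def pick_answer(choices, logits, vocab):
    """Return the choice whose token id has the highest logit."""
    return max(choices, key=lambda c: logits[vocab[c]])

# Scoring the bare letters vs. the boundary-prefixed tokens can select
# different answers from the same logits:
print(pick_answer(["A", "B", "C", "D"], logits, vocab))      # "A"
print(pick_answer(["▁A", "▁B", "▁C", "▁D"], logits, vocab))  # "▁D"
```

Which variant is correct depends on whether the prompt ends with whitespace before the answer letter, which is why the bug surfaces mainly in the few-shot setting.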
[Feature] Support CMMLU in OpenCompass
#10 opened by tonysy - 3
Scores from get_results show some randomness
#8 opened by ztxz16 - 1
[Data error] There is a bug when loading the dataset from huggingface
#7 opened by LiuLinyun - 1
[baichuan-13] Could you compare against the recently released Baichuan 13B model?
#4 opened by LouisHeck - 1
The "world history" subject is missing from the fan diagram in the logo
#3 opened by Reeleon - 1
name_en2zh and subcategories in categories.py are not sorted in ascending dictionary order
#2 opened by Reeleon - 2
Consider evaluating models by the probabilities of the four options?
#1 opened by DaoD
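Issue #1 proposes scoring by option probability rather than by generated text: normalize the logits of the four option tokens with a softmax restricted to those tokens and take the argmax. A minimal sketch with hypothetical logits (not the repository's actual evaluation code):

```python
import math

def option_probs(logits, option_ids):
    """Softmax restricted to the logits of the option tokens."""
    vals = [math.exp(logits[i]) for i in option_ids]
    total = sum(vals)
    return [v / total for v in vals]

# Hypothetical next-token logits for the tokens A, B, C, D.
logits = [2.0, 1.0, 0.5, 0.25]
probs = option_probs(logits, [0, 1, 2, 3])
answer = "ABCD"[probs.index(max(probs))]
print(answer)  # "A"
```

This avoids parsing free-form generations entirely, at the cost of assuming each option is a single token.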