jeinlee1991/chinese-llm-benchmark
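Chinese LLM capability benchmark leaderboard: currently covers 128 large models, including commercial models such as chatgpt, gpt-4o, Google gemini, Baidu Wenxin Yiyan (ERNIE Bot), Alibaba Tongyi Qianwen, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax, as well as open-source models such as qwen2.5, llama3.1, glm4, InternLM2.5, openbuddy, and AquilaChat. It provides not only a capability score leaderboard but also the raw outputs of all models!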
Issues
Can deepseek-chat-v2.5 be added to the evaluation?
#53 opened by liyuefeng - 1
How is the scoring done? Manual scoring or automatic scoring by code?
#52 opened by dengfan2018 - 2
There is too little evaluation data. Can this really show anything?
#14 opened by yyl424525 - 0
Request to evaluate open-source models above 100B:
#51 opened by joey-zmw - 2
A purely laughable evaluation. How much did Baidu pay you?
#43 opened by a5185330 - 5
The evaluation data is too poor to even criticize
#35 opened by freedomRen - 0
What are the evaluation criteria for the score?
#19 opened by zhimin-z - 1
Could we add some agent-based evaluations?
#49 opened by CoderYiFei - 1
Raw LLM outputs
#50 opened by KyleWang-Hunter - 1
How are the subjective questions evaluated?
#46 opened by runwean - 1
Isn't deepseek-chat-v2 open source?
#44 opened by liyuefeng - 1
Could you evaluate the newly open-sourced qwen2.5?
#45 opened by ConleyKong - 0
What about Claude sonnet 3.5?
#48 opened by Skywalker144 - 4
Request to add the RWKV model to the evaluation
#6 opened by OopsYouDiedE - 0
Could you evaluate stepfun's model series?
#42 opened by forrestlinfeng - 1
Please evaluate deepseek v2
#36 opened by cubxxw - 1
Add data for the Yi-1.5 model series
#38 opened by zzc0208 - 3
Could you add llama3.1 evaluation data?
#41 opened by Anionex - 0
Could you give a detailed explanation of each capability?
#40 opened by Wooden-Gear - 0
Please add an evaluation of Nemotron-4 340B
#39 opened by wrench1997 - 2
The ranking of LLMs under 10B is not very accurate; in actual use, ChatGLM3-6B and Qwen1.5-7B perform better
#37 opened by danny-zhu - 0
update new model
#22 opened by zzc0208 - 2
The important claude series is missing; requesting that it be added to the evaluation
#33 opened by chiguabaobao - 0
The leaderboard ranking of open-source models under 10b is unreliable
#34 opened by wyfSunflower - 1
Suggest adding tests for 1B models
#25 opened by yuys0602 - 1
Could you add an evaluation of Function Call (tool calling) capability?
#31 opened by Dream-s-Wang - 1
Does eval contain all of the evaluation data?
#3 opened by TTCoding - 0
When was Tongyi Qianwen evaluated?
#8 opened by liudayiheng - 1
Great evaluation. May the project's main test data be reposted?
#9 opened by l269438 - 0
Could you evaluate the Qianwen-7B model?
#10 opened by liudayiheng - 2
Strongly suggest adding moonshot's Kimi chat!!!
#16 opened by witherlll - 1
Re-test the new version of Wenxin Yiyan (ERNIE Bot)
#20 opened by huanghuanhuahuh - 1
Could you test openbuddy-deepseek-67b-v15.2?
#21 opened by openmynet - 4
Why does Qianwen 1.5-14B-chat score so high, even higher than 72b?
#28 opened by yu-zheng-tao - 1
iFlytek Spark has released version 3.5
#24 opened by zhisuyan - 1
Could kimi chat be added to the leaderboard?
#26 opened by LengmoAngel - 2
Could you add an evaluation of qianwen1.5-32B?
#32 opened by yu-zheng-tao - 0
Evaluation of the iFlytek Spark 13B open-source model
#30 opened by STHSF - 0
Could you add an evaluation of the claude3 commercial model?
#29 opened by yu-zheng-tao - 0
Why does Qianwen 1.5-14B-chat score so high, even higher than 72b?
#27 opened by yu-zheng-tao - 0
This link does not redirect...
#18 opened by zhimin-z - 0
Where's my Claude?
#15 opened by JiangKaslana - 0
How should I cite this work?
#13 opened by g-h-chen - 0
It would be nice to have a comparison of the deployment hardware requirements for each model
#12 opened by zhangmianhongni - 0
Could you evaluate Chinese-LLaMA-Alpaca-2?
#11 opened by dodogreen - 0
Great work. Are there any plans to include the Anima-30B model in the evaluation in the future?
#7 opened by UI233 - 1
How do I submit my own model for evaluation?
#4 opened by Taoooo9 - 0