OpenLMLab/GAOKAO-Bench

GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.

Python, Apache-2.0

Issues

Open-source the human-scored dataset
#40 opened a month ago by Nefefilibata
0
2024 GAOKAO evaluation set
#39 opened a month ago by cobraheleah
0
Inconsistent scores
#38 opened 2 months ago by kekezhai
1
The paper uses a temperature of 0.3; could you share the other settings, such as top_p and top_k?
#37 opened 3 months ago by jidandan666
1
A question about the paper
#36 opened 4 months ago by endNone
4
Are other open-source models supported for local inference, rather than through an API key?
#35 opened 4 months ago by jidandan666
4
The evaluation result can't match the paper result
#33 opened 8 months ago by CoderBak
3
Missing Question
#30 opened 9 months ago by MarigoldTechStriker
1
Where are the evaluation results for those two columns?
#25 opened 9 months ago by zhimin-z
1
Solution for "Exception: list index out of range" when running `python choice_bench.py`
#18 opened 9 months ago by ALLinLLM
1
json.decoder.JSONDecodeError: Invalid escape
#19 opened 9 months ago by zyy-2001
3
Typos in the dataset.
#20 opened 9 months ago by chengeharrison
1
About the format of the test questions
#21 opened 9 months ago by jimmyzhang610
1
Where is the GAOKAO score for gpt-3.5-turbo?
#7 opened 9 months ago by bansky-cl
1
Are there detailed experimental results for GPT-4?
#28 opened 9 months ago by K1yomi
2
Are all 1,000 subjective questions scored entirely by humans? Is GPT-4 involved in the scoring?
#23 opened 9 months ago by eyuansu62
4
Following the simple OpenAI API example in the README, running choice_bench.py outputs: 2010-2022_Math_II_MCQs single_choice mkdir: cannot create directory ‘../data/Multiple-choice_Questions/gpt-3.5-turbo_2010-2022_Math_II_MCQs’: File exists 0%| | 0/44 [00:00<?, ?it/s]Exception: Cannot choose from an empty sequence 0%| | 0/44 [00:00<?, ?it/s]Exception: Cannot choose from an empty sequence 0%| | 0/42 [00:00<?, ?it/s]Exception: Cannot choose from an empty sequence 0%| | 0/44 [00:00<?, ?it/s]Exception: Cannot choose from an empty sequence 0%| | 0/44 [00:00<?, ?it/s]Exception: Cannot choose from an empty sequence Exception: Cannot choose from an empty sequence
#27 opened a year ago by FreshOrangess
0
Where can I find the `Gaokao-2023` dataset?
#26 opened a year ago by zhimin-z
1
Is the evaluation 0-shot or 5-shot?
#24 opened a year ago by liu904-61
2
Why doesn't this evaluation exclude the subjective questions?
#22 opened a year ago by eyuansu62
0
Are there GPT-4 scores?
#4 opened a year ago by vsEcho567
2
Minor error in Math I
#10 opened a year ago by liyongsea
2
Data collection process
#13 opened a year ago by Richar-Du
1
Format of the math questions
#11 opened a year ago by liyongsea
2
猪八诶
#5 opened 2 years ago by sfsdfd62
0
In the three prompt.json files under the Bench directory, <eoa> is mistyped as <eoe>
#6 opened 2 years ago by Leymore
1