Issues
ooba not parsing in .cfg
#37 opened by zisismp4 - 1
[Request] Creative Writing Benchmark
#35 opened by Abdulhanan535 - 1
Paper on creative writing benchmark
#33 opened by AriMKatz - 1
Trying to get to the bottom of why `Qwen1.5-110B-Chat` scores so much higher than the `command-r` models
#32 opened by jukofyork - 5
Offload to the cpu
#29 opened by djstrong - 7
Benchmark Failed
#28 opened by djstrong - 2
Error in calculating revise answer score
#25 opened by impact-rm - 3
Contributing to OpenCompass
#24 opened by bittersweet1999 - 3
Contributing with other judges
#23 opened by Krisseck - 50
Backend changes scores significantly
#16 opened by dnhkng - 26
model test request
#20 opened by dnhkng - 1
Passing in model_kwargs
#22 opened by derpyplops - 1
default judge model setting for the leaderboard
#21 opened by gyin94 - 2
Input length of input_ids is 1211, but `max_length` is set to 1000. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`. Benchmark run failed
#14 opened by Abdullah-kwl - 4
v2 outputs
#4 opened by gblazex - 2
Support for Seq2Seq LMs
#3 opened by CarlsVoca - 1
Add Claude
#2 opened by tekumara - 2
The prompt to generate the dialogue.
#1 opened by GorgeousWang