EQ-bench/EQ-Bench

A benchmark for emotional intelligence in large language models

Python · MIT

Issues

ooba not parsing in .cfg
#37 opened 4 months ago by zisismp4
6
[Request] Creative Writing Benchmark
#35 opened 4 months ago by Abdulhanan535
1
New Command R 08-2024 and Command R+ 08-2024 models
#34 opened 4 months ago by jukofyork
1
Request to evaluate the new O1 models by OpenAI (O1-preview and O1-mini)
#36 opened 4 months ago by Belzedar94
1
Paper on creative writing benchmark
#33 opened 5 months ago by AriMKatz
2
Trying to get to the bottom of why `Qwen1.5-110B-Chat` scores so much higher than the `command-r` models
#32 opened 5 months ago by jukofyork
1
Offload to the cpu
#29 opened 6 months ago by djstrong
5
Benchmark Failed
#28 opened 7 months ago by djstrong
7
Error in calculating revise answer score
#25 opened 7 months ago by impact-rm
2
Contributing to OpenCompass
#24 opened 8 months ago by bittersweet1999
3
Contributing with other judges
#23 opened 9 months ago by Krisseck
3
Backend changes scores significantly
#16 opened 9 months ago by dnhkng
50
model test request
#20 opened 9 months ago by dnhkng
26
Passing in model_kwargs
#22 opened 9 months ago by derpyplops
1
default judge model setting for the leaderboard
#21 opened 9 months ago by gyin94
1
'BitsAndBytesConfig' object has no attribute 'get_loading_attributes'
#19 opened 10 months ago by Abdullah-kwl
2
Input length of input_ids is 1211, but `max_length` is set to 1000. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`. Benchmark run failed
#14 opened a year ago by Abdullah-kwl
2
Add some of the new 100B+ models to the leaderboard
#5 opened a year ago by cosmojg
4
v2 outputs
#4 opened a year ago by gblazex
12
Support for Seq2Seq LMs
#3 opened a year ago by CarlsVoca
2
Add Claude
#2 opened a year ago by tekumara
1
The prompt to generate the dialogue.
#1 opened a year ago by GorgeousWang
2