tjunlp-lab/Awesome-LLMs-Evaluation-Papers
The papers are organized according to our survey, "Evaluating Large Language Models: A Comprehensive Survey."
Issues
- The GitHub repo of OpenEval is inaccessible... (#24, opened by zhimin-z)
- The leaderboard is missing from the page... (#27, opened by zhimin-z)
- Which metrics are chosen in the leaderboard? (#21, opened by zhimin-z)
- Add License (#29, opened by haesleinhuepf)
- Why do we list inaccessible benchmarks? (#14, opened by zhimin-z)
- Any paper or report for http://openeval.org.cn? (#26, opened by zhimin-z)
- What is the provenance of the WGlaw dataset? (#25, opened by zhimin-z)
- Code-Related Benchmarks (#11, opened by john-b-yang)
- Could you add PandaLM to your survey? (#18, opened by qianlanwyd)
- Can you add our recent work to your survey? (#10, opened by grayground)
- SeaEval: Multilingual LLM Evaluation (#7, opened by BinWang28)
- Add SpyGame (#9, opened by Skytliang)
- Add "How Important are Good Method Names in Neural Code Generation? A Model Robustness Perspective." in Robustness Evaluation (#3, opened by NTDXYG)