tjunlp-lab/Awesome-LLMs-Evaluation-Papers
The papers are organized according to our survey, "Evaluating Large Language Models: A Comprehensive Survey."
Issues
- The GitHub repo of OpenEval is inaccessible... (#24, opened by zhimin-z)
- The leaderboard is missing from the page... (#27, opened by zhimin-z)
- Which metrics are chosen in the leaderboard? (#21, opened by zhimin-z)
- Add License (#29, opened by haesleinhuepf)
- Why do we list inaccessible benchmarks? (#14, opened by zhimin-z)
- Any paper or report for http://openeval.org.cn? (#26, opened by zhimin-z)
- What is the provenance of the WGlaw dataset? (#25, opened by zhimin-z)
- Code-Related Benchmarks (#11, opened by john-b-yang)
- Could you add PandaLM to your survey? (#18, opened by qianlanwyd)
- Can you add our recent work to your survey? (#10, opened by grayground)
- SeaEval: Multilingual LLM Evaluation (#7, opened by BinWang28)
- Add SpyGame (#9, opened by Skytliang)
- Add "How Important are Good Method Names in Neural Code Generation? A Model Robustness Perspective." in Robustness Evaluation (#3, opened by NTDXYG)