prometheus-eval/prometheus
[ICLR 2024 & NeurIPS 2023 WS] An open-source evaluator LM that offers reproducible evaluation and is inexpensive to use. Specifically designed for fine-grained evaluation on a customized score rubric, Prometheus is a good alternative to human evaluation and GPT-4 evaluation.
Python · MIT License
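The project description centers on fine-grained, rubric-based evaluation. As a rough illustration of that idea, the sketch below assembles an evaluation prompt from an instruction, a candidate response, a reference answer, and a score rubric. This is only a hypothetical sketch: the function name and section headers are assumptions, not the repo's actual prompt template.

```python
# Hypothetical sketch of a rubric-based evaluation prompt in the style
# Prometheus targets. Illustrative only -- NOT the repo's real template.

def build_rubric_prompt(instruction, response, reference_answer, rubric):
    """Assemble a fine-grained evaluation prompt: the evaluator LM is asked
    to grade `response` against `reference_answer` using `rubric` (a dict
    mapping integer scores to descriptions) and return feedback plus a score."""
    rubric_text = "\n".join(
        f"Score {score}: {desc}" for score, desc in sorted(rubric.items())
    )
    return (
        "###Instruction:\n" + instruction + "\n\n"
        "###Response to evaluate:\n" + response + "\n\n"
        "###Reference answer (score 5):\n" + reference_answer + "\n\n"
        "###Score rubric:\n" + rubric_text + "\n\n"
        "###Feedback:"
    )

# Example rubric for factual accuracy (illustrative values)
rubric = {
    1: "The response is entirely inaccurate.",
    3: "The response is partially accurate with notable errors.",
    5: "The response is fully accurate and well supported.",
}
prompt = build_rubric_prompt(
    "Explain what an evaluator LM is.",
    "It is a language model trained to grade other models' outputs.",
    "An evaluator LM scores model outputs against criteria such as a rubric.",
    rubric,
)
```

The assembled string would then be fed to the evaluator model for generation; the exact decoding setup depends on how the model is served.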
Issues
How to evaluate HHH, MT_Bench_human? Where to get human scores for other val sets?
#19 opened by deshwalmahesh · 1 comment

Need to change organization name from `kaist-ai` to `prometheus-eval` for code, docs, and README.md
#18 opened by scottsuk0306 · 0 comments

ood_test missing some gpt4 feedback
#16 opened by se-ok · 0 comments

Version Issue for BetterTransformer. Please provide exact package dependencies and Python, Torch version you used
#15 opened by deshwalmahesh · 1 comment

Prometheus using no reference materials
#14 opened by maurovitaleBH · 1 comment

Demo of Prometheus
#13 opened by zhao1402072392 · 1 comment

Unable to generate evaluation
#12 opened by HuihuiChyan · 4 comments

Question About Feedback Bench
#11 opened by gmftbyGMFTBY · 1 comment

Grad clipping for fp16
#10 opened by nnethercott · 1 comment

Question about the supported context length
#6 opened by shaoyijia · 4 comments

Question about the dataset
#4 opened by WoutDeRijck · 1 comment

Can you provide an example of running the model? I am not able to get feedback.
#2 opened by sungkim11 · 1 comment

score_completions() doesn't work
#3 opened by ChiaraOleary · 0 comments