prometheus-eval/prometheus
[ICLR 2024 & NeurIPS 2023 WS] An open-source evaluator LM that offers reproducible evaluation and is inexpensive to use. Specifically designed for fine-grained evaluation on a customized score rubric, Prometheus is a good alternative to human evaluation and GPT-4 evaluation.
Python · MIT License
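The project description centers on fine-grained, rubric-based evaluation. As a rough illustration of that idea, the sketch below assembles an evaluation prompt from an instruction, a candidate response, a reference answer, and a score rubric. This is only a hypothetical sketch: the function name and section headers are assumptions, not the repo's actual prompt template.

```python
# Hypothetical sketch of a rubric-based evaluation prompt in the style
# Prometheus targets. Illustrative only -- NOT the repo's real template.

def build_rubric_prompt(instruction, response, reference_answer, rubric):
    """Assemble a fine-grained evaluation prompt: the evaluator LM is asked
    to grade `response` against `reference_answer` using `rubric` (a dict
    mapping integer scores to descriptions) and return feedback plus a score."""
    rubric_text = "\n".join(
        f"Score {score}: {desc}" for score, desc in sorted(rubric.items())
    )
    return (
        "###Instruction:\n" + instruction + "\n\n"
        "###Response to evaluate:\n" + response + "\n\n"
        "###Reference answer (score 5):\n" + reference_answer + "\n\n"
        "###Score rubric:\n" + rubric_text + "\n\n"
        "###Feedback:"
    )

# Example rubric for factual accuracy (illustrative values)
rubric = {
    1: "The response is entirely inaccurate.",
    3: "The response is partially accurate with notable errors.",
    5: "The response is fully accurate and well supported.",
}
prompt = build_rubric_prompt(
    "Explain what an evaluator LM is.",
    "It is a language model trained to grade other models' outputs.",
    "An evaluator LM scores model outputs against criteria such as a rubric.",
    rubric,
)
```

The assembled string would then be fed to the evaluator model for generation; the exact decoding setup depends on how the model is served.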
Issues
How to evaluate HHH, MT_Bench_human? Where to get human scores for other val sets?
#19 opened by deshwalmahesh · 1 comment

Need to change organization name from `kaist-ai` to `prometheus-eval` for code, docs, and README.md
#18 opened by scottsuk0306 · 0 comments

ood_test missing some gpt4 feedback
#16 opened by se-ok · 0 comments

Version Issue for BetterTransformer. Please provide exact package dependencies and Python, Torch version you used
#15 opened by deshwalmahesh · 1 comment

Prometheus using no reference materials
#14 opened by maurovitaleBH · 1 comment

Demo of Prometheus
#13 opened by zhao1402072392 · 1 comment

Unable to generate evaluation
#12 opened by HuihuiChyan · 4 comments

Question About Feedback Bench
#11 opened by gmftbyGMFTBY · 1 comment

Grad clipping for fp16
#10 opened by nnethercott · 1 comment

Question about the supported context length
#6 opened by shaoyijia · 4 comments

Question about the dataset
#4 opened by WoutDeRijck · 1 comment

Can you provide an example of running the model? I am not able to get feedback.
#2 opened by sungkim11 · 1 comment

score_completions() doesn't work
#3 opened by ChiaraOleary · 0 comments