Leaderboard of JudgeLM evaluations
sachith-surge commented
I have evaluated one of my models using your JudgeLM 13B model.
How do I benchmark this against other models for comparison?
deshwalmahesh commented
Can you share examples of the .jsonl files you used? @sachith-surge
sachith-surge commented
I used this judgelm-val-5k-judge-samples.jsonl file to evaluate my model.