baaivision/JudgeLM

Leaderboard of JudgeLM evaluations


I have evaluated one of my models using your JudgeLM 13B model.

How do I benchmark this against other models for comparison?

Can you share examples of the .jsonl files you used? @sachith-surge

I used this judgelm-val-5k-judge-samples.jsonl file to evaluate my model.
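For comparing against other models, one possible approach is to run each model through the same judge and question set, then tally the pairwise results per model. Below is a minimal sketch, not JudgeLM's own tooling. It assumes each line of the judged output is a JSON object with a hypothetical `score` field holding two numbers (first answer's score, second answer's score); the actual output schema may differ, so adjust the field name accordingly. The function name `summarize_judgements` and the sample records are illustrative only.

```python
import json
from collections import Counter


def summarize_judgements(lines):
    """Tally wins, losses, and ties for the second answer against the first."""
    tally = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the .jsonl input
        record = json.loads(line)
        # ASSUMPTION: "score" is a hypothetical field of the form
        # [score_of_first_answer, score_of_second_answer].
        first, second = record["score"]
        if second > first:
            tally["win"] += 1
        elif second < first:
            tally["loss"] += 1
        else:
            tally["tie"] += 1
    total = sum(tally.values())
    return {
        "win": tally["win"],
        "loss": tally["loss"],
        "tie": tally["tie"],
        "win_rate": tally["win"] / total if total else 0.0,
    }


# Made-up records for illustration; not real evaluation results.
sample = [
    '{"question_id": 1, "score": [7, 9]}',
    '{"question_id": 2, "score": [8, 8]}',
    '{"question_id": 3, "score": [9, 6]}',
    '{"question_id": 4, "score": [5, 8]}',
]
summary = summarize_judgements(sample)
print(summary)
```

To use this on a real file, pass an open file handle instead of `sample`, for example `summarize_judgements(open(path, encoding="utf-8"))`. Repeating it for each model's judged output against the same reference answers gives one win rate per model, which can then be sorted into a simple leaderboard.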