llm-as-evaluator
There are 9 repositories under the llm-as-evaluator topic, most of them built around the LLM-as-judge pattern (a minimal sketch of that pattern follows the list).
prometheus-eval/prometheus-eval
Evaluate your LLM's response with Prometheus and GPT-4 💯
JohnSnowLabs/langtest
Deliver safe & effective language models
IAAR-Shanghai/xFinder
[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
KID-22/LLM-IR-Bias-Fairness-Survey
Repository for the survey on Bias and Fairness in IR with LLMs.
zhaochen0110/Timo
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
minnesotanlp/cobbler
Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
HillPhelmuth/LlmAsJudgeEvalPlugins
LLM-as-judge evals as Semantic Kernel Plugins
djokester/groqeval
Use Groq for evaluations
rafaelsandroni/antibodies
Antibodies for LLM hallucinations (grouping LLM-as-a-judge, NLI, and reward models)
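Most of these projects share one core loop: prompt a strong "judge" model with a grading rubric, then parse a score out of its free-text reply. The sketch below illustrates that loop only; the `openai` Python client, the judge model name, the 1-5 rubric, and the `judge` helper are all illustrative assumptions, not taken from any repository above.

```python
# Minimal LLM-as-judge sketch (illustrative; not from any repo listed above).
import re

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical rubric: grade on a 1-5 scale and end with a parsable score line.
JUDGE_PROMPT = """You are a strict evaluator. Score the RESPONSE to the
QUESTION on a 1-5 scale (5 = fully correct, complete, and well-grounded).
Give a brief justification, then a final line of the form "Score: <1-5>".

QUESTION:
{question}

RESPONSE:
{response}"""


def judge(question: str, response: str, model: str = "gpt-4o") -> int:
    """Ask a judge model to grade `response` and parse the integer score."""
    completion = client.chat.completions.create(
        model=model,
        temperature=0,  # keep the grading as deterministic as the API allows
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, response=response),
        }],
    )
    text = completion.choices[0].message.content
    match = re.search(r"Score:\s*([1-5])", text)
    if match is None:
        # Judges do not always follow the output format; callers must handle this.
        raise ValueError(f"Judge output had no parsable score: {text!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(judge("What is 2 + 2?", "2 + 2 equals 4."))
```

Note that the fragile step is the final regex: reliably extracting the answer from a judge's free-form reply is a problem in its own right, and is exactly what a project like xFinder targets.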