
Model As A Judge Eval Patterns

In this repo, we examine different techniques for using a model (LLM) as a judge when evaluating LLM-based solutions.
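
As a rough sketch of the pattern (illustrative only, not the code from the notebooks), the core loop is: format the output under test into a judge prompt, call a judge model, and parse a structured verdict. Here, `call_judge_model`, `judge`, and the prompt text are placeholders you would wire up to your own model provider:

```python
import json

def call_judge_model(prompt: str) -> str:
    """Placeholder for a call to your judge LLM (e.g., via your
    provider's SDK). Should return the model's raw text completion."""
    raise NotImplementedError("Wire this up to your model provider.")

# Illustrative judge prompt; double braces escape the literal JSON braces.
JUDGE_PROMPT = """You are an impartial judge. Rate the ASSISTANT's answer to the
USER's question on a 1-5 scale for helpfulness and accuracy. Respond with JSON:
{{"score": <int>, "reasoning": "<one sentence>"}}

USER: {question}
ASSISTANT: {answer}"""

def judge(question: str, answer: str) -> dict:
    # Ask the judge model to grade the candidate output, then parse its verdict.
    raw = call_judge_model(JUDGE_PROMPT.format(question=question, answer=answer))
    return json.loads(raw)
```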

Content in this repo

We will cover various types of evals. The datasets were all synthetically generated using LLMs and are located in the /data directory. The evals are Jupyter notebooks located in the /eval directory.

Types of Evals

  1. 00_basic_chat_evaluation: This eval consumes a chat conversation from a chatbot/agent and scores it against a rubric (see the sketch below). The purpose of this notebook is to demonstrate a basic eval pattern; subsequent notebooks will cover more advanced evals for comparing chats across different models.
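
As a minimal sketch of what a rubric-based chat eval can look like (the notebook's actual rubric and prompts will differ; `evaluate_chat`, `RUBRIC`, and the judge callable below are assumptions for illustration):

```python
import json
from collections.abc import Callable

# Illustrative rubric; the notebook defines its own criteria.
RUBRIC = """1. Helpfulness: does the assistant resolve the user's request?
2. Groundedness: are the assistant's claims supported by the conversation?
3. Tone: is the assistant polite and professional throughout?"""

EVAL_PROMPT = """You are grading a chatbot conversation against a rubric.
Score each rubric item from 1 (poor) to 5 (excellent). Respond with JSON:
{{"scores": [<int>, <int>, <int>], "summary": "<one sentence>"}}

RUBRIC:
{rubric}

CONVERSATION:
{transcript}"""

def evaluate_chat(messages: list[dict], call_judge_model: Callable[[str], str]) -> dict:
    """Score a chat (a list of dicts with 'role' and 'content' keys) against RUBRIC."""
    # Flatten the conversation into a readable transcript for the judge model.
    transcript = "\n".join(f"{m['role'].upper()}: {m['content']}" for m in messages)
    raw = call_judge_model(EVAL_PROMPT.format(rubric=RUBRIC, transcript=transcript))
    return json.loads(raw)
```

A production version of this would also validate the judge's JSON (models occasionally return malformed output) and might average scores over several judge calls to reduce variance.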

More evaluations coming soon.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.