
Model As A Judge Eval Patterns

In this repo, we examine different techniques for using a model (LLM) as a judge when evaluating LLM-based solutions.
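
As a rough sketch of the pattern (illustrative only, not the code from the notebooks), the core loop is: format the output under test into a judge prompt, call a judge model, and parse a structured verdict. Here, `call_judge_model`, `judge`, and the prompt text are placeholders you would wire up to your own model provider:

```python
import json

def call_judge_model(prompt: str) -> str:
    """Placeholder for a call to your judge LLM (e.g., via your
    provider's SDK). Should return the model's raw text completion."""
    raise NotImplementedError("Wire this up to your model provider.")

# Illustrative judge prompt; double braces escape the literal JSON braces.
JUDGE_PROMPT = """You are an impartial judge. Rate the ASSISTANT's answer to the
USER's question on a 1-5 scale for helpfulness and accuracy. Respond with JSON:
{{"score": <int>, "reasoning": "<one sentence>"}}

USER: {question}
ASSISTANT: {answer}"""

def judge(question: str, answer: str) -> dict:
    # Ask the judge model to grade the candidate output, then parse its verdict.
    raw = call_judge_model(JUDGE_PROMPT.format(question=question, answer=answer))
    return json.loads(raw)
```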

Content in this repo

We will cover various types of evals. The datasets were all synthetically generated using LLMs and are located in the /data directory. The evals are Jupyter notebooks located in the /eval directory.

Types of Evals

  1. 00_basic_chat_evaluation: This eval consumes a chat conversation from a chatbot/agent and scores it against a rubric (see the sketch below). The purpose of this notebook is to demonstrate a basic eval pattern; subsequent notebooks will cover more advanced evals for comparing chats across different models.
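
As a minimal sketch of what a rubric-based chat eval can look like (the notebook's actual rubric and prompts will differ; `evaluate_chat`, `RUBRIC`, and the judge callable below are assumptions for illustration):

```python
import json
from collections.abc import Callable

# Illustrative rubric; the notebook defines its own criteria.
RUBRIC = """1. Helpfulness: does the assistant resolve the user's request?
2. Groundedness: are the assistant's claims supported by the conversation?
3. Tone: is the assistant polite and professional throughout?"""

EVAL_PROMPT = """You are grading a chatbot conversation against a rubric.
Score each rubric item from 1 (poor) to 5 (excellent). Respond with JSON:
{{"scores": [<int>, <int>, <int>], "summary": "<one sentence>"}}

RUBRIC:
{rubric}

CONVERSATION:
{transcript}"""

def evaluate_chat(messages: list[dict], call_judge_model: Callable[[str], str]) -> dict:
    """Score a chat (a list of dicts with 'role' and 'content' keys) against RUBRIC."""
    # Flatten the conversation into a readable transcript for the judge model.
    transcript = "\n".join(f"{m['role'].upper()}: {m['content']}" for m in messages)
    raw = call_judge_model(EVAL_PROMPT.format(rubric=RUBRIC, transcript=transcript))
    return json.loads(raw)
```

A production version of this would also validate the judge's JSON (models occasionally return malformed output) and might average scores over several judge calls to reduce variance.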

More evaluations coming soon.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.