Open in Dev Containers

Evaluation of LLM-based applications

An implementation of the principles of evaluating LLM-based applications. This repository accompanies the blog post 'Steady the Course: Navigating the Evaluation of LLM-based Applications'.

💡 Check out the example notebook for an end-to-end illustration of the most important concepts (LLM app, test case, test properties and Evaluator), including the integration with MLflow.

🔑 Add your OpenAI API key to a file named openai_key in the root directory before running the notebook.

The image below shows the evaluation framework and illustrates an important feedback loop to improve your LLM app further. See the aforementioned blog post for more information. The scope of this repository is indicated by the green box.

Evaluation feedback loop

Using

Python package: to add and install this package as a dependency of your project, run poetry add llm-app-eval.

Contributing

Prerequisites
1. Set up Git to use SSH
  1. Generate an SSH key and add the SSH key to your GitHub account.
  2. Configure SSH to automatically load your SSH keys:
    cat << EOF >> ~/.ssh/config
    Host *
      AddKeysToAgent yes
      IgnoreUnknown UseKeychain
      UseKeychain yes
    EOF
2. Install Docker
  1. Install Docker Desktop.
3. Install VS Code or PyCharm
  1. Install VS Code and VS Code's Dev Containers extension. Alternatively, install PyCharm.
  2. Optional: install a Nerd Font such as FiraCode Nerd Font and configure VS Code or configure PyCharm to use it.
Development environments

The following development environments are supported:

  1. ⭐️ GitHub Codespaces: click on Code and select Create codespace to start a Dev Container with GitHub Codespaces.
  2. ⭐️ Dev Container (with container volume): click on Open in Dev Containers to clone this repository in a container volume and create a Dev Container with VS Code.
  3. Dev Container: clone this repository, open it with VS Code, and run Ctrl/⌘ + + PDev Containers: Reopen in Container.
  4. PyCharm: clone this repository, open it with PyCharm, and configure Docker Compose as a remote interpreter with the dev service.
  5. Terminal: clone this repository, open it with your terminal, and run docker compose up --detach dev to start a Dev Container in the background, and then run docker compose exec dev zsh to open a shell prompt in the Dev Container.
Developing
  • This project follows the Conventional Commits standard to automate Semantic Versioning and Keep A Changelog with Commitizen.
  • Run poe from within the development environment to print a list of Poe the Poet tasks available to run on this project.
  • Run poetry add {package} from within the development environment to install a run time dependency and add it to pyproject.toml and poetry.lock. Add --group test or --group dev to install a CI or development dependency, respectively.
  • Run poetry update from within the development environment to upgrade all dependencies to the latest versions allowed by pyproject.toml.
  • Run cz bump to bump the package's version, update the CHANGELOG.md, and create a git tag.