agenteval

Automated testing and benchmarking for code generation agents.

Get started

To get started:

  • Create a .env file containing your OPENAI_API_KEY (see the example below).
  • Run yarn install to install dependencies.
  • Run node evals.js to run the evals.
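
A minimal .env only needs the single key named above (the value shown is a placeholder for your own key):

```
OPENAI_API_KEY=your-key-here
```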

The eval in evals/eval-001 will be run ten times, and the results will be saved to ./output.
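
As a rough sketch of what such a run loop looks like (this is not the actual evals.js implementation; runAgent is a placeholder for whatever invokes your code generation agent on an eval):

```javascript
// Hypothetical sketch of the eval loop, using only Node's built-in fs and path modules.
const fs = require('fs');
const path = require('path');

const RUNS = 10;
const evalDir = path.join('evals', 'eval-001');
const outputDir = path.join('.', 'output');

// Placeholder: in practice this would send the eval's prompt and starting app
// to the code generation agent and return whatever it produced.
async function runAgent(dir) {
  return { eval: dir, generatedFiles: [] };
}

async function main() {
  fs.mkdirSync(outputDir, { recursive: true });
  for (let i = 0; i < RUNS; i++) {
    const result = await runAgent(evalDir);
    // One JSON file per run, written to ./output.
    fs.writeFileSync(
      path.join(outputDir, `run-${i + 1}.json`),
      JSON.stringify(result, null, 2)
    );
  }
}

main().catch(console.error);
```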

Eval structure

Each eval contains:

  • app: The codebase before transformation.
  • prompt.py: A description of the transformation to be made.
  • solution: The canonical solution, i.e. the complete codebase after the transformation.
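
One way to picture this layout is as three entries read from the eval's directory. The sketch below is a hypothetical helper, not part of the repo's API; it assumes Node and treats prompt.py as plain text:

```javascript
// Hypothetical loader for one eval's parts (directory path is made up for the example).
const fs = require('fs');
const path = require('path');

function loadEval(evalDir) {
  return {
    appDir: path.join(evalDir, 'app'),            // codebase before transformation
    prompt: fs.readFileSync(path.join(evalDir, 'prompt.py'), 'utf8'), // transformation description
    solutionDir: path.join(evalDir, 'solution'),  // canonical transformed codebase
  };
}

// Example usage:
// const { prompt } = loadEval(path.join('evals', 'eval-001'));
```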

Purpose

By running each eval as an integration test, you can automate the testing of your agent or estimate its reliability with Monte Carlo simulations (many repeated runs of the same eval).
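
For example, treating each run as a pass/fail integration test, a Monte Carlo-style pass rate could be computed over the saved results. This is a sketch only; the passed field is an assumed shape for the per-run JSON files, not something the repo guarantees:

```javascript
// Hypothetical aggregation over the JSON results written to ./output.
const fs = require('fs');
const path = require('path');

const outputDir = './output';
const results = fs
  .readdirSync(outputDir)
  .filter((f) => f.endsWith('.json'))
  .map((f) => JSON.parse(fs.readFileSync(path.join(outputDir, f), 'utf8')));

const passed = results.filter((r) => r.passed).length;
console.log(`Pass rate: ${passed}/${results.length}`);
```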

Demo

A demo video is included in the repo as AgentEval.Demo.mp4.