agenteval

Automated testing and benchmarking for code generation agents.

Get started

To get started:

  • Create a .env file containing your OPENAI_API_KEY (see the example below).
  • Run yarn install to install dependencies.
  • Run node evals.js to run the evals.
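
A minimal .env only needs the single key named above (the value shown is a placeholder for your own key):

```
OPENAI_API_KEY=your-key-here
```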

The eval in evals/eval-001 will be run ten times, and the results will be saved to ./output.
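
As a rough sketch of what such a run loop looks like (this is not the actual evals.js implementation; runAgent is a placeholder for whatever invokes your code generation agent on an eval):

```javascript
// Hypothetical sketch of the eval loop, using only Node's built-in fs and path modules.
const fs = require('fs');
const path = require('path');

const RUNS = 10;
const evalDir = path.join('evals', 'eval-001');
const outputDir = path.join('.', 'output');

// Placeholder: in practice this would send the eval's prompt and starting app
// to the code generation agent and return whatever it produced.
async function runAgent(dir) {
  return { eval: dir, generatedFiles: [] };
}

async function main() {
  fs.mkdirSync(outputDir, { recursive: true });
  for (let i = 0; i < RUNS; i++) {
    const result = await runAgent(evalDir);
    // One JSON file per run, written to ./output.
    fs.writeFileSync(
      path.join(outputDir, `run-${i + 1}.json`),
      JSON.stringify(result, null, 2)
    );
  }
}

main().catch(console.error);
```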

Eval structure

Each eval contains:

  • app: The codebase before transformation.
  • prompt.py: A description of the transformation to be made.
  • solution: The canonical solution, i.e. the complete codebase after the transformation.
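
One way to picture this layout is as three entries read from the eval's directory. The sketch below is a hypothetical helper, not part of the repo's API; it assumes Node and treats prompt.py as plain text:

```javascript
// Hypothetical loader for one eval's parts (directory path is made up for the example).
const fs = require('fs');
const path = require('path');

function loadEval(evalDir) {
  return {
    appDir: path.join(evalDir, 'app'),            // codebase before transformation
    prompt: fs.readFileSync(path.join(evalDir, 'prompt.py'), 'utf8'), // transformation description
    solutionDir: path.join(evalDir, 'solution'),  // canonical transformed codebase
  };
}

// Example usage:
// const { prompt } = loadEval(path.join('evals', 'eval-001'));
```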

Purpose

By running each eval as an integration test, you can automate the testing of your agent or estimate its reliability with Monte Carlo simulations (many repeated runs of the same eval).
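
For example, treating each run as a pass/fail integration test, a Monte Carlo-style pass rate could be computed over the saved results. This is a sketch only; the passed field is an assumed shape for the per-run JSON files, not something the repo guarantees:

```javascript
// Hypothetical aggregation over the JSON results written to ./output.
const fs = require('fs');
const path = require('path');

const outputDir = './output';
const results = fs
  .readdirSync(outputDir)
  .filter((f) => f.endsWith('.json'))
  .map((f) => JSON.parse(fs.readFileSync(path.join(outputDir, f), 'utf8')));

const passed = results.filter((r) => r.passed).length;
console.log(`Pass rate: ${passed}/${results.length}`);
```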

Demo

A demo video is included in the repo as AgentEval.Demo.mp4.