# LangChain Benchmarks
This repository shows how we benchmark some of our more popular chains and agents. The benchmarks are organized by end-to-end use cases and rely heavily on LangSmith.
We have several goals in open sourcing this:
- Showing how we collect our benchmark datasets for each task
- Showing what benchmark datasets we use for each task
- Showing how we evaluate each task
- Encouraging others to benchmark their solutions on these tasks (we are always looking for better ways of doing things!)
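To make the evaluation goal above concrete, here is a minimal sketch of the kind of scoring function a benchmark task might use to compare a model's output against a reference answer. The function name and return format are hypothetical illustrations, not this repository's actual API.

```python
def exact_match_evaluator(prediction: str, reference: str) -> dict:
    """Hypothetical evaluator: scores 1 if the model's output matches the
    reference answer (ignoring case and surrounding whitespace), else 0."""
    score = int(prediction.strip().lower() == reference.strip().lower())
    return {"key": "exact_match", "score": score}

# Example: a prediction that differs only in casing still scores 1.
result = exact_match_evaluator("Paris", "paris")
```

Real tasks typically use richer evaluators (e.g. LLM-judged correctness rather than exact string match), but the shape is the same: take a prediction and a reference, return a named score.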
We currently include the following tasks: