spock74/langchain-benchmarks

LangChain Benchmarks

This repository shows how we benchmark some of our more popular chains and agents. The benchmarks are organized by end-to-end use case, and they rely heavily on LangSmith.

We have several goals in open sourcing this:

Showing how we collect our benchmark datasets for each task
Showing which benchmark datasets we use for each task
Showing how we evaluate each task
Encouraging others to benchmark their solutions on these tasks (we are always looking for better ways of doing things!)
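Evaluating a task can be thought of as a loop. The chain runs over a dataset of reference examples, each output gets a score, and the scores are averaged. The sketch below is a minimal, library-free illustration of that idea under stated assumptions. `Example`, `exact_match`, `run_benchmark`, and `toy_chain` are hypothetical names. They are not part of this repository or of the LangSmith SDK, which provides its own dataset and evaluation tooling. The per-example exact-match scorer is likewise only a stand-in for whatever evaluator a real task uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    """Hypothetical dataset row: the chain's inputs plus a reference output."""
    inputs: Dict[str, str]
    expected: str


def exact_match(prediction: str, reference: str) -> float:
    """Hypothetical evaluator: 1.0 if the normalized strings match, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def run_benchmark(
    chain: Callable[[Dict[str, str]], str],
    dataset: List[Example],
    evaluator: Callable[[str, str], float] = exact_match,
) -> float:
    """Run the chain on every example and return the mean evaluator score."""
    if not dataset:
        raise ValueError("dataset must contain at least one example")
    scores = [evaluator(chain(ex.inputs), ex.expected) for ex in dataset]
    return sum(scores) / len(scores)


# Usage: a tiny hypothetical dataset and a stand-in "chain".
dataset = [
    Example({"question": "2 + 2"}, "4"),
    Example({"question": "capital of France"}, "Paris"),
]


def toy_chain(inputs: Dict[str, str]) -> str:
    # Only answers the arithmetic question correctly.
    return "4" if inputs["question"] == "2 + 2" else "unknown"


print(run_benchmark(toy_chain, dataset))  # 0.5
```

Keeping the chain, the dataset, and the evaluator as separate arguments means the same examples can score competing implementations. That separation is what makes it possible to compare solutions on a shared task.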

We currently include the following tasks: