A toolkit for evaluating benchmarks on the Hugging Face Hub
The list of hosted benchmarks is shown in the table below:
Benchmark | Description | Submission | Leaderboard |
---|---|---|---|
RAFT | A benchmark to test few-shot learning in NLP | ought/raft-submission | ought/raft-leaderboard |
GEM | A large-scale benchmark for natural language generation | GEM/submission-form | GEM/results |
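Each entry in the Submission and Leaderboard columns refers to a repository on the Hugging Face Hub. As a hedged sketch of how you might pull one of these down locally, the snippet below uses `snapshot_download` from `huggingface_hub`; the `repo_type="dataset"` argument is an assumption about how the submission repo is hosted:

```python
from huggingface_hub import snapshot_download

# Download a local copy of the RAFT submission template.
# repo_type="dataset" is an assumption about how the repo is hosted on the Hub.
local_dir = snapshot_download(repo_id="ought/raft-submission", repo_type="dataset")
print(local_dir)
```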
Clone the repository and install the requirements:
```bash
git clone git@github.com:huggingface/hf_benchmarks.git
cd hf_benchmarks
pip install '.[dev]'
```
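To confirm the install worked, a quick import check can help. Note that `hf_benchmarks` as the module name is an assumption inferred from the repository name:

```python
# Hedged sanity check: assumes the installed package exposes a module
# named hf_benchmarks (inferred from the repository name).
import hf_benchmarks

print(hf_benchmarks.__file__)  # path to the installed package
```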