A toolkit for evaluating benchmarks on the Hugging Face Hub
The list of hosted benchmarks is shown in the table below:
Benchmark | Description | Submission | Leaderboard |
---|---|---|---|
RAFT | A benchmark to test few-shot learning in NLP | ought/raft-submission | ought/raft-leaderboard |
GEM | A large-scale benchmark for natural language generation | GEM/submission-form | GEM/results |
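Each entry in the Submission and Leaderboard columns refers to a repository on the Hugging Face Hub. As a hedged sketch of how you might pull one of these down locally, the snippet below uses `snapshot_download` from `huggingface_hub`; the `repo_type="dataset"` argument is an assumption about how the submission repo is hosted:

```python
from huggingface_hub import snapshot_download

# Download a local copy of the RAFT submission template.
# repo_type="dataset" is an assumption about how the repo is hosted on the Hub.
local_dir = snapshot_download(repo_id="ought/raft-submission", repo_type="dataset")
print(local_dir)
```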
Clone the repository and install the requirements:
```bash
git clone git@github.com:huggingface/hf_benchmarks.git
cd hf_benchmarks
pip install '.[dev]'
```
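To confirm the install worked, a quick import check can help. Note that `hf_benchmarks` as the module name is an assumption inferred from the repository name:

```python
# Hedged sanity check: assumes the installed package exposes a module
# named hf_benchmarks (inferred from the repository name).
import hf_benchmarks

print(hf_benchmarks.__file__)  # path to the installed package
```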