MLCommons™ AlgoPerf: Training Algorithms Benchmark

🏆 Leaderboard • 🚀 Getting Started • 📥 Submit • 📖 Docs/Rules
📜 Benchmark Paper • 📊 Results Paper

The MLCommons™ AlgoPerf: Training Algorithms benchmark is designed to find training algorithms that can train neural networks faster by rigorously measuring how quickly they reach a specific performance target across a diverse set of deep learning workloads.

When training neural nets, practitioners face many critical yet often opaque decisions: What optimizer to choose? How should its learning rate be tuned? What learning rate schedule should be used? These choices can make or break training, yet the community has lacked a clear, standardized way to identify the state of the art. Unlike benchmarks focused on hardware or model architecture, AlgoPerf isolates the training algorithm itself, which includes the optimizer, regularization, data selection, and hyperparameters like the learning rate schedule. By standardizing the benchmark process, AlgoPerf offers a meaningful apples-to-apples comparison of training algorithms. It is built on the following key principles:

🎯 Fixed Target, Model & Hardware: Submitted training algorithms must train a set of fixed models to a pre-defined validation performance target as fast as possible. All submissions use the same model architecture and are run on the same standardized hardware (8x NVIDIA V100 GPUs). This isolates the training algorithm's performance and allows a fair apples-to-apples comparison.
⏱️ Time-To-Result: Submissions are evaluated based on the total wall-clock time required to reach the target, rewarding practical and efficient algorithms.
🧠 Diverse Workloads: The benchmark includes 8 diverse deep learning workloads across domains like image classification, speech recognition, and machine translation. A submission's score is computed by aggregating its performance across all workloads using performance profiles, which rewards general-purpose algorithms over ones that excel on a single workload.
📦 Fully-Specified Algorithms: Submissions must be complete procedures and thus hyperparameter tuning is treated as part of the algorithm. Submissions can either provide a search space for automated tuning (External tuning ruleset) or be hyperparameter-free (Self-tuning ruleset) with any tuning done automatically and "on the clock". This measures an algorithm's total practical cost and provides practitioners with a complete method, eliminating the guesswork of how to apply it.
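
To make the time-to-result and performance-profile ideas above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the benchmark's scoring code: the function names `time_to_target` and `performance_profile`, the input layout, and the example numbers are all hypothetical, and the official aggregation details (for example how the profile is turned into a single score) are defined in the benchmark documentation rather than here.

```python
import math


def time_to_target(eval_log, target, higher_is_better=True):
    """Return the first elapsed time at which the metric meets the target.

    eval_log is a list of (elapsed_seconds, metric_value) pairs sorted by time.
    Returns math.inf if the target is never reached (hypothetical convention).
    """
    for elapsed, value in eval_log:
        reached = value >= target if higher_is_better else value <= target
        if reached:
            return elapsed
    return math.inf


def performance_profile(times, tau):
    """Fraction of workloads each submission finishes within tau x the best time.

    times maps submission name -> {workload name -> time to target in seconds}.
    A missing or infinite entry counts as "target not reached".
    """
    workloads = sorted({w for per_wl in times.values() for w in per_wl})
    # Fastest time to target on each workload across all submissions.
    best = {
        w: min(per_wl.get(w, math.inf) for per_wl in times.values())
        for w in workloads
    }
    profile = {}
    for name, per_wl in times.items():
        solved = sum(
            1
            for w in workloads
            if math.isfinite(best[w]) and per_wl.get(w, math.inf) <= tau * best[w]
        )
        profile[name] = solved / len(workloads)
    return profile


# Made-up numbers for two hypothetical submissions on two hypothetical workloads.
example_times = {
    "algo_a": {"w1": 100.0, "w2": 300.0},
    "algo_b": {"w1": 150.0, "w2": 200.0},
}
print(performance_profile(example_times, tau=1.0))  # {'algo_a': 0.5, 'algo_b': 0.5}
print(performance_profile(example_times, tau=1.5))  # {'algo_a': 1.0, 'algo_b': 1.0}
```

In this sketch each submission is fastest on one workload, so both reach 0.5 at `tau=1.0`, and both are within a factor of 1.5 of the best time everywhere, so both reach 1.0 at `tau=1.5`. An algorithm that is fast on one workload but never reaches the target on the others would stay at a low fraction for every `tau`, which is how this kind of aggregation favors general-purpose methods.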

Important

We have moved to a rolling leaderboard! We invite you to submit your algorithm for evaluation; see our How to Submit section and the submission repository. The working group will review your submission and, if selected, run it on our hardware and add your results to the official AlgoPerf Leaderboard. Note: we are currently focusing our efforts on the self-tuning leaderboard to strengthen its competitiveness.

Getting Started
How to Submit
Rules, Documentation & FAQ
Contributing & Resources
Releases & Roadmap
Training Algorithm Collection
Citing Our Work
License

Getting Started

Follow these steps to run a baseline algorithm and start developing your own submission. A more detailed guide can be found in the Getting Started document. If you run into any issues, please feel free to contact us: file an issue, ask a question on our Discord, or join our weekly meetings.

Installation

We recommend using the provided Docker container to ensure a reproducible environment similar to our scoring environment. Alternatively, you can install the package and its dependencies in a Python virtual environment. Both options are described in more detail in the Getting Started document.

TL;DR Install JAX version for GPU (with workload dependencies):

pip3 install -e '.[pytorch_cpu,jax_gpu,full]' --extra-index-url https://download.pytorch.org/whl/cpu

TL;DR Install PyTorch version for GPU (with workload dependencies):

pip3 install -e '.[jax_cpu,pytorch_gpu,full]'

Run a Workload

Use submission_runner.py to run an experiment, i.e., to train a workload using a specific training algorithm. Here's how to run the AdamW baseline on the mnist workload.

TL;DR: Running a JAX workload:

python3 submission_runner.py \
    --framework=jax \
    --workload=mnist \
    --experiment_dir=$HOME/experiments \
    --experiment_name=my_first_experiment \
    --submission_path=algorithms/archived_paper_baselines/adamw/jax/submission.py \
    --tuning_search_space=algorithms/archived_paper_baselines/adamw/tuning_search_space.json

TL;DR: Running a PyTorch workload:

python3 submission_runner.py \
    --framework=pytorch \
    --workload=mnist \
    --experiment_dir=$HOME/experiments \
    --experiment_name=my_first_experiment \
    --submission_path=algorithms/archived_paper_baselines/adamw/pytorch/submission.py \
    --tuning_search_space=algorithms/archived_paper_baselines/adamw/tuning_search_space.json
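
The two commands above differ only in the framework name, so a small Python wrapper can build either one. This is a hedged sketch: the helper `build_run_command` is hypothetical and not part of the repository, and it uses only the flags and paths shown in the commands above.

```python
import os
import shlex


def build_run_command(framework, workload="mnist",
                      experiment_name="my_first_experiment"):
    """Build the submission_runner.py argument list for the AdamW baseline.

    Hypothetical helper: it mirrors the flags from the commands above and
    does not validate anything beyond the framework name.
    """
    if framework not in ("jax", "pytorch"):
        raise ValueError(f"unsupported framework: {framework!r}")
    baseline_dir = "algorithms/archived_paper_baselines/adamw"
    experiment_dir = os.path.join(os.path.expanduser("~"), "experiments")
    return [
        "python3",
        "submission_runner.py",
        f"--framework={framework}",
        f"--workload={workload}",
        f"--experiment_dir={experiment_dir}",
        f"--experiment_name={experiment_name}",
        f"--submission_path={baseline_dir}/{framework}/submission.py",
        f"--tuning_search_space={baseline_dir}/tuning_search_space.json",
    ]


# Print the shell-quoted commands; nothing is executed here.
for fw in ("jax", "pytorch"):
    print(shlex.join(build_run_command(fw)))
```

To actually launch a run, the returned list could be passed to `subprocess.run(cmd, check=True)` from the repository root, assuming the package is installed as described in the Installation section.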

Develop Your Algorithm

Now you're ready to create your own submission.py! For detailed instructions, FAQs, and technical details, please refer to our documentation:

Getting Started Guide: A detailed walkthrough for developing your algorithm.
Benchmark Documentation: The complete technical reference including the "benchmark rules" such as allowed and disallowed submissions, FAQs, and technical details such as the API.

How to Submit

Ready to see how your algorithm stacks up? Submit it to the official AlgoPerf leaderboard!

Develop Your Algorithm: Create your training algorithm following the API and "rules" described in our documentation.
Create a Pull Request: Fork the submissions repository and create a pull request with your algorithm.
Review and Evaluation: The MLCommons Algorithms Working Group will review your PR. Based on its potential and our available resources, it may be selected for a free, official evaluation on our hardware.
See Your Results: If selected, we will run your algorithm and add the results to the public leaderboard.

Rules, Documentation & FAQ

We provide technical documentation of the benchmark and answer frequently asked questions regarding the benchmarking protocol on a dedicated Documentation page. This includes which types of submissions are allowed, a description of the benchmark API, and the entire benchmarking protocol. Please ensure that your submission is compliant with these rules before submitting. Suggestions, clarifications, and questions can be raised via pull requests, by creating an issue, or by reaching out to the working group.

For a detailed description and motivation of the initial benchmark design, please refer to our Benchmark Paper. For the results of the first AlgoPerf competition, please refer to our Competition Results Paper. See our AlgoPerf Leaderboard for the latest results of the benchmark and the option to submit your algorithm.

Contributing & Resources

AlgoPerf is an open, community-driven project organized by the MLCommons Algorithms Working Group. Whether you want to submit an algorithm, report a bug, or help shape the future of the benchmark, we welcome your contributions.

🏆 Submit Your Algorithm: Ready to compete? Create a pull request in the Submissions Repository.
🐞 Report a Bug: Found an issue with the codebase? Please file an issue so we can take a look. This also includes any rules changes or clarifications you would like to see.
🛠️ Contribute to the Codebase: We actively welcome pull requests! If you're interested in implementing new workloads, adding baselines, or fixing bugs, please reach out to us. Our Contributing Guide provides contribution guidelines as well as additional setup and workflow instructions.
👥 Influence the Benchmark: To contribute to the benchmark's design and direction, please join the weekly working group meetings.
💬 Ask a Question: Have a question or want to discuss ideas? Join the conversation on our Discord Server or join our weekly meetings.

Releases & Roadmap

The AlgoPerf benchmark is an actively evolving project designed to keep pace with the rapidly changing field of machine learning. To ensure clarity and reproducibility, we have adopted a unified versioning system: codebase, rules, and leaderboard all share the same Major.Minor version. Patch versions may differ for minor updates. All results produced under the same Major.Minor version are comparable, making it easy to cite "AlgoPerf v0.X" and know exactly which set of rules, code, and submissions are being referenced.
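
The comparability rule above can be expressed as a short check. This is a minimal sketch: the helper names `major_minor` and `results_comparable` and the accepted version-string format (an optional leading "v" followed by dot-separated integers) are assumptions made for illustration, not part of the codebase.

```python
def major_minor(version):
    """Extract (major, minor) from a string such as "v0.6" or "0.6.1"."""
    parts = version.lstrip("v").split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least Major.Minor, got {version!r}")
    return int(parts[0]), int(parts[1])


def results_comparable(version_a, version_b):
    """Results are comparable when Major.Minor match; patch versions may differ."""
    return major_minor(version_a) == major_minor(version_b)


print(results_comparable("v0.6.0", "0.6.2"))  # True
print(results_comparable("v0.5", "v0.6"))     # False
```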

Here is an overview of our key releases and the future roadmap. For a detailed list of changes in each release, see our Changelog.

v0.5 - Inaugural Competition
The benchmark as it was run for the first AlgoPerf competition in 2024. The key findings and analysis from this competition are detailed in our ICLR 2025 Results Paper. This version serves as a historical reference.
- Leaderboard: Archived at AlgoPerf v0.5 Leaderboard.
- Rules: The rules are archived at the AlgoPerf v0.5 Documentation.
v0.6 - Current Version
The active version of the benchmark. It is an improved and streamlined version that fixes important bugs and modifies the benchmarking protocol based on the lessons learned from the competition. This is the recommended version for all new submissions.
- Key Changes: (see the Changelog for details, including links to discussions on rule changes.)
  - A rolling leaderboard now allows for continuous submissions and updates.
  - Reduced computational cost via removing held-out workloads, 3 repetition studies (down from 5), and adjusted runtime budgets.
  - Includes important bug fixes (e.g., batch norm) and API improvements (e.g., prepare_for_eval function).
  - Migrated from pmap to jit in JAX for better performance and scalability.
- Leaderboard: The active (but currently limited) leaderboard can be found at AlgoPerf v0.6 Leaderboard.
- Rules: For the current set of rules see AlgoPerf v0.6 Documentation.

🏗️ v1.0 (Future) - Planned Long-Term Support Release
This will be the next major release of the benchmark and a "long-term support" version, with the following anticipated features:

- Adding a new language model (LM) workload.
- Stronger baselines, especially for the self-tuning leaderboard.

Training Algorithm Collection

This repository also provides a collection of implemented training algorithms with different purposes. These include submission templates, development examples, target-setting algorithms, historical baselines, and current baselines. For a detailed overview of these algorithms and their organization, please refer to the algorithms/README.md file. You can also find all benchmark submissions and their results on the official Leaderboard. These algorithms provide a starting point for developing your own training algorithm and are a great resource for understanding the AlgoPerf benchmark and its API.

Citing Our Work

If you use the AlgoPerf benchmark, its codebase, or results in your research, please cite our papers.

Benchmark Paper:

In this paper, we motivate, describe, and justify the AlgoPerf: Training Algorithms benchmark.

Dahl, Schneider, Nado, et al.
> Benchmarking Neural Network Training Algorithms
> arXiv 2306.07179

@Misc{Dahl2023AlgoPerf,
  title         = {{Benchmarking Neural Network Training Algorithms}},
  author        = {Dahl, George E. and Schneider, Frank and Nado, Zachary and Agarwal, Naman and Sastry, Chandramouli Shama and Hennig, Philipp and Medapati, Sourabh and Eschenhagen, Runa and Kasimbeg, Priya and Suo, Daniel and Bae, Juhan and Gilmer, Justin and Peirson, Abel L. and Khan, Bilal and Anil, Rohan and Rabbat, Mike and Krishnan, Shankar and Snider, Daniel and Amid, Ehsan and Chen, Kongtao and Maddison, Chris J. and Vasudev, Rakshith and Badura, Michal and Garg, Ankush and Mattson, Peter},
  year          = {2023},
  archiveprefix = {arXiv},
  eprint        = {2306.07179},
}

Competition Results Paper:

In this paper, we analyze the results of the first AlgoPerf competition.

Kasimbeg, Schneider, Eschenhagen, et al.
> Accelerating neural network training: An analysis of the AlgoPerf competition
> ICLR 2025

@inproceedings{Kasimbeg2025AlgoPerfResults,
  title     = {Accelerating neural network training: An analysis of the {AlgoPerf} competition},
  author    = {Kasimbeg, Priya and Schneider, Frank and Eschenhagen, Runa and Bae, Juhan and Sastry, Chandramouli Shama and Saroufim, Mark and Boyuan, Feng and Wright, Less and Yang, Edward Z. and Nado, Zachary and Medapati, Sourabh and Hennig, Philipp and Rabbat, Michael and Dahl, George E.},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year      = {2025},
  url       = {https://openreview.net/forum?id=CtM5xjRSfm}
}

License

The AlgoPerf codebase is licensed under the Apache License 2.0. All AlgoPerf benchmark submissions must likewise be open-source under the same Apache License 2.0.

MLCommons™ Algorithms Working Group • Join us!
