Welcome to the reproducible benchmark recipes repository for GPUs! This repository contains recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.
- Identify the model: Determine the model, GPU type, workload, framework, and orchestrator you are interested in.
- Select a recipe: Refer to the Support Matrix and find the recipe that matches your needs.
- Prepare your environment: Each recipe has instructions for setting up the environment needed to run the benchmark.
- Run the benchmark: Follow the steps in the recipe to execute the benchmark.
- Analyze the results: When the benchmark run finishes, you get the resulting metrics and detailed logs for further analysis.
Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
---|---|---|---|---|---|
GPT3-175B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
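Selecting a recipe amounts to matching your requirements against the columns of the support matrix. The snippet below is only an illustrative sketch of that lookup, not something shipped in this repository: the `Recipe` class, the `SUPPORT_MATRIX` list, and the `find_recipes` helper are hypothetical names, and the data is copied by hand from the single row listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """One row of the support matrix (hypothetical helper type)."""
    model: str
    gpu_machine_type: str
    framework: str
    workload_type: str
    orchestrator: str


# Hand-copied from the support matrix above; not generated by the repository.
SUPPORT_MATRIX = [
    Recipe("GPT3-175B", "A3 Mega (NVIDIA H100)", "NeMo", "Pre-training", "GKE"),
]


def find_recipes(matrix, **criteria):
    """Return the recipes whose fields equal every criterion, ignoring case."""
    matches = []
    for recipe in matrix:
        if all(
            str(getattr(recipe, field)).lower() == str(value).lower()
            for field, value in criteria.items()
        ):
            matches.append(recipe)
    return matches


if __name__ == "__main__":
    for recipe in find_recipes(SUPPORT_MATRIX, framework="NeMo", orchestrator="GKE"):
        print(recipe)
```

Passing a keyword that is not a column of the matrix raises `AttributeError`, which makes a mistyped filter visible instead of silently returning no matches.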
- training/: Contains recipes to reproduce training benchmarks with GPUs.
- src/: Contains shared dependencies required to run the benchmarks, such as Dockerfiles and Helm charts.
- docs/: Contains documentation referenced by the recipes, such as explanations of benchmark methodologies and configurations.
If you have questions or find problems with this repository, please report them through GitHub issues.
This is not an officially supported Google product. The code in this repository is for demonstrative purposes only.