Distributed Training Workshop on Amazon SageMaker

Welcome to the art and science of optimizing neural networks at scale! In this workshop you'll get hands-on experience with our high-performance distributed training libraries to achieve the best performance on AWS.

Workshop Content

Today you'll walk through two hands-on labs. The first one focuses on data parallelism, and the second one is about model parallelism.
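If data parallelism is new to you, the toy sketch below may help before you open the first lab. It is not the SageMaker library and not taken from the notebooks: it is a plain-Python illustration, with a made-up one-parameter model and hypothetical helper names, of the core idea that each worker computes a gradient on its own shard of the data and the workers then average those gradients before updating.

```python
# Toy illustration of data parallelism (assumption: not the workshop's code).
# Model: y ≈ w * x, loss: mean squared error.

def local_gradient(w, shard):
    """Gradient of the MSE loss over one worker's shard of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker receives the average."""
    return sum(values) / len(values)

def data_parallel_step(w, shards, lr):
    """One synchronous step: local gradients, average, identical update."""
    grads = [local_gradient(w, shard) for shard in shards]  # one per worker
    return w - lr * all_reduce_mean(grads)

# Eight samples drawn from y = 3x, split evenly across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)

print(f"learned w = {w:.4f}")
```

Because the shards here are equal in size, the averaged gradient matches the gradient over the full dataset, so the workers collectively behave like one large-batch trainer. Model parallelism, the topic of the second lab, instead splits the model itself across devices.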

Prerequisites

This lab is self-contained: everything you need is either produced by the notebooks themselves or included in the directory. If you are in an AWS-led workshop, you will most likely use the Event Engine to manage your AWS account.

If you are running the workshop on your own, please make sure you have an AWS account with a SageMaker Studio domain created. In that account, please request a service limit increase for the ml.g4dn.12xlarge instance type for SageMaker training.

Top papers and case studies

Some relevant papers for your reference:

SageMaker Data Parallel, aka Herring. In this paper we introduce a custom high performance computing configuration for distributed gradient descent on AWS, available within Amazon SageMaker Training.
SageMaker Model Parallel. In this paper we propose a model parallelism framework available within Amazon SageMaker Training to reduce memory errors and enable training GPT-3-sized models and more! See our case study achieving 32 samples per second with 175B parameters on SageMaker over 140 p4d nodes.
Amazon Search speeds up training by 7.3x on SageMaker. In this blog post we introduce two new features on Amazon SageMaker: support for native PyTorch DDP and PyTorch Lightning integration with SM DDP. We also discuss how Amazon Search sped up their overall training time by 7.3x by moving to distributed training.
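For orientation, the SageMaker Python SDK selects a data-parallel backend through the `distribution` argument of its PyTorch estimator. The sketch below only builds that argument as a dictionary, so it runs without AWS credentials. The helper name `distribution_config` and the backend labels are hypothetical. The two dictionary shapes are the ones the SDK documents for native PyTorch DDP and for the SageMaker data parallel library, but please confirm them, and the instance types each backend supports, against the SDK version you have installed.

```python
# Minimal sketch (assumption: helper name and labels are illustrative only).

def distribution_config(backend):
    """Return a `distribution` dict for a SageMaker PyTorch estimator."""
    if backend == "pytorch_ddp":
        # Native PyTorch DistributedDataParallel launched by SageMaker.
        return {"pytorchddp": {"enabled": True}}
    if backend == "sm_data_parallel":
        # SageMaker distributed data parallel library (SM DDP).
        return {"smdistributed": {"dataparallel": {"enabled": True}}}
    raise ValueError(f"unknown backend: {backend!r}")

for name in ("pytorch_ddp", "sm_data_parallel"):
    print(name, "->", distribution_config(name))
```

In a notebook, the returned dictionary would be passed as `distribution=...` when constructing `sagemaker.pytorch.PyTorch`, alongside your own entry point, role, instance type, and instance count.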

Upcoming book

If you'd like to read my upcoming book on the topic, check it out on Amazon here! It's coming out in April 2023.

