Multi-GPU Training with PyTorch and TensorFlow

About

This workshop demonstrates multi-GPU training with PyTorch Distributed Data Parallel (DDP) and PyTorch Lightning. Multi-GPU training in TensorFlow is demonstrated using MirroredStrategy.
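As a preview of the PyTorch DDP approach, here is a minimal sketch. It is illustrative only, not the exact workshop code: the tiny linear model is a placeholder, and it assumes the script is launched with one process per GPU (for example via torchrun, which sets the LOCAL_RANK environment variable).

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# One process per GPU; the launcher (e.g., torchrun) sets LOCAL_RANK per process
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(10, 1).to(local_rank)      # placeholder model
ddp_model = DDP(model, device_ids=[local_rank])    # gradients are synchronized across processes

# ... build an optimizer and loop over a DataLoader that uses a DistributedSampler ...

dist.destroy_process_group()

TensorFlow's MirroredStrategy follows a similar idea with a different interface: model construction and compilation are wrapped in strategy.scope(), and replication across the visible GPUs is handled for you.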

Setup

Make sure you can run Python on Adroit:

$ ssh <YourNetID>@adroit.princeton.edu  # VPN required if off-campus
$ git clone https://github.com/PrincetonUniversity/multi_gpu_training.git
$ cd multi_gpu_training
$ module load anaconda3/2021.11
(base) $ python --version
Python 3.9.7
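Optionally, you can check whether PyTorch is importable. This is only a quick sanity check, and it is an assumption that the base environment includes PyTorch; if it does not, a dedicated Conda environment can be created later:

(base) $ python -c "import torch; print(torch.__version__)"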

Getting Help

If you encounter any difficulties with the material in this guide, please send an email to cses@princeton.edu or attend a help session.

Authorship

This guide was created by Jonathan Halverson and members of PICSciE and Research Computing.