torch_DDP

Some PyTorch Distributed Data Parallel (DDP) examples.


Entry Point

Single machine, multiple GPUs

python3 start.py
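
A minimal sketch of what a single-machine launcher like start.py typically does, using torch.multiprocessing to spawn one DDP process per GPU. The function name, port, and rendezvous details here are illustrative assumptions, not necessarily the repo's exact code:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank, world_size):
    # Rendezvous via environment variables; the port is an arbitrary free one.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    # ... build the model, wrap it in DDP, run the training loop ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()  # one process per visible GPU
    mp.spawn(run, args=(world_size,), nprocs=world_size)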

Multiple machines, multiple GPUs

Decide your world size, i.e. the total number of participating processes, then launch one process per rank:

python3 mnist_train.py 0 {size}
python3 mnist_train.py 1 {size}
...
python3 mnist_train.py {size - 1} {size}

e.g. with a world size of 2:

On machine 1:
python3 mnist_train.py 0 2

On machine 2:
python3 mnist_train.py 1 2
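
For reference, a hypothetical sketch of how mnist_train.py might read those two positional arguments; the actual script may parse them differently:

import sys

def main(rank, world_size):
    print(f"starting rank {rank} of {world_size}")
    # ... setup(rank, world_size), training loop, cleanup ...

if __name__ == "__main__":
    rank = int(sys.argv[1])        # this machine's rank: 0 .. world_size - 1
    world_size = int(sys.argv[2])  # total number of participating processes
    main(rank, world_size)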

Settings

Inside def setup():

communication

'nccl' is the backend used here; the other options are 'gloo' and 'mpi'

init_method

the IP address (and port) the rank-0 process listens on for rendezvous; see the sketch below
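
A hedged sketch of what setup() might look like with these two settings; the tcp:// address is a placeholder for the rank-0 machine's reachable IP and a free port:

import torch.distributed as dist

def setup(rank, world_size):
    dist.init_process_group(
        backend="nccl",                      # alternatives: "gloo", "mpi"
        init_method="tcp://10.0.0.1:23456",  # IP:port the group rendezvouses on
        rank=rank,
        world_size=world_size,
    )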

Inside def demo_basic():

GPU rank

you can set gpu_rank to a constant or derive it from a mapping of process rank to GPU id, as sketched below
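
Illustrative only (pick_gpu is a made-up helper name, not from the repo): the two choices for gpu_rank mentioned above:

import torch

def pick_gpu(rank):
    # Constant: every process pins to the same device, e.g. GPU 0.
    # return 0
    # Mapping: spread process ranks across the visible GPUs.
    return rank % torch.cuda.device_count()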

model

supported models: resnet18(), resnet34(), resnet50()
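
A sketch of selecting one of the supported torchvision models and wrapping it in DDP; this assumes the process group has already been initialized by setup():

import torchvision.models as models
from torch.nn.parallel import DistributedDataParallel as DDP

gpu_rank = 0                             # or a rank-to-GPU mapping, as above
model = models.resnet18().to(gpu_rank)   # or resnet34() / resnet50()
ddp_model = DDP(model, device_ids=[gpu_rank])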

data path

data = get_mnist('~/data')
the path where the downloaded MNIST dataset is stored
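
get_mnist is the repo's own helper; a plausible equivalent built on torchvision (download=True fetches MNIST into the given directory on first use) might look like:

import os
from torchvision import datasets, transforms

def get_mnist(path):
    # torchvision does not expand "~", so do it explicitly.
    return datasets.MNIST(
        root=os.path.expanduser(path),
        train=True,
        download=True,
        transform=transforms.ToTensor(),
    )

data = get_mnist('~/data')  # MNIST ends up under ~/data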