Simple utility for testing the throughput of torch.distributed connections.
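Example invocations (single node with the default backend, single node with NCCL, and four nodes with NCCL):

    torchrun --nnodes 1 --nproc-per-node 8 bench.py --iterations 1000
    torchrun --nnodes 1 --nproc-per-node 8 bench.py --iterations 1000 --backend nccl
    torchrun --nnodes 4 --nproc-per-node 8 bench.py --iterations 1000 --backend nccl

For the multi-node case, run the command on each of the four nodes and give torchrun rendezvous settings (for example --rdzv-backend c10d --rdzv-endpoint <host>:<port>) so the nodes can find each other.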
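The text above does not say which collective bench.py times or how it reports throughput. Assuming it times torch.distributed.all_reduce, a raw per-iteration time can be turned into bandwidth figures with the conventions used by NVIDIA's nccl-tests: algorithm bandwidth is bytes divided by time, and all-reduce bus bandwidth scales that by 2(n-1)/n. The helper name allreduce_bandwidth is hypothetical and not part of bench.py; this is a sketch, not the utility's own reporting code.

```python
def allreduce_bandwidth(nbytes: int, seconds: float, world_size: int) -> tuple[float, float]:
    """Return (algbw, busbw) in GB/s for one all-reduce of `nbytes` taking `seconds`.

    Hypothetical helper; follows the nccl-tests definitions of algbw and busbw.
    """
    if seconds <= 0 or world_size < 1:
        raise ValueError("seconds must be positive and world_size at least 1")
    # Algorithm bandwidth: payload size divided by elapsed time (1 GB = 1e9 bytes).
    algbw = nbytes / seconds / 1e9
    # nccl-tests convention for all-reduce: busbw = algbw * 2(n-1)/n.
    busbw = algbw * 2 * (world_size - 1) / world_size
    return algbw, busbw


if __name__ == "__main__":
    # A 2 GB all-reduce across 8 ranks that took 0.5 s.
    print(allreduce_bandwidth(2_000_000_000, 0.5, 8))  # (4.0, 7.0)
```

Bus bandwidth is the more useful number when comparing runs with different world sizes, because it approximates per-link utilization rather than end-to-end payload rate.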
Depending on how your cluster is set up, you may need to write your own bench() function to configure the torch.distributed environment correctly.
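A minimal sketch of such a custom bench(), assuming the benchmark times torch.distributed.all_reduce on a single float32 tensor. The names read_torchrun_env, DistEnv, and this bench() signature are hypothetical and are not taken from bench.py. The sketch relies only on the RANK, LOCAL_RANK, and WORLD_SIZE variables that torchrun exports to each worker, and on init_method="env://", which additionally reads MASTER_ADDR and MASTER_PORT.

```python
import os
import time
from dataclasses import dataclass
from typing import Mapping


@dataclass
class DistEnv:
    rank: int
    local_rank: int
    world_size: int


def read_torchrun_env(environ: Mapping[str, str] = os.environ) -> DistEnv:
    """Read the rank variables torchrun sets; defaults describe a single process."""
    return DistEnv(
        rank=int(environ.get("RANK", "0")),
        local_rank=int(environ.get("LOCAL_RANK", "0")),
        world_size=int(environ.get("WORLD_SIZE", "1")),
    )


def bench(backend: str = "nccl", iterations: int = 1000, numel: int = 2**24) -> float:
    """Hypothetical custom bench(): mean seconds per all_reduce on this rank."""
    import torch  # imported lazily so read_torchrun_env() works without torch
    import torch.distributed as dist

    env = read_torchrun_env()
    if backend == "nccl":
        # Pin each process to its own GPU before creating the NCCL communicator.
        torch.cuda.set_device(env.local_rank)
        device = torch.device("cuda", env.local_rank)
    else:
        device = torch.device("cpu")

    # env:// picks up MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE from torchrun.
    dist.init_process_group(backend=backend, init_method="env://")
    # Zeros keep the summed values stable across repeated all-reduces.
    tensor = torch.zeros(numel, dtype=torch.float32, device=device)

    # Warm-up iterations so communicator setup is not included in the timing.
    for _ in range(5):
        dist.all_reduce(tensor)
    if device.type == "cuda":
        torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iterations):
        dist.all_reduce(tensor)
    if device.type == "cuda":
        # CUDA collectives are asynchronous; wait for them before stopping the clock.
        torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iterations

    dist.destroy_process_group()
    return elapsed
```

Called from a script launched by torchrun, bench() returns the mean time per all_reduce on each rank. Keeping the environment parsing in a separate function makes it easy to adapt to schedulers that export different variable names.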