
BandWidth_Test

Test the GPU bandwidth of collective operators such as the all-reduce, all-gather, broadcast, and all-to-all primitives on single-node multi-GPU (2, 4, or 8 cards) and multi-node multi-GPU (16 cards) setups, using only PyTorch and Python built-in packages.

Single-node multi-GPU: run `torchrun --standalone --nproc_per_node=2 test_bandwidth.py` (set `--nproc_per_node` to 4 or 8 to use more cards), or use the shell script `run_test_bandwidth.sh`. A sketch of the kind of measurement involved follows.
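For reference, the core of such a test is small: create a process group, time a collective on a large CUDA tensor, and convert the elapsed time into bandwidth. The sketch below is illustrative, not the repo's actual `test_bandwidth.py`; the 1 GiB payload, the iteration counts, and the NCCL-tests-style bus-bandwidth factor are assumptions.

```python
# Illustrative bandwidth probe (hypothetical; the repo's test_bandwidth.py
# may be organized differently). Launch with:
#   torchrun --standalone --nproc_per_node=2 bandwidth_sketch.py
import os
import time

import torch
import torch.distributed as dist


def main():
    # torchrun sets MASTER_ADDR/PORT, RANK, WORLD_SIZE, LOCAL_RANK for us.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    world_size = dist.get_world_size()

    tensor = torch.randn(256 * 1024 * 1024, device="cuda")  # 1 GiB of fp32

    for _ in range(5):  # warm-up: keep NCCL communicator setup out of the timing
        dist.all_reduce(tensor)
    torch.cuda.synchronize()

    iters = 20
    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(tensor)
    torch.cuda.synchronize()  # collectives run async; wait before stopping the clock
    elapsed = (time.perf_counter() - start) / iters

    size = tensor.numel() * tensor.element_size()
    alg_bw = size / elapsed / 1e9  # GB/s as seen by the caller
    # NCCL-tests convention: ring all-reduce moves 2(n-1)/n of the payload
    # per GPU; for all-gather/all-to-all the factor is (n-1)/n, broadcast is 1.
    bus_bw = alg_bw * 2 * (world_size - 1) / world_size
    if dist.get_rank() == 0:
        print(f"all_reduce {size / 2**20:.0f} MiB: "
              f"algBW {alg_bw:.2f} GB/s, busBW {bus_bw:.2f} GB/s")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```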


Multi-node multi-GPU: run the `run_test_bandwidth.sh` script on every node. For example, run `sh run_test_bandwidth.sh 2 2 0 10.20.1.81 22` on the first node and `sh run_test_bandwidth.sh 2 2 1 10.20.1.81 22` on the second node.

Here `$NNODES` (the first argument) is the number of machines in the job and `$NODE_RANK` (the third argument) is the rank of the current node: 0 on the first node, 1 on the second. The second argument is the number of GPUs to use on each node, and the last two arguments are the master node's IP address and port.
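Assuming `run_test_bandwidth.sh` forwards these arguments to torchrun's `--nnodes`, `--nproc_per_node`, `--node_rank`, `--master_addr`, and `--master_port` flags (the script's internals are not shown here), a quick sanity check before trusting the bandwidth numbers is to confirm the rendezvous actually spans both nodes. The snippet below is a hypothetical helper, not part of the repo:

```python
# Hypothetical rendezvous check (not part of the repo): launch it on both
# nodes exactly like test_bandwidth.py. Every rank should print a line, and
# the all-reduced sum should equal WORLD_SIZE (here 4: 2 nodes x 2 GPUs).
import os
import socket

import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

one = torch.ones(1, device="cuda")
dist.all_reduce(one)  # sums a single 1 contributed by every rank

print(f"host={socket.gethostname()} rank={dist.get_rank()}/"
      f"{dist.get_world_size()} all_reduce sum={int(one.item())}")
dist.destroy_process_group()
```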
