Bro, what's the difference between `rank` and `local_rank`?
buaacarzp opened this issue · 2 comments
buaacarzp commented
distribuuuu/tutorial/mnmc_ddp_launch.py
Lines 23 to 24 in c8c17dc
BIGBALLON commented
- Reference:
- https://pytorch.org/tutorials/beginner/dist_overview.html
- https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md
- https://zhuanlan.zhihu.com/p/360405558
- Explanation
Assume you have 4 machines with 32 GPUs in total (8 GPUs per machine):
  - `world_size` is 32
  - `rank` is 0 ~ 31
  - `local_rank` is 0 ~ 7; since each machine has only 8 GPUs, you need to use device numbers 0 ~ 7
distribuuuu/tutorial/mnmc_ddp_launch.py
Line 35 in c8c17dc
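To make the numbers above concrete, here is a minimal sketch of how `rank` and `local_rank` relate in the 4-machine, 8-GPU example. It assumes ranks are assigned contiguously per machine (which is what the usual PyTorch launchers do); the helper names `global_rank` and `split_rank` are hypothetical and not part of this repo or of PyTorch.

```python
# Sketch only: assumes ranks are laid out contiguously, machine by machine.
# NUM_NODES / GPUS_PER_NODE mirror the example above (4 machines, 8 GPUs each).
NUM_NODES = 4
GPUS_PER_NODE = 8
WORLD_SIZE = NUM_NODES * GPUS_PER_NODE  # total number of processes


def global_rank(node_rank: int, local_rank: int,
                gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Hypothetical helper: global rank of a process from its machine index
    and its GPU index on that machine."""
    return node_rank * gpus_per_node + local_rank


def split_rank(rank: int, gpus_per_node: int = GPUS_PER_NODE) -> tuple:
    """Hypothetical helper: inverse mapping, returns (node_rank, local_rank)."""
    return divmod(rank, gpus_per_node)


if __name__ == "__main__":
    for node_rank in range(NUM_NODES):
        ranks = [global_rank(node_rank, lr) for lr in range(GPUS_PER_NODE)]
        print(f"machine {node_rank}: rank {ranks[0]} ~ {ranks[-1]}, "
              f"local_rank 0 ~ {GPUS_PER_NODE - 1}")
```

So `rank` identifies a process across the whole job, while `local_rank` only picks the GPU on the current machine, which is why it is the value to pass to something like `torch.cuda.set_device(local_rank)`.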
I suggest you read dist_overview first.
buaacarzp commented
Good job, I like you very much!