One question about DDP
cswwp opened this issue · 2 comments
@richardkxu Nice repo. One question: is there a difference between Single node, multiple GPUs with torch.distributed.launch (①) and Single node, multiple GPUs with multi-processing (②)? Or are they equivalent, just two different methods?
The main difference is which distributed training library you use. The 1st one uses the NVIDIA Apex library; the 2nd one uses torch.nn.parallel.DistributedDataParallel. The 1st one gives better performance and works better with NVIDIA GPUs. It has also become the default approach in newer versions of PyTorch (> 1.6.0). Hope this is helpful!
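For reference, here's a minimal sketch of what approach ① typically looks like: you start the script with `python -m torch.distributed.launch --nproc_per_node=NUM_GPUS main.py`, and the launcher spawns one process per GPU and passes `--local_rank` to each. The script name, model, and optimizer below are placeholders (not taken from this repo), and the Apex calls follow the standard `amp.initialize` / `apex.parallel.DistributedDataParallel` pattern rather than the repo's exact code:

```python
# Minimal sketch of approach (1): torch.distributed.launch + NVIDIA Apex.
# Run as: python -m torch.distributed.launch --nproc_per_node=NUM_GPUS main.py
import argparse

import torch
import torch.distributed as dist
import torch.nn as nn
from apex import amp
from apex.parallel import DistributedDataParallel as ApexDDP

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # filled in by the launcher
args = parser.parse_args()

# torch.distributed.launch sets MASTER_ADDR, MASTER_PORT, RANK, and
# WORLD_SIZE in the environment, so env:// initialization just works.
dist.init_process_group(backend="nccl", init_method="env://")
torch.cuda.set_device(args.local_rank)

model = nn.Linear(10, 10).cuda()  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Apex handles mixed precision (amp) and gradient averaging (its own DDP wrapper).
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
model = ApexDDP(model)

# ... training loop with a DistributedSampler would go here ...
```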
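And a sketch of approach ②, where you spawn the worker processes yourself with `torch.multiprocessing.spawn` and wrap the model in `torch.nn.parallel.DistributedDataParallel`. Again, the address, port, and model are placeholder values, not the repo's actual settings:

```python
# Minimal sketch of approach (2): manual multi-processing + native PyTorch DDP.
# Run as a plain script: python main.py
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank: int, world_size: int) -> None:
    # Each spawned process drives one GPU, identified by its rank.
    os.environ["MASTER_ADDR"] = "127.0.0.1"  # placeholder rendezvous address
    os.environ["MASTER_PORT"] = "29500"      # placeholder port
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = nn.Linear(10, 10).to(rank)  # placeholder model
    ddp_model = DDP(model, device_ids=[rank])

    # ... training loop with a DistributedSampler would go here ...

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```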
Thank you, very helpful