lucidrains/DALLE-pytorch

How to set up multi-GPU without DeepSpeed?

carol007 opened this issue · 1 comment

When I run the shell command OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=2 train_vae.py, I find that only gpu0 is used, with two processes running on it. I want to use os.environ["CUDA_VISIBLE_DEVICES"]="0,1" to make two GPUs visible, but there seems to be no place for args.gpus to be passed to distr_backend.

Hi, you have two options:

  1. Use one of the supported distributed backends, DeepSpeed or Horovod.
  2. Write your own backend for torch.distributed by extending distributed_backend.py; a rough sketch follows this list. If the documentation is not clear enough, I can improve it and help you.
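
For option 2, here is a rough sketch of what a torch.distributed backend could look like. The import path and the hook names (_initialize, _get_world_size, ...) are assumptions about the interface in distributed_backend.py, so check the file for the abstract methods the base class actually declares; only the torch.distributed calls themselves are standard.

```python
import os

import torch
import torch.distributed as dist

# Assumed import path; adjust to wherever distributed_backend.py sits in the repo.
from dalle_pytorch.distributed_backends.distributed_backend import DistributedBackend


class TorchDistributedBackend(DistributedBackend):
    """Hypothetical backend on top of plain torch.distributed.

    The hook names below are assumptions about the base-class interface;
    match them to what distributed_backend.py actually requires.
    """

    BACKEND_MODULE_NAME = 'torch.distributed'
    BACKEND_NAME = 'torch'

    def has_backend(self):
        return dist.is_available()

    def _initialize(self):
        # torch.distributed.launch exports RANK and WORLD_SIZE for every
        # process it spawns, so env:// initialization picks them up.
        dist.init_process_group(backend='nccl', init_method='env://')
        # Pin each process to its own GPU; without this, both processes can
        # end up on gpu0, which is the behaviour described in this issue.
        torch.cuda.set_device(self._get_local_rank())

    def _get_world_size(self):
        return dist.get_world_size()

    def _get_rank(self):
        return dist.get_rank()

    def _get_local_rank(self):
        # Newer launchers export LOCAL_RANK; older torch.distributed.launch
        # versions pass --local_rank as a CLI flag instead, so adapt as needed.
        return int(os.environ.get('LOCAL_RANK', 0))

    def _local_barrier(self):
        dist.barrier()

    def _average_all(self, tensor):
        # Mean across all workers: sum-reduce, then divide by world size.
        averaged = tensor.detach().clone()
        dist.all_reduce(averaged, op=dist.ReduceOp.SUM)
        return averaged / self._get_world_size()
```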

My personal recommendation: Try Horovod if you don't want to use DeepSpeed.
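
If you try Horovod, the usual pattern (shown here generically, not as this repo's exact wiring) is that each process launched by horovodrun pins itself to one GPU via its local rank, so you don't need to touch CUDA_VISIBLE_DEVICES:

```python
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())  # process 0 -> gpu0, process 1 -> gpu1

model = torch.nn.Linear(512, 512).cuda()  # stand-in for the actual VAE
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3 * hvd.size())

# Average gradients across workers and start all workers from the same weights.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```

Launched with something like horovodrun -np 2 python train_vae.py (assuming the training script is set up to select the Horovod backend), each of the two processes then gets its own GPU.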