how to set multi-gpu without deepspeed?
carol007 opened this issue · 1 comments
carol007 commented
When I run the shell command `OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=2 train_vae.py`,
I find only GPU 0 works, with two processes on it. I want to use `os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"` to use two GPUs, but there seems to be no `args.gpus` option to pass when setting `distr_backend`.
janEbert commented
Hi, you have two options:
- Use one of the supported distributed backends, DeepSpeed or Horovod.
- Write your own backend for `torch.distributed` by extending `distributed_backend.py`. If the documentation is not clear enough, I can improve it and help you.
My personal recommendation: Try Horovod if you don't want to use DeepSpeed.
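For reference, when `torch.distributed.launch` spawns the processes (as in the original command), each process must bind itself to its own GPU, otherwise every rank lands on GPU 0. Below is a minimal sketch of that binding, assuming plain `torch.distributed` with the NCCL backend rather than DeepSpeed or Horovod; the helper name `pick_device` is hypothetical, not part of any library.

```python
import argparse
import os


def pick_device(local_rank, visible_gpus):
    """Map a launcher-assigned local rank onto one of the visible GPUs.

    `visible_gpus` is the parsed value of CUDA_VISIBLE_DEVICES,
    e.g. ["0", "1"]. Each process must use a distinct device index,
    otherwise all ranks pile up on GPU 0 (the behaviour in this issue).
    """
    if local_rank >= len(visible_gpus):
        raise ValueError(
            f"local_rank {local_rank} has no matching GPU in {visible_gpus}"
        )
    # Indices are relative to CUDA_VISIBLE_DEVICES, so rank N -> cuda:N.
    return f"cuda:{local_rank}"


def main():
    # torch is imported here so the helper above stays importable
    # on machines without a CUDA build.
    import torch
    import torch.distributed as dist

    parser = argparse.ArgumentParser()
    # torch.distributed.launch passes --local_rank to every process.
    parser.add_argument("--local_rank", type=int, default=0)
    args = parser.parse_args()

    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "0").split(",")
    device = pick_device(args.local_rank, visible)

    torch.cuda.set_device(device)            # bind this process to its own GPU
    dist.init_process_group(backend="nccl")  # NCCL for multi-GPU training
    # ...then build the model on `device` and wrap it in
    # torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.local_rank])


if __name__ == "__main__":
    main()
```

The same per-rank device binding is what a custom `distributed_backend.py` extension would need to perform in its initialization step.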