TaatiTeam/MotionAGFormer

Training occupies two GPUs

yaoyao674 opened this issue · 4 comments

Hello author, your work is very good.

What I would like to ask is: when I execute python train.py --config configs/h36m/MotionAGFormer-xsmall.yaml, I observe that it trains on two GPUs. Is this normal? I did not find any place in the code where the number of GPUs is set. Can you point it out? Thank you.

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = load_model(args)
    if torch.cuda.is_available():
        model = torch.nn.DataParallel(model)
    model.to(device)

Lines 256-260 in train.py. When no device_ids argument is passed, DataParallel replicates the model across all visible GPUs, which is why both of yours are used. You could set the parameters of DataParallel as follows:

    if torch.cuda.is_available():
        model = torch.nn.DataParallel(model, device_ids=[0, 1])

You could set device_ids = [0, 1, 2, ...] to any subset of the GPUs available on your machine.
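For example, to train on a single GPU only, pass just one id (a minimal variant of the snippet above, not the repo's actual code):

    if torch.cuda.is_available():
        # Without device_ids, DataParallel defaults to every visible GPU;
        # passing a single id pins the model to that GPU.
        model = torch.nn.DataParallel(model, device_ids=[0])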

Thanks @AsukaCamellia for answering it. I'll close the issue.

Before you execute train.py, you can specify the GPUs you want to use, e.g. export CUDA_VISIBLE_DEVICES=0,1, which will run the program on the first and second GPU in the machine.
Another way to do the same thing:

    CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/h36m/MotionAGFormer-xsmall.yaml
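If you want to confirm what the restricted process actually sees, a quick check (just a sketch, not part of this repo):

    import torch
    # With CUDA_VISIBLE_DEVICES=0,1 this prints 2; inside the process the
    # visible devices are renumbered starting from cuda:0.
    print(torch.cuda.device_count())
    for i in range(torch.cuda.device_count()):
        print(i, torch.cuda.get_device_name(i))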

I suggest you use torch.nn.parallel.DistributedDataParallel rather than torch.nn.DataParallel if you need to train the model on multiple GPUs.
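For reference, here is a minimal DDP sketch. It assumes a launch via torchrun --nproc_per_node=2, and the Linear model is a placeholder, not MotionAGFormer's code:

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = torch.nn.Linear(10, 10).cuda(local_rank)  # placeholder model
        model = DDP(model, device_ids=[local_rank])

        # The DataLoader would also need a DistributedSampler so that
        # every process sees a different shard of the dataset.

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Launched this way, you get one process per GPU instead of DataParallel's single-process replication, which avoids the GIL bottleneck and usually scales better.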