The issue of training occupying two GPUs
yaoyao674 opened this issue · 4 comments
Hello author, your work is very good.
What I would like to ask: when I execute python train.py --config configs/h36m/MotionAGFormer-xsmall.yaml, I observe that it trains on two GPUs. Is this normal? I did not find any place in the code where the number of GPUs is set. Can you point it out? Thank you.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = load_model(args)
if torch.cuda.is_available():
    model = torch.nn.DataParallel(model)
model.to(device)
Lines 256-260 in train.py. You could set the parameters of DataParallel as follows:
if torch.cuda.is_available():
    model = torch.nn.DataParallel(model, device_ids=[0, 1])
You can set device_ids = [0, 1, 2, ...] to whichever subset of the GPUs on your machine you want; with no device_ids argument, DataParallel uses all visible GPUs, which is why you see training on two GPUs.
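For instance, restricting training to a single GPU is just a matter of passing device_ids=[0]. A minimal sketch (a small nn.Linear stands in for the repo's load_model(args), which is not reproduced here):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for load_model(args) in train.py

if torch.cuda.is_available():
    # Restrict DataParallel to the first GPU only, instead of all visible GPUs
    model = nn.DataParallel(model, device_ids=[0])
    model = model.to('cuda')
```

With device_ids=[0], all replicas and the output gather happen on GPU 0, so the second GPU stays idle.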
Thanks @AsukaCamellia for answering it. I'm closing the issue.
Thanks for your help @AsukaCamellia @SoroushMehraban
Before you execute train.py, you can specify which GPUs you want to use, e.g. export CUDA_VISIBLE_DEVICES=0,1, which will run the program on the first and second GPUs in the machine.
The other way to do the same thing:
CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/h36m/MotionAGFormer-xsmall.yaml
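You can verify the effect from inside Python as well. One caveat: the variable must be set before CUDA is initialized (ideally before importing torch), or it is ignored. A small sketch:

```python
import os

# Must be set before torch initializes CUDA (ideally before `import torch`)
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch

# With only one device visible, DataParallel has nothing to split across:
# device_count() reports 1 on a CUDA machine, 0 without CUDA
print(torch.cuda.device_count())
```

Since DataParallel defaults to all visible devices, masking the environment this way needs no code changes in train.py.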
I suggest you use torch.nn.parallel.DistributedDataParallel rather than torch.nn.DataParallel if you need to train the model on multiple GPUs; DDP runs one process per GPU and generally scales better than DataParallel's single-process replication.