chou141253/FGVC-HERBS

how to train on multiple GPU on a single machine

Closed this issue · 1 comments

Thanks a lot for your outstanding work.

I uncommented this line to use multiple GPUs on a single machine:

#model = torch.nn.DataParallel(model, device_ids=None) # device_ids : None --> use all gpus.

But it raised this error:

Start Training 1 Epoch
Traceback (most recent call last):
File "main.py", line 325, in <module>
main(args, tlogger)
File "main.py", line 277, in main
train(args, epoch, model, scaler, amp_context, optimizer, schedule, train_loader)
File "main.py", line 158, in train
thres = torch.Tensor(model.selector.thresholds[aux_name])
File "/home/wtan@BII.local/anaconda3/envs/timm_faiss/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1270, in __getattr__
type(self).__name__, name))
AttributeError: 'DataParallel' object has no attribute 'selector'

Do you have a suggestion for how to correct this error?

I am training on a custom dataset with 1M images.
Thanks a lot again.

Never mind. It is a very minor issue: torch.nn.DataParallel stores the wrapped model in its module attribute, so all I needed to do was access model.module.selector instead of model.selector.
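For anyone hitting the same error, here is a minimal, self-contained sketch of the fix. TinyModel and Selector below are hypothetical stand-ins for the HERBS model (which exposes a selector submodule with a thresholds dict); only the DataParallel wrapping and the .module access reflect the actual fix:

```python
import torch
import torch.nn as nn

class Selector(nn.Module):
    """Hypothetical stand-in for the model's selector submodule."""
    def __init__(self):
        super().__init__()
        # Plain dict attribute, like selector.thresholds in main.py.
        self.thresholds = {"layer1": [0.5, 0.5]}

class TinyModel(nn.Module):
    """Hypothetical stand-in for the HERBS model."""
    def __init__(self):
        super().__init__()
        self.selector = Selector()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyModel()
# device_ids=None --> use all visible GPUs (falls back gracefully on CPU).
model = torch.nn.DataParallel(model, device_ids=None)

# DataParallel does not forward custom attributes of the wrapped model,
# so model.selector raises AttributeError. Go through .module instead:
thres = torch.Tensor(model.module.selector.thresholds["layer1"])
print(thres)
```

The same pattern applies to every direct attribute access in train() (and to DistributedDataParallel, which wraps the model the same way).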