Tencent/Real-SR

Error in multi-GPU training

wytcsuch opened this issue · 1 comment

When I train with multiple GPUs, an error occurs (the output of the two processes is interleaved; the deduplicated traceback is):
File "/wytdata/Real-SR/codes/models/SRGAN_model.py", line 74, in __init__ model = create_model(opt) File "/wytdata/Real-SR/codes/models/__init__.py", line 14, in create_model m = M(opt) File "/wytdata/Real-SR/codes/models/SRGAN_model.py", line 74, in __init__ device_ids=[torch.cuda.current_device()]) File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 238, in __init__ device_ids=[torch.cuda.current_device()]) File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 238, in __init__ "DistributedDataParallel is not needed when a module " AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient. "DistributedDataParallel is not needed when a module " AssertionError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.

The cause of the error appears to be that the VGG network's parameters are frozen (none of them requires a gradient), so DistributedDataParallel refuses to wrap it. Is there any solution?
if self.cri_fea:  # load VGG perceptual loss
    self.netF = networks.define_F(opt, use_bn=False).to(self.device)
    if opt['dist']:
        self.netF = DistributedDataParallel(self.netF,
                                            device_ids=[torch.cuda.current_device()])  # error occurs here
    else:
        self.netF = DataParallel(self.netF)
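Since netF is only a fixed feature extractor for the perceptual loss, one possible workaround (a sketch, not the repository's official fix) is to skip the DistributedDataParallel wrapper for it entirely: each rank keeps its own frozen copy of the VGG, and gradients still flow through it back to the generator. The standalone example below illustrates the idea, using torchvision's VGG19 as a stand-in for networks.define_F:

import torch
import torch.nn.functional as F
import torchvision

# Frozen VGG feature extractor: no parameter requires a gradient,
# which is exactly what triggers the DDP assertion above.
netF = torchvision.models.vgg19().features[:35].eval()
for p in netF.parameters():
    p.requires_grad = False

device = (torch.device('cuda', torch.cuda.current_device())
          if torch.cuda.is_available() else torch.device('cpu'))
netF = netF.to(device)  # no DistributedDataParallel / DataParallel wrapper needed

# Gradients still propagate through the frozen VGG to the generator output:
fake = torch.rand(1, 3, 128, 128, device=device, requires_grad=True)
real = torch.rand(1, 3, 128, 128, device=device)
loss = F.l1_loss(netF(fake), netF(real))
loss.backward()  # fake.grad is populated; the VGG weights stay untouched

In SRGAN_model.py this would amount to leaving self.netF as a plain module moved to self.device in the opt['dist'] branch, wrapping only the trainable netG and netD in DistributedDataParallel.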

Did you solve this problem? I have the same issue.