Something goes wrong when training the model on two GPUs with DataParallel
Closed this issue · 1 comment
Loading model cost 0.516 seconds.
Prefix dict has been built successfully.
Loading model cost 0.512 seconds.
Prefix dict has been built successfully.
Traceback (most recent call last):
File "train.py", line 310, in
main()
File "train.py", line 98, in main
epoch=epoch)
File "train.py", line 153, in train
scores, caps_sorted, decode_lengths, alphas, sort_ind = decoder(imgs, caps, caplens)
File "/home/•/anaconda3/envs/Pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/•/anaconda3/envs/Pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 156, in forward
return self.gather(outputs, self.output_device)
File "/home/•/anaconda3/envs/Pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 168, in gather
return gather(outputs, output_device, dim=self.dim)
File "/home/•/anaconda3/envs/Pytorch/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
You are right, I haven't figured out the data parallel settings so far; feel free to create a PR if you have time.
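For reference, the traceback dies in DataParallel's gather step: `nn.DataParallel` splits the batch across GPUs and then concatenates each replica's outputs along dim 0, so every value returned by `forward` must be a tensor with a leading batch dimension. Outputs such as a Python list of `decode_lengths` or per-replica `sort_ind` tensors (each replica sorts its own shard differently) break that gather. Below is a minimal sketch with a hypothetical `ToyDecoder`, not the repository's decoder, showing the shape of a gather-friendly `forward`:

```python
import torch
import torch.nn as nn

# Hypothetical toy module (an assumption, not this repo's decoder) whose
# forward returns only tensors with a leading batch dimension, which is
# what DataParallel's gather step requires.
class ToyDecoder(nn.Module):
    def forward(self, feats):
        scores = feats * 2                                      # (batch, dim)
        # Return lengths as a tensor, not a Python list, so the
        # per-GPU pieces can be concatenated along dim 0.
        lengths = torch.full((feats.size(0),), feats.size(1))   # (batch,)
        return scores, lengths

model = ToyDecoder()
if torch.cuda.device_count() > 1:
    # Each replica receives a slice of the batch along dim 0.
    model = nn.DataParallel(model)

scores, lengths = model(torch.ones(4, 3))
```

On a single-GPU or CPU machine this runs the plain module unchanged, so the sketch can be tested anywhere; the gather constraint only bites when more than one device is visible.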