AssertionError: Gather function not implemented for CPU tensors
whansk50 opened this issue · 1 comment
whansk50 commented
Hello,
I ran train.py on multiple GPUs (I set CUDA_VISIBLE_DEVICES=0,1
since I have two GPUs), and I got this error:
Traceback (most recent call last):
File "train.py", line 275, in <module>
train()
File "train.py", line 191, in train
out = net(images)
File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
return self.gather(outputs, self.output_device)
File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 181, in gather
return gather(outputs, output_device, dim=self.dim)
File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 78, in gather
res = gather_map(outputs)
File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 73, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return Gather.apply(target_device, dim, *outputs)
File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/parallel/_functions.py", line 56, in forward
assert all(i.device.type != 'cpu' for i in inputs), (
AssertionError: Gather function not implemented for CPU tensors
I get the same error when I don't set any GPU ids at all.
After hitting this error, I set CUDA_VISIBLE_DEVICES
to a single GPU id (0 or 1) and training works, but of course it then only uses the one GPU matching the id I set.
What I want to know is: is it possible to train on 2 or more GPUs, and if so, how can I solve this problem? I saw #20, but it was not a sufficient answer.
The PyTorch version I use (in case it is needed for a solution) is 1.10.2 with CUDA 11.3.
Thanks.
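For context on the assertion itself: DataParallel scatters the input batch across the GPUs, runs one model replica per device, and then gathers the replica outputs back onto the output device. That gather step asserts that every replica output is a CUDA tensor, so the error usually means the model (or its outputs) ended up on the CPU. A minimal sketch of the usual pattern, assuming a hypothetical stand-in model and input (the real train.py model and data loader will differ):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the model in train.py.
net = nn.Linear(8, 2)

if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    # Move the module to the GPU before (or when) wrapping it, and make
    # sure forward() never calls .cpu() / .detach().numpy() on what it
    # returns -- DataParallel's gather rejects CPU tensors.
    net = nn.DataParallel(net).cuda()
    images = torch.randn(4, 8).cuda()
else:
    # Single-device fallback so the sketch also runs without two GPUs.
    images = torch.randn(4, 8)

out = net(images)
print(out.shape)
```

If the assertion still fires with this pattern, a common culprit is a forward() that builds part of its output on the CPU (e.g. new tensors created without a device argument), so each replica returns a mixed CPU/CUDA structure that gather cannot merge.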
prinsusinghal commented
Facing the same issue. Has this been solved?