amdegroot/ssd.pytorch

AssertionError: Gather function not implemented for CPU tensors

whansk50 opened this issue · 1 comment

Hello,

I ran train.py on multiple GPUs (I set CUDA_VISIBLE_DEVICES=0,1 since I have two GPUs), and got this error:

Traceback (most recent call last):
  File "train.py", line 275, in <module>
    train()
  File "train.py", line 191, in train
    out = net(images)
  File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.gather(outputs, self.output_device)
  File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 181, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 78, in gather
    res = gather_map(outputs)
  File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 73, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/***/miniconda3/envs/gpu38/lib/python3.8/site-packages/torch/nn/parallel/_functions.py", line 56, in forward
    assert all(i.device.type != 'cpu' for i in inputs), (
AssertionError: Gather function not implemented for CPU tensors

I get the same error when I don't set any GPU ids at all.

After hitting this error, I set CUDA_VISIBLE_DEVICES to a single GPU id (0 or 1) and training works, but of course it then only uses the one GPU matching that id.

What I want to know is: is it possible to train with 2 or more GPUs, and if so, how can I solve this problem? I saw #20, but it didn't fully answer the question.
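For context, this assertion fires when a model wrapped in nn.DataParallel returns at least one tensor that is still on the CPU: gather can only collect CUDA tensors back onto the output device. A typical culprit is a buffer built once on the CPU at init time (the precomputed prior boxes in SSD are a likely candidate, though I haven't confirmed that's the cause here). A minimal sketch of the general workaround, with illustrative names rather than the repo's actual code:

```python
import torch
import torch.nn as nn

class ToyDetector(nn.Module):
    """Illustrative module: a tensor built on the CPU at init time
    (like precomputed prior boxes) is returned from forward()."""
    def __init__(self):
        super().__init__()
        # Created once on the CPU; under nn.DataParallel this tensor
        # stays on the CPU and trips Gather's CPU-tensor assertion.
        self.priors = torch.rand(8, 4)
        self.head = nn.Linear(4, 4)

    def forward(self, x):
        # Workaround: move every returned tensor onto the input's
        # device before DataParallel gathers the replica outputs.
        return self.head(x), self.priors.to(x.device)

model = ToyDetector()
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model.cuda())

x = torch.rand(2, 4)
if torch.cuda.is_available():
    x = x.cuda()

out, priors = model(x)
print(out.device == priors.device)  # True: both on the same device
```

Registering such tensors with `self.register_buffer(...)` instead would also make `.cuda()` / DataParallel replication move them automatically.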

The PyTorch version I'm using (in case it matters for a solution) is 1.10.2 with CUDA 11.3.

Thanks.

I'm facing the same issue. Has this been solved?