POSTECH-CVLab/point-transformer

Pointops - GPU DataParallel Error

L-Reichardt opened this issue · 1 comment

Hello, I integrated the model into another training loop and it trains fine on a single GPU. However, when I use multi-GPU DataParallel, training stops with the following error:
ATen/native/cuda/IndexKernel.cu:91: index out of bounds

According to the traceback, the error originates in pointops' queryandgroup.
Any suggestions as to what might cause this?

Error Message:

/opt/conda/conda-bld/pytorch_1656352464346/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [30669,0,0], thread: [31,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.

Traceback ...

RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/Paper/model/pointtransformer_seg.py", line 162, in forward
    p1, x1, o1 = self.enc1([p0, x0, o0])
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/Paper/model/pointtransformer_seg.py", line 116, in forward
    x = self.relu(self.bn2(self.transformer2([p, x, o])))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/Paper/model/pointtransformer_seg.py", line 26, in forward
    x_k = pointops.queryandgroup(self.nsample, p, p, x_k, None, o, o, use_xyz=True)  # (n, nsample, 3+c)
  File "/home/Paper/lib/pointops/functions/pointops.py", line 91, in queryandgroup
    grouped_xyz = xyz[idx.view(-1).long(), :].view(m, nsample, 3) # (m, nsample, 3)
RuntimeError: CUDA error: device-side assert triggered
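One plausible explanation (an assumption on my part, not confirmed by the traceback alone): point-transformer packs all point clouds of a batch into one flat (n, 3) tensor plus an offset tensor `o` of cumulative point counts, and `nn.DataParallel` scatters every input tensor along dim 0 independently. Each replica then receives only a chunk of the points but offsets that still refer to the full batch, so neighbor indices computed from those offsets can exceed the chunk, tripping the device-side bounds assert in the gather. A minimal pure-Python sketch of that mismatch (function and variable names are illustrative, not from the codebase):

```python
# Pure-Python sketch (no torch) of why naive dim-0 scatter breaks
# offset-based packed batches. Names are illustrative only.

def scatter_dim0(seq, n_replicas):
    """Mimic nn.DataParallel's scatter: split a sequence into
    roughly equal chunks along dim 0."""
    k = (len(seq) + n_replicas - 1) // n_replicas
    return [seq[i * k:(i + 1) * k] for i in range(n_replicas)]

# Two point clouds of 4 points each, packed flat; offsets are
# cumulative point counts, as in the point-transformer input format.
points = list(range(8))   # stand-in for an (8, 3) xyz tensor
offsets = [4, 8]          # cloud 0 -> points[0:4], cloud 1 -> points[4:8]

point_chunks = scatter_dim0(points, 2)    # two chunks of 4 points
offset_chunks = scatter_dim0(offsets, 2)  # [[4], [8]]

# Replica 1 holds only 4 points but an offset of 8: any neighbor
# index derived from that offset can exceed the chunk length,
# which is exactly an "index out of bounds" on the GPU.
for pts, offs in zip(point_chunks, offset_chunks):
    print(len(pts), offs[-1], offs[-1] > len(pts))
```

On this toy batch the second replica's offset (8) exceeds its chunk length (4), mirroring the out-of-bounds gather in `queryandgroup`.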

Update: it works with DistributedDataParallel.
Since PyTorch officially recommends DistributedDataParallel over DataParallel anyway, this is most likely a PyTorch DataParallel issue rather than a bug in pointops.
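A sketch of why DistributedDataParallel avoids the problem (assuming the standard one-process-per-GPU setup): each process loads whole point clouds from its own sampler and builds its own offset tensor, so points and offsets are never split apart by a scatter step. Pure Python for illustration; `pack_batch` is a hypothetical helper, not a pointops function:

```python
def pack_batch(clouds):
    """Pack a list of per-cloud point lists into one flat list plus
    cumulative offsets, matching the point-transformer input format."""
    flat, offsets, total = [], [], 0
    for cloud in clouds:
        flat.extend(cloud)
        total += len(cloud)
        offsets.append(total)
    return flat, offsets

# Under DDP, each rank receives whole clouds from its own sampler,
# so the offsets it builds always match the points it holds.
rank0_points, rank0_offsets = pack_batch([[0, 1, 2, 3]])
rank1_points, rank1_offsets = pack_batch([[4, 5, 6, 7]])

assert rank0_offsets[-1] == len(rank0_points)  # indices stay in bounds
assert rank1_offsets[-1] == len(rank1_points)  # indices stay in bounds
```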