CUDA error: an illegal memory access was encountered

Question

Christopher-RH opened this issue 3 years ago · 0 comments

Traceback (most recent call last):
  File "tools/train.py", line 235, in <module>
    confmap_preds = rodnet(data.float().cuda())
  File "/share/home/3120305377/.conda/envs/rodenet2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/share/home/3120305377/PythonProjects/RODNet/rodnet/models/rodnet_hgwi_v2.py", line 35, in forward
    out = self.stacked_hourglass(x)
  File "/share/home/3120305377/.conda/envs/rodenet2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/share/home/3120305377/PythonProjects/RODNet/rodnet/models/backbones/hgwi.py", line 46, in forward
    x, x1, x2, x3 = self.hourglass[i][0](x)
  File "/share/home/3120305377/.conda/envs/rodenet2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/share/home/3120305377/PythonProjects/RODNet/rodnet/models/backbones/hgwi.py", line 130, in forward
    x1 = self.relu(self.skip_bn1(self.skip_inception1(x)))
  File "/share/home/3120305377/.conda/envs/rodenet2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/share/home/3120305377/PythonProjects/RODNet/rodnet/models/backbones/hgwi.py", line 89, in forward
    branch2 = self.branch2b(branch2)
  File "/share/home/3120305377/.conda/envs/rodenet2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/share/home/3120305377/.conda/envs/rodenet2/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 479, in forward
    return F.conv3d(input, self.weight, self.bias, self.stride,
RuntimeError: CUDA error: an illegal memory access was encountered

1. I encountered this error on an A100 40GB. After searching online, I found a suggestion that it might happen when the dataset and the model are placed on the CPU and the GPU, respectively. So what exactly is the problem?
2. Alternatively, could you provide trained models for testing?
Thanks a lot.
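
One way to test the CPU/GPU placement hypothesis is sketched below. It assumes a PyTorch-style model and batch, meaning anything that exposes `parameters()`, `buffers()`, and `.device`. The helper names `device_report` and `assert_same_device` are hypothetical and are not part of RODNet or PyTorch. The sketch also sets `CUDA_LAUNCH_BLOCKING=1`, which makes CUDA kernel launches synchronous so that the traceback points at the operation that actually failed rather than a later one. A plain CPU/GPU mismatch normally surfaces as an explicit device or tensor-type error, not as an illegal memory access, so this check mainly serves to rule that explanation out.

```python
import os

# Must be set before CUDA is initialised (i.e. before the first CUDA call),
# otherwise it has no effect. Kernel launches then run synchronously.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def device_report(module, batch):
    """Return (devices used by the module, device of the batch) as strings.

    Hypothetical helper: `module` is assumed to offer parameters() and
    buffers() like torch.nn.Module, and `batch` a `.device` attribute
    like torch.Tensor.
    """
    module_devices = {str(p.device) for p in module.parameters()}
    module_devices |= {str(b.device) for b in module.buffers()}
    return module_devices, str(batch.device)


def assert_same_device(module, batch):
    """Raise RuntimeError if the module and the batch live on different devices."""
    module_devices, batch_device = device_report(module, batch)
    # A module without parameters or buffers has nothing to compare against.
    if module_devices and module_devices != {batch_device}:
        raise RuntimeError(
            f"device mismatch: module on {sorted(module_devices)}, "
            f"batch on {batch_device}"
        )
```

In `tools/train.py` this could be called as `assert_same_device(rodnet, data.float().cuda())` just before the forward pass; where exactly to place the call is an assumption about the training loop. If the check passes and the error persists with synchronous launches, the placement explanation can be set aside.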