qianguih/RSNet

RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:20

HannahJHan opened this issue · 2 comments

Hello!

I ran into a problem when running the script. It looks like a CUDA issue, but none of the fixes
I found by searching have worked, such as adding CUDA_LAUNCH_BLOCKING=1 before the command or using a smaller batch size.
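
For reference, CUDA_LAUNCH_BLOCKING=1 does not fix an illegal memory access. It only makes kernel launches synchronous, so the traceback points at the call that actually failed rather than at a later copy such as out.cuda(). A minimal sketch of setting it from inside the script, assuming it is placed at the very top of train.py before torch is imported:

```python
import os

# Assumption: this runs before "import torch" and before any CUDA call,
# because the variable is read when the CUDA context is created.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch only after the variable is set
```

This is equivalent to prefixing the shell command with CUDA_LAUNCH_BLOCKING=1. Training will run slower while it is enabled.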

Has anyone else run into this problem? Thanks a lot!

python train.py 
loading raw data...
training set:  ((16733, 4096, 9), (16733, 4096))
testing set:  ((2239, 4096, 9), (2239, 4096))
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCTensorCopy.c line=20 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "train.py", line 209, in <module>
    output = model(input_var, x_indices_var, y_indices_var, z_indices_var, hidden_list)
  File "/home/han/.conda/envs/py27_torch030/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "RSNet_py27/net.py", line 91, in forward
    x_pooled = self.pool_x( conv_3, x_slice_idx )  # num_batch, 64, numSlices, 1
  File "/home/han/.conda/envs/py27_torch030/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "RSNet_py27/layers/slice_pool_layer/slice_pool_layer.py", line 101, in forward
    return self.sp(input, slice_idx_mat, self.pool_type, slice_counts)
  File "RSNet_py27/layers/slice_pool_layer/slice_pool_layer.py", line 41, in forward
    out = out.cuda()
  File "/home/han/.conda/envs/py27_torch030/lib/python2.7/site-packages/torch/_utils.py", line 69, in _cuda
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:20

I resolved the issue simply by switching to a different GPU. That was exhausting...
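
For anyone who wants to try the same thing on a multi-GPU machine, here is a rough sketch of launching the training script on another device. The helper names env_for_gpu and run_training are made up for illustration, and the GPU index 1 is only an example:

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_index):
    """Return a copy of the environment that exposes only one GPU."""
    env = dict(os.environ)
    # CUDA_VISIBLE_DEVICES restricts which physical GPUs the process can see;
    # the selected GPU then appears as device 0 inside the process.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_training(gpu_index, script="train.py"):
    """Hypothetical launcher: run the script with only the given GPU visible."""
    return subprocess.call([sys.executable, script], env=env_for_gpu(gpu_index))
```

Calling run_training(1) would run train.py on the second physical GPU. The same effect is available from the shell by prefixing the command with CUDA_VISIBLE_DEVICES=1.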

I also encountered this issue, and switching GPUs did not help. It seems there is a bug in the compiled CUDA code (slice_pool_layer).
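
One thing that may be worth ruling out first: a common cause of illegal memory accesses in custom kernels is an index that falls outside the buffer the kernel writes to. Whether that is what happens here is not confirmed. The sketch below is a hypothetical host-side check, with check_slice_indices and num_slices being names invented for illustration rather than part of the RSNet code:

```python
def check_slice_indices(slice_idx, num_slices):
    """Raise ValueError if any slice index is outside [0, num_slices).

    slice_idx is assumed to be a 2-D structure (a list of rows or a 2-D
    array) holding one integer slice index per point.
    """
    flat = [int(i) for row in slice_idx for i in row]
    if not flat:
        raise ValueError("slice index matrix is empty")
    lo, hi = min(flat), max(flat)
    if lo < 0 or hi >= num_slices:
        raise ValueError(
            "slice indices span [%d, %d], expected [0, %d)" % (lo, hi, num_slices)
        )
    return lo, hi
```

Running a check like this on the CPU copy of the index matrix before it is passed to the pooling layer would turn a silent out-of-range write into a readable Python error. If the check passes and the crash remains, the problem is more likely in the kernel itself or in how it was compiled.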