RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:20
HannahJHan opened this issue · 2 comments
HannahJHan commented
Hello!
I ran into a problem when running the script. It looks like a CUDA issue, but none of the fixes
I found by searching worked, such as adding CUDA_LAUNCH_BLOCKING=1 before the command or using a smaller batch size.
Has anyone else run into this problem? Thanks a lot!
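For anyone repeating the CUDA_LAUNCH_BLOCKING attempt: CUDA kernels are launched asynchronously, so an error can be reported at a later call than the one that caused it. Setting the variable to 1 makes launches synchronous, so the traceback should point closer to the failing kernel. Here is a minimal sketch of setting it from inside the script rather than on the command line, assuming it runs before anything touches CUDA (the helper name is hypothetical):

```python
import os


def enable_blocking_launches(env=os.environ):
    """Request synchronous CUDA kernel launches for this process.

    Hypothetical helper: it only sets the environment variable, so it
    must be called before the first CUDA call in the process.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


enable_blocking_launches()
# import torch  # import and use torch only after the variable is set
```

Placed at the very top of train.py, this should have the same effect as prefixing the command with the variable.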
python train.py
loading raw data...
training set: ((16733, 4096, 9), (16733, 4096))
testing set: ((2239, 4096, 9), (2239, 4096))
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCTensorCopy.c line=20 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "train.py", line 209, in <module>
    output = model(input_var, x_indices_var, y_indices_var, z_indices_var, hidden_list)
  File "/home/han/.conda/envs/py27_torch030/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "RSNet_py27/net.py", line 91, in forward
    x_pooled = self.pool_x( conv_3, x_slice_idx ) # num_batch, 64, numSlices, 1
  File "/home/han/.conda/envs/py27_torch030/lib/python2.7/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "RSNet_py27/layers/slice_pool_layer/slice_pool_layer.py", line 101, in forward
    return self.sp(input, slice_idx_mat, self.pool_type, slice_counts)
  File "RSNet_py27/layers/slice_pool_layer/slice_pool_layer.py", line 41, in forward
    out = out.cuda()
  File "/home/han/.conda/envs/py27_torch030/lib/python2.7/site-packages/torch/_utils.py", line 69, in _cuda
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:20
HannahJHan commented
I resolved the issue simply by switching to a different GPU. Tired heart...
qq456cvb commented
I also encountered this issue, and switching GPUs did not help. It seems there is a bug in the compiled CUDA code for the slice pooling layer (slice_pool_layer).
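One way to narrow this down, offered as a guess rather than a diagnosis: an illegal memory access in a custom kernel is often caused by an index that falls outside the buffer it addresses. The thread does not confirm that this is what happens here. The sketch below checks slice indices on the CPU before they reach the pooling layer. The function name, the nested-list input, and the example data are all hypothetical; in practice the index tensor would first be converted to a plain nested list.

```python
def find_out_of_range(slice_idx, num_slices):
    """Return (batch, point, value) for every index outside [0, num_slices).

    Assumption: slice_idx is a nested list shaped [batch][point] of ints,
    and valid slice indices are 0 .. num_slices - 1.
    """
    bad = []
    for b, row in enumerate(slice_idx):
        for p, value in enumerate(row):
            if not 0 <= value < num_slices:
                bad.append((b, p, value))
    return bad


# Hypothetical data: 2 point clouds, 4 points each, 3 slices.
example = [[0, 1, 2, 2], [0, 3, -1, 1]]
problems = find_out_of_range(example, num_slices=3)
for batch, point, value in problems:
    print("batch %d, point %d: slice index %d is out of range" % (batch, point, value))
```

If such a check reports nothing on the real data, bad indices are unlikely to be the cause and the kernel itself would be the next place to look.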