d-li14/involution

cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

beiliu253 opened this issue · 2 comments

@d-li14 Hi,

I am using involution_cuda.py to replace convolution with the involution module you provide in this repo. Training runs without problems, but I hit the error below during evaluation. I have no idea what causes it or how to solve it.

Traceback (most recent call last):
File "extract_emb.py", line 100, in <module>
main()
File "extract_emb.py", line 96, in main
store_emb(model, args)
File "extract_emb.py", line 30, in store_emb
output = model(data)
File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "models/rednet.py", line 126, in forward
out = self.layer3(out)
File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "models/rednet.py", line 58, in forward
out = F.relu(self.bn2(self.conv2(out)))
File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "models/involution_cuda.py", line 278, in forward
out = _involution_cuda(x, weight, stride=self.stride, padding=(self.kernel_size - 1) // 2)
File "models/involution_cuda.py", line 235, in _involution_cuda
out = _involution.apply(input, weight, _pair(stride), _pair(padding), _pair(dilation))
File "models/involution_cuda.py", line 167, in forward
pad_h=padding[0], pad_w=padding[1])
File "cupy/_util.pyx", line 59, in cupy._util.memoize.decorator.ret
File "models/involution_cuda.py", line 27, in load_kernel
kernel_code = cupy.cuda.compile_with_cache(code)
File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 376, in compile_with_cache
cache_in_memory)
File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 431, in _compile_with_cache_cuda
mod.load(cubin)
File "cupy/cuda/function.pyx", line 222, in cupy.cuda.function.Module.load
File "cupy/cuda/function.pyx", line 224, in cupy.cuda.function.Module.load
File "cupy_backends/cuda/api/driver.pyx", line 246, in cupy_backends.cuda.api.driver.moduleLoadData
File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
File "cupy_backends/cuda/api/driver.pyx", line 253, in cupy_backends.cuda.api.driver.moduleUnload
File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.dealloc'
Traceback (most recent call last):
File "cupy_backends/cuda/api/driver.pyx", line 253, in cupy_backends.cuda.api.driver.moduleUnload
File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

Is it possible to evaluate the provided checkpoints in this repo? I have tried them and did not encounter this error.

Hi @beiliu253, are you using mixed/half (16-bit) precision? I had this issue, and switching to full precision (32-bit) fixed it.
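
A minimal sketch of the full-precision workaround described above, assuming the model or the input was cast to half precision somewhere before evaluation (the thread does not confirm this). `forward_full_precision` is a hypothetical helper, not part of this repo; it relies only on the `.float()` method that both `torch.nn.Module` and `torch.Tensor` provide.

```python
def forward_full_precision(model, data):
    """Run one forward pass with float32 weights and float32 input.

    Hypothetical helper for illustration only. `model` is expected to be a
    torch.nn.Module and `data` a torch.Tensor, but the function only needs
    objects that expose a `.float()` method.
    """
    # nn.Module.float() casts all floating-point parameters and buffers
    # to float32 in place and returns the module itself.
    model = model.float()
    # Tensor.float() returns the tensor as float32.
    data = data.float()
    return model(data)
```

In the `store_emb` function from the traceback, the call `output = model(data)` could be replaced with `output = forward_full_precision(model, data)`, ideally inside `torch.no_grad()`. If evaluation runs under `torch.cuda.amp.autocast`, activations may still reach the involution layer in half precision, so that context would probably need to be disabled as well.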