RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR in PyTorch Training
Closed this issue · 2 comments
ENDlezZenith commented
Environment:
Windows 10 64-bit 22H2
ZLUDA v3.8
AMD Radeon Pro VII (Vega20) (gfx906)
ROCm and ZLUDA System Environment Variables declared properly
cublas, cusparse, nvrtc lib substituted properly
Description
I'm taking AI classes, and when I tried to run training on my local runtime with an AMD GPU, I got an error. The same notebook runs fine on Colab, so it's probably not a problem with the code. The ipynb file is attached in case you need it for debugging; it's a small in-class model of about 250 lines.
Waldo.zip
Error Log
------------------------------
Training process started
------------------------------
0%| | 0/246 [00:43<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[15], line 14
12 label = data[1].to(device)
13 label = label.to(torch.float32)
---> 14 pred = mynet(img)
15 pred = torch.squeeze(pred)
16 pred.reshape(-1)
File D:\Artificial Intelligence\Runtime\venv\lib\site-packages\torch\nn\modules\module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)
File D:\Artificial Intelligence\Runtime\venv\lib\site-packages\torch\nn\modules\module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None
File D:\Artificial Intelligence\Runtime\venv\lib\site-packages\torch\nn\modules\container.py:217, in Sequential.forward(self, input)
215 def forward(self, input):
216 for module in self:
--> 217 input = module(input)
218 return input
File D:\Artificial Intelligence\Runtime\venv\lib\site-packages\torch\nn\modules\module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)
File D:\Artificial Intelligence\Runtime\venv\lib\site-packages\torch\nn\modules\module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None
File D:\Artificial Intelligence\Runtime\venv\lib\site-packages\torch\nn\modules\conv.py:460, in Conv2d.forward(self, input)
459 def forward(self, input: Tensor) -> Tensor:
--> 460 return self._conv_forward(input, self.weight, self.bias)
File D:\Artificial Intelligence\Runtime\venv\lib\site-packages\torch\nn\modules\conv.py:456, in Conv2d._conv_forward(self, input, weight, bias)
452 if self.padding_mode != 'zeros':
453 return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
454 weight, bias, self.stride,
455 _pair(0), self.dilation, self.groups)
--> 456 return F.conv2d(input, weight, bias, self.stride,
457 self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
lshqqytiger commented
Try disabling cuDNN:
torch.backends.cudnn.enabled = False
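A minimal sketch of where the flag goes: set it once, right after importing torch and before the model's first forward pass, so `F.conv2d` falls back to PyTorch's native convolution kernels instead of cuDNN. The names `mynet`, `device`, and `img` below are placeholders taken from the traceback, not part of the fix itself.

```python
import torch

# Disable cuDNN globally; PyTorch then uses its own convolution
# implementations, avoiding the cuDNN path that fails under ZLUDA.
torch.backends.cudnn.enabled = False

# ... then build and run the model as before, e.g.:
# mynet = mynet.to(device)
# pred = mynet(img)
```

Note this trades some speed for compatibility: the native kernels are generally slower than cuDNN's, but they run on setups where cuDNN is unavailable or broken.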
ENDlezZenith commented
Thanks, that fixed it.