RuntimeError: HIP error when running ResNet-50 on PRO W7900 with PyTorch
liangyong928 opened this issue · 1 comment
liangyong928 commented
The following code runs normally on an AMD PRO W7900 GPU:
```python
import torch

device = torch.device("cuda")  # on ROCm builds of PyTorch, "cuda" refers to the HIP device
x = torch.randn(128, 10, 224, 224).to(device)
model = torch.nn.Conv2d(10, 64, 5).to(device)
output = model(x)
print(output.device)
```
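For a sense of the working snippet's scale, PyTorch's convolution output-size formula gives its output shape and float32 footprint. This is a back-of-the-envelope sketch; `conv2d_out` and `float32_bytes` are hypothetical helpers, not PyTorch APIs.

```python
def conv2d_out(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of Conv2d/MaxPool2d along one dimension (PyTorch's floor formula)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def float32_bytes(*shape):
    """Bytes needed to store a dense float32 tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return 4 * n


# First snippet: input (128, 10, 224, 224) -> Conv2d(10, 64, kernel_size=5), no padding
h = conv2d_out(224, kernel=5)
out_shape = (128, 64, h, h)
print(out_shape)  # (128, 64, 220, 220)
print(f"input:  {float32_bytes(128, 10, 224, 224) / 1e9:.2f} GB")
print(f"output: {float32_bytes(*out_shape) / 1e9:.2f} GB")
```

By this estimate, the conv-only test already produces a float32 output of roughly 1.6 GB, so it is not a trivially small workload.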
However, when running the code below, I encounter an error:
```python
import torch
import torchvision.models as models
from torchvision.models import ResNet50_Weights
x_large = torch.randn(128, 3, 224, 224)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
#device = torch.device("cpu")
weights = ResNet50_Weights.IMAGENET1K_V1
model = models.resnet50(weights=weights).to(device)
model.eval()
x_large = x_large.to(device)
output = model(x_large)
print(output.device)
```
The error message is as follows:
```
Traceback (most recent call last):
  File "/root/test/testnew2.py", line 11, in <module>
    output = model(x_large)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/models/resnet.py", line 285, in forward
    return self._forward_impl(x)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/models/resnet.py", line 278, in _forward_impl
    x = self.avgpool(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/pooling.py", line 1194, in forward
    return F.adaptive_avg_pool2d(input, self.output_size)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 1228, in adaptive_avg_pool2d
    return torch._C._nn.adaptive_avg_pool2d(input, _output_size)
RuntimeError: HIP error: the operation cannot be performed in the present state
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
```
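The traceback ends in `adaptive_avg_pool2d`, but GPU kernels run asynchronously, so the call that reports an error is not necessarily the one that caused it. A minimal sketch, assuming the standard ResNet-50 layout, runs that op alone on a tensor shaped like ResNet-50's `layer4` output. For 224×224 inputs the network downsamples by 32 (224 / 32 = 7) and emits 2048 channels, so `avgpool` receives `(N, 2048, 7, 7)`. `pool_like_resnet50` is a hypothetical helper.

```python
import torch
import torch.nn.functional as F


def pool_like_resnet50(batch_size, device):
    """Call adaptive_avg_pool2d directly on a tensor shaped like ResNet-50's layer4 output."""
    x = torch.randn(batch_size, 2048, 7, 7, device=device)
    out = F.adaptive_avg_pool2d(x, (1, 1))  # same target size ResNet-50's avgpool uses
    if device.type == "cuda":
        torch.cuda.synchronize()  # make sure any kernel error is raised here, not later
    return out


if __name__ == "__main__" and torch.cuda.is_available():
    print(pool_like_resnet50(128, torch.device("cuda")).shape)
```

If this isolated call succeeds on the W7900, the failure more likely comes from an earlier layer or from device state left by one, rather than from the pooling kernel itself.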
Why does the first snippet run without issues on the AMD PRO W7900 while the second one throws this error? I would appreciate any insights or suggestions for resolving it.
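Answering a question like this usually depends on the exact software stack. The hypothetical helper below collects the PyTorch and HIP versions and the visible device, using only attributes that exist in PyTorch (`torch.version.hip` is `None` on non-ROCm builds). Including its output in the report may make the problem easier to match against known ROCm issues.

```python
import platform

import torch


def environment_report():
    """Collect version and device details that are useful in a ROCm bug report."""
    info = {
        "python": platform.python_version(),
        "torch": torch.__version__,
        "hip": torch.version.hip,  # None on CUDA or CPU-only builds
        "gpu_available": torch.cuda.is_available(),
    }
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        info["device_name"] = props.name
        info["total_memory_gb"] = round(props.total_memory / 1e9, 1)
    return info


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```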
liangyong928 commented