RuntimeError: HIP error when running ResNet-50 on PRO W7900 with PyTorch

Question

liangyong928 opened this issue 7 months ago · 1 comment

The following code runs normally on an AMD PRO W7900 GPU:

import torch
device = torch.device("cuda")
x = torch.randn(128,10,224,224).to(device)
model = torch.nn.Conv2d(10, 64, 5).to(device)
output = model(x)
print(output.device)

However, when running the code below, I encounter an error:

import torch
import torchvision.models as models
from torchvision.models import ResNet50_Weights
x_large = torch.randn(128, 3, 224, 224)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
#device = torch.device("cpu")
weights = ResNet50_Weights.IMAGENET1K_V1
model = models.resnet50(weights=weights).to(device)
model.eval()
x_large = x_large.to(device)
output = model(x_large)
print(output.device)

The error message is as follows:

Traceback (most recent call last):
  File "/root/test/testnew2.py", line 11, in <module>
    output = model(x_large)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/models/resnet.py", line 285, in forward
    return self._forward_impl(x)
  File "/usr/local/lib/python3.10/dist-packages/torchvision/models/resnet.py", line 278, in _forward_impl
    x = self.avgpool(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/pooling.py", line 1194, in forward
    return F.adaptive_avg_pool2d(input, self.output_size)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 1228, in adaptive_avg_pool2d
    return torch._C._nn.adaptive_avg_pool2d(input, _output_size)
RuntimeError: HIP error: the operation cannot be performed in the present state
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

Why does the first block of code run without any issues while the second block throws an error when using the AMD PRO W7900 GPU for computation? I would appreciate any insights or suggestions for resolving this issue.

Answer 1 · 2024-04-26T05:46:11.000Z

ROCm/pytorch#1398 (comment)
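
A supplementary debugging sketch, not taken from the linked comment: GPU kernels are launched asynchronously, so the op named in the traceback (adaptive_avg_pool2d here) is not necessarily the one that failed. One way to narrow it down is to synchronize after every leaf module and record which ones completed. The helper name trace_forward is hypothetical; it only uses the standard register_forward_hook and torch.cuda.synchronize calls and assumes nothing about the root cause.

```python
import torch
import torch.nn as nn


def trace_forward(model: nn.Module, x: torch.Tensor) -> list:
    """Run one forward pass, synchronizing after every leaf module.

    Returns the names of the leaf modules that completed, in execution order.
    If the forward pass raises, the last completed module is printed and the
    error is re-raised. (Hypothetical helper, for diagnosis only.)
    """
    completed = []
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Kernel launches are asynchronous; waiting here makes a failure
            # surface close to the module that actually triggered it.
            if x.is_cuda:
                torch.cuda.synchronize()
            completed.append(name)
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # hook leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))

    try:
        with torch.no_grad():
            model(x)
    except RuntimeError:
        print("last module that completed:", completed[-1] if completed else None)
        raise
    finally:
        # Always remove the hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()
    return completed
```

With the script from the question, this would be called as trace_forward(model, x_large) after both have been moved to the device. Running it again on a slice such as x_large[:8] shows whether the failure depends on batch size; that is only a guess to test, not a confirmed cause.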