Detection of ROCM vs CUDA device on Clariden
edopao opened this issue · 4 comments
edopao commented
I encountered an issue when running GT4Py with the gtfn_gpu backend on Clariden. I am running on a GPU node with an Nvidia A100, but this code selects the ROCM cupy device:
CUPY_DEVICE: Final[Literal[None, core_defs.DeviceType.CUDA, core_defs.DeviceType.ROCM]] = (
None
if not cp
else (core_defs.DeviceType.ROCM if cp.cuda.get_hipcc_path() else core_defs.DeviceType.CUDA)
)
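To make the failure mode concrete, here is a minimal sketch of the detection logic above, factored into a standalone function so the decision is testable. The function name detect_device_type and its parameters are hypothetical, not GT4Py API; it only mirrors the branch structure: the device type is inferred from whether a hipcc path was found, so a CUDA-only cupy build is misclassified as ROCM whenever hipcc happens to be installed on the node.

```python
from typing import Optional

def detect_device_type(cupy_available: bool, hipcc_path: Optional[str]) -> Optional[str]:
    """Mirror of the detection branch: None without cupy, else ROCM iff hipcc was found."""
    if not cupy_available:
        return None
    return "ROCM" if hipcc_path else "CUDA"

# On Clariden's A100 node, hipcc exists at /usr/bin/hipcc, so this
# returns "ROCM" even though the cupy installation is cupy-cuda11x:
detect_device_type(True, "/usr/bin/hipcc")  # → "ROCM"
```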
I suspect that the software environment on Clariden was set up with support for both Nvidia and AMD GPUs, depending on the type of node allocated by Slurm, so hipcc is present even on Nvidia nodes.
You can run this test:
pytest -s -v -k gtfn_gpu tests/next_tests/integration_tests/multi_feature_tests/ffront_tests/test_icon_like_scan.py::test_solve_nonhydro_stencil_52_like
It will produce this output:
if self.device_type == core_defs.DeviceType.ROCM:
# until we can rely on dlpack
> ndarray.__hip_array_interface__ = { # type: ignore[attr-defined]
"shape": ndarray.shape, # type: ignore[union-attr]
"typestr": ndarray.dtype.descr[0][1], # type: ignore[union-attr]
"descr": ndarray.dtype.descr, # type: ignore[union-attr]
"stream": 1,
"version": 3,
"strides": ndarray.strides, # type: ignore[union-attr, attr-defined]
"data": (ndarray.data.ptr, False), # type: ignore[union-attr, attr-defined]
}
E AttributeError: 'ndarray' object has no attribute '__hip_array_interface__'
src/gt4py/storage/allocators.py:270: AttributeError
================================================================= short test summary info ==================================================================
ERROR tests/next_tests/integration_tests/multi_feature_tests/ffront_tests/test_icon_like_scan.py::test_solve_nonhydro_stencil_52_like[gtfn.run_gtfn_gpu] - AttributeError: 'ndarray' object has no attribute '__hip_array_interface__'
havogt commented
Does the environment have an installation of cupy-rocm or just cupy-cuda? When we wrote that code, there was no clean/documented way to have cupy for both GPU types. Not sure if that has changed.
edopao commented
cupy-cuda11x 13.0.0
>>> import cupy as cp
>>> cp.cuda.get_hipcc_path()
'/usr/bin/hipcc'
havogt commented
Maybe we should use this variable instead: cp.cuda.runtime.is_hip
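A minimal sketch of the revised detection along these lines, guarding for cupy being absent. cp.cuda.runtime.is_hip reports whether cupy itself was built against HIP/ROCm, rather than whether hipcc happens to be installed on the host, so a cupy-cuda11x install on a node that also ships hipcc is still detected as CUDA.

```python
from typing import Optional

# cupy may not be installed at all; fall back to None in that case.
try:
    import cupy as cp
except ImportError:
    cp = None

def detect_device_type() -> Optional[str]:
    """Return the cupy device type based on the cupy build, not the host toolchain."""
    if cp is None:
        return None
    return "ROCM" if cp.cuda.runtime.is_hip else "CUDA"
```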
edopao commented
Yes, that seems to work!