ROCm/ROCm-docker

Error: couldn't find any HIP devices

ztz0223 opened this issue · 1 comments

Hi all,

I have two W6800 GPUs running on RHEL 8. I tested gpu-burn from:
https://github.com/ROCm-Developer-Tools/HIP-Examples/tree/master/gpu-burn

When I run the gpuburn-hip binary on the physical machine, it works:

[xxx@dock2 build]$ ./gpuburn-hip -t 200
Total no. of GPUs found: 2
Init Burn Thread for device (0)
Init Burn Thread for device (1)
Burn Thread using device (0)
Burn Thread using device (1)
Temps: [GPU0: 32 C] [GPU1: 34 C] 200s
Temps: [GPU0: 37 C] [GPU1: 35 C] 199s
Temps: [GPU0: 37 C] [GPU1: 36 C] 198s
Temps: [GPU0: 37 C] [GPU1: 36 C] 197s
Temps: [GPU0: 38 C] [GPU1: 36 C] 196s
Temps: [GPU0: 38 C] [GPU1: 37 C] 195s
Temps: [GPU0: 39 C] [GPU1: 38 C] 194s
Temps: [GPU0: 39 C] [GPU1: 38 C] 193s

But when I test it in the rocm/rocm-terminal container, I just get an error:


rocm-user@f90db1115a3b:/gpu-burn/build$ rocm-smi


========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU  Temp (DieEdge)  AvgPwr  SCLK  MCLK   Fan    Perf  PwrCap  VRAM%  GPU%
0    28.0c           9.0W    0Mhz  96Mhz  20.0%  auto  213.0W    0%   0%
====================================================================================
=============================== End of ROCm SMI Log ================================
rocm-user@f90db1115a3b:/gpu-burn/build$

Then I run the binary:
rocm-user@f90db1115a3b:/gpu-burn/build$ ./gpuburn-hip
Error: couldn't find any HIP devices
rocm-user@f90db1115a3b:/gpu-burn/build$

Any ideas? Is the W6800 not recognized by the container?

Thanks.

I figured it out: the user account in the container must be added to the video and render groups.
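
For anyone hitting the same error, here is a minimal sketch of how the group membership could be checked from inside the container. The helper name missing_groups is made up for illustration, and the docker run line in the comment is an assumption about a typical launch, not the exact command used in this issue.

```shell
#!/bin/sh
# Assumed host-side launch (adjust to your setup; if the image has no group
# named "render", a numeric GID may be needed instead of the name):
#   docker run -it --device=/dev/kfd --device=/dev/dri \
#       --group-add video --group-add render rocm/rocm-terminal

# Hypothetical helper: print each required group that is absent from a
# space-separated list of groups.
# Usage: missing_groups "<required groups>" "<groups the user has>"
missing_groups() {
    required=$1
    have=$2
    for g in $required; do
        case " $have " in
            *" $g "*) ;;                 # group present, nothing to report
            *) printf '%s\n' "$g" ;;     # group missing, print its name
        esac
    done
}

# Check the current user (run this inside the container).
missing=$(missing_groups "video render" "$(id -nG)")
if [ -n "$missing" ]; then
    echo "Current user is not a member of:" $missing
else
    echo "Current user is in both video and render"
fi
```

If the script reports a missing group, the HIP runtime probably cannot open /dev/kfd or the /dev/dri render nodes, which would be consistent with the "couldn't find any HIP devices" message above.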