only sudo works for rocm-commands in rocm-docker
On bare metal everything works fine with the latest ROCm 1.7.1, but within Docker we now see something unexpected: ROCm only works when using sudo. With normal (non-root) rights, using a clean Docker image:
docker run -it --device="/dev/kfd" --device="/dev/dri/renderD128" --device="/dev/dri/renderD129" rocm/rocm-terminal
(...)
rocm-user@2888cb18b1a3:~$ rocminfo
hsa api call failure at line 900, file: /rocmdata/jedwards/git/compute/rocrinfo/rocminfo.cc. Call returned 4104
Using sudo we get perfect response for clinfo and rocminfo:
rocm-user@2888cb18b1a3:~$ sudo /opt/rocm/bin/rocminfo
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
(...)
What could be the cause?
Hi @VincentSC , thanks for the report.
The issue is actually caused by a defect in Docker itself; please refer to this Docker issue.
The fix has been merged into Docker's master branch.
I've tried this Docker nightly build, and it works fine for the rocm/rocm-terminal Docker image with rocm-user.
I use Docker version 18.02.0-ce, build fc4de44, and this still happens to me.
I use ROCm 1.7.1 on an Ubuntu 16.04 host with an AMD Vega 56. I ran
docker pull rocm/rocm-terminal:1.7.1
then
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/rocm-terminal:1.7.1
and I can't execute ./vector_copy from /opt/rocm/hsa/sample/ without sudo. The same goes for rocminfo.
@Sumenia, as mentioned in the message above, the issue comes from a Docker defect in the 18.02.0 build.
Please manually install the following docker version and try again:
https://download.docker.com/linux/ubuntu/dists/xenial/pool/nightly/amd64/docker-ce_18.04.0~ce~dev~git20180315.170650.0.8fabfd2-0~ubuntu_amd64.deb
Oh! My bad, @sunway513. I did see your message and checked my version (17.~something), then read the Docker documentation to get the latest build from apt-get, but I didn't check the version I actually received. I saw “18.~something” and thought it was OK. I should have double-checked; sorry about that, and thank you for your answer 👍
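Since the mix-up came from eyeballing the version string, a small script can parse the output of `docker --version` and compare it with the 18.02.0 release this thread names as affected. This is a minimal sketch: it only flags that one release, because the thread does not establish which other releases carry the defect or which stable release first contains the fix. The helper names are hypothetical.

```python
import re
import subprocess

# The release this thread reports as affected. Other releases are NOT
# classified here, since the thread does not say which ones are safe.
REPORTED_AFFECTED = (18, 2, 0)


def parse_docker_version(text):
    """Extract (major, minor, patch) from output such as
    'Docker version 18.02.0-ce, build fc4de44'."""
    m = re.search(r"Docker version (\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(g) for g in m.groups())


def is_reported_affected(version):
    """True only for the exact release named in the thread."""
    return version == REPORTED_AFFECTED


def installed_docker_version():
    """Run `docker --version` and parse its output (requires Docker on PATH)."""
    out = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return parse_docker_version(out)
```

For example, `is_reported_affected(installed_docker_version())` returns `True` on a host running the 18.02.0-ce build quoted above.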
In order to use GPU without sudo in rocm-docker, the regular user inside the container must have the same uid/gid as a host system user permitted for GPU access. Here is an edited repository which fixes the problem: https://github.com/dmikushin/ROCm-docker You can also pull it from DockerHub: https://hub.docker.com/repository/docker/marcusmae/rocm-docker
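The uid/gid requirement follows from how Unix checks access to a device node: the kernel compares numeric IDs, not user or group names. A `video` group inside the container with a different gid than the host's does not grant access, while root bypasses the owner/group/other check, which is why sudo works. The sketch below approximates that check for read+write access. It is an illustration only: it ignores ACLs, capabilities other than root's override, and the container's device cgroup, and the function names and example IDs are hypothetical.

```python
import os
import stat


def can_open_rw(mode, owner_uid, owner_gid, uid, gids):
    """Approximate the classic owner/group/other permission check for
    read+write access to a file or device node."""
    if uid == 0:
        return True  # root overrides permission bits (why sudo works)
    if uid == owner_uid:
        need = stat.S_IRUSR | stat.S_IWUSR  # owner bits
    elif owner_gid in gids:
        need = stat.S_IRGRP | stat.S_IWGRP  # group bits: numeric gid match
    else:
        need = stat.S_IROTH | stat.S_IWOTH  # "other" bits
    return (mode & need) == need


def check_device(path):
    """Apply the check to a real node (e.g. /dev/kfd) for the current process."""
    st = os.stat(path)
    gids = set(os.getgroups()) | {os.getegid()}
    return can_open_rw(st.st_mode, st.st_uid, st.st_gid, os.geteuid(), gids)
```

As a hypothetical example, take a character device with mode 0660 owned by root and by host gid 44. A container user with uid 1000 whose `video` group has gid 39 gets `can_open_rw(0o020660, 0, 44, 1000, {39})` → `False`, while the same user with gid 44 gets `True`.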