NVIDIA/nvidia-container-runtime

Running nvidia-container-runtime with podman is blowing up.

rhatdan opened this issue · 90 comments

  1. Issue or feature description
    rootless and rootful podman do not work with the nvidia plugin

  2. Steps to reproduce the issue
    Install the nvidia plugin, configure it to run with podman
    execute the podman command and check whether the devices are configured correctly.

  3. Information to attach (optional if deemed irrelevant)

    Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
    Kernel version from uname -a
    Fedora 30 and later
    Any relevant kernel output lines from dmesg
    Driver information from nvidia-smi -a
    Docker version from docker version
    NVIDIA packages version from dpkg -l 'nvidia' or rpm -qa 'nvidia'
    NVIDIA container library version from nvidia-container-cli -V
    NVIDIA container library logs (see troubleshooting)
    Docker command, image and tag used

I am reporting this based on complaints from other users. This is what they said.

We discovered that the ubuntu 18.04 machine needed a configuration change to get rootless working with nvidia:
"no-cgroups = true" was set in /etc/nvidia-container-runtime/config.toml
Unfortunately this config change did not work on Centos 7, but it did change the rootless error to:
nvidia-container-cli: initialization error: cuda error: unknown error\\n\"""

This config change breaks podman running from root, with the error:
Failed to initialize NVML: Unknown Error

Interestingly, root on ubuntu gets the same error even though rootless works.

The Podman team would like to work with you to get this working well in both rootful and rootless containers, if possible. But we need someone to work with.

Hello!

@rhatdan do you mind filling out the following issue template: https://github.com/NVIDIA/nvidia-docker/blob/master/.github/ISSUE_TEMPLATE.md

Thanks!

I can work with the podman team.

@nvjmayo Thanks for the suggestions. Some good news and less good.

This works rootless:
podman run --rm --hooks-dir /usr/share/containers/oci/hooks.d nvcr.io/nvidia/cuda nvidia-smi
The same command continues to fail with the image: docker.io/nvidia/cuda

In fact rootless works with or without /usr/share/containers/oci/hooks.d/01-nvhook.json installed using the image: nvcr.io/nvidia/cuda

Running as root continues to fail when no-cgroups = true for either container, returning:
Failed to initialize NVML: Unknown Error

Strange, I would not expect podman to run a hook that did not have a JSON file describing it.

@eaepstein I'm still struggling to reproduce the issue you see. Using docker.io/nvidia/cuda also works for me with the hooks dir.

$ podman run --rm --hooks-dir /usr/share/containers/oci/hooks.d/ docker.io/nvidia/cuda nvidia-smi
Tue Oct 22 21:35:44 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:65:00.0 N/A |                  N/A |
| 50%   38C    P0    N/A /  N/A |      0MiB /  2001MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

Without the hook I would expect to see a failure roughly like:

Error: time="2019-10-22T14:35:14-07:00" level=error msg="container_linux.go:346: starting container process caused \"exec: \\\"nvidia-smi\\\": executable file not found in $PATH\""
container_linux.go:346: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": OCI runtime command not found error

This is because the libraries and tools get installed by the hook in order to match the host drivers (an unfortunate limitation of tightly coupled driver+library releases).

I think there is a configuration issue and not an issue of the container image (docker.io/nvidia/cuda vs nvcr.io/nvidia/cuda).

Reviewing my earlier posts, I recommend changing my 01-nvhook.json and removing NVIDIA_REQUIRE_CUDA=cuda>=10.1 from it. My assumption was that everyone has the latest CUDA installed, which was kind of a silly assumption on my part. The CUDA version doesn't have to be specified, and you can leave this environment variable out of your setup. It was an artifact of my earlier experiments.

@nvjmayo we started from scratch with a new machine (CentOS Linux release 7.7.1908) and both docker.io and nvcr.io images are working for us now too. And --hooks-dir must now be specified for both to work. Thanks for the help!

@rhatdan @nvjmayo Turns out that getting rootless podman working with nvidia on centos 7 is a bit more complicated, at least for us.

Here is our scenario on brand new centos 7.7 machine

  1. run nvidia-smi with rootless podman
    result: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused "process_linux.go:413: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\\n\""

  2. run podman with user=root
    result: nvidia-smi works

  3. run podman rootless
    result: nvidia-smi works!

  4. reboot machine, run podman rootless
    result: fails again with the same error as in step 1

Conclusion: running nvidia container with podman as root changes the environment for rootless to work. Environment cleared on reboot.

One other comment: podman as root and rootless podman cannot run with the same /etc/nvidia-container-runtime/config.toml - no-cgroups must be false for root and true for rootless

If the nvidia hook is doing any privileged operations like modifying /dev and adding device nodes, then this will not work with rootless. (In rootless mode all processes run with the user's UID. When you run rootful, the hook probably performs the privileged operations, so the next time you run rootless, those activities do not need to be done.)

For rootless systems, I would suggest that the /dev and nvidia setup be done via a systemd unit file, so the system is preconfigured and the rootless jobs then work fine.

After running nvidia/cuda with rootful podman, the following exist:
crw-rw-rw-. 1 root root 195, 254 Oct 25 09:11 nvidia-modeset
crw-rw-rw-. 1 root root 195, 255 Oct 25 09:11 nvidiactl
crw-rw-rw-. 1 root root 195, 0 Oct 25 09:11 nvidia0
crw-rw-rw-. 1 root root 241, 1 Oct 25 09:11 nvidia-uvm-tools
crw-rw-rw-. 1 root root 241, 0 Oct 25 09:11 nvidia-uvm

None of these devices exist after boot. Running nvidia-smi rootless (no podman) creates:
crw-rw-rw-. 1 root root 195, 0 Oct 25 13:40 nvidia0
crw-rw-rw-. 1 root root 195, 255 Oct 25 13:40 nvidiactl

I created the other three entries using "sudo mknod -m 666 etc..." but that is not enough to run rootless. Something else is needed in the environment.

Running nvidia/cuda with rootful podman at boot would work, but it's not pretty.
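
For reference, the mknod calls corresponding to the three missing nodes would look roughly like the sketch below (based on the listing above; major 195 is the NVIDIA character device, while the nvidia-uvm major is assigned dynamically and has to be read from /proc/devices). As noted, creating the nodes by hand was not sufficient here, so treat this only as part of the picture.

# Sketch: recreate the three nodes that are missing after boot.
sudo modprobe nvidia-uvm
UVM_MAJOR=$(awk '/nvidia-uvm/ {print $1; exit}' /proc/devices)
sudo mknod -m 666 /dev/nvidia-modeset   c 195 254
sudo mknod -m 666 /dev/nvidia-uvm       c "$UVM_MAJOR" 0
sudo mknod -m 666 /dev/nvidia-uvm-tools c "$UVM_MAJOR" 1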

Thanks for the suggestion

flx42 commented

This behavior is documented in our installation guide:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications

From a userns you can't mknod or use nvidia-modprobe. But if this binary is present and can be called in a context where setuid works, it's an option.

There is already nvidia-persistenced as a systemd unit file, but it won't load the nvidia_uvm kernel modules nor create the device files, IIRC.

Another option is to use udev rules, which is what Ubuntu is doing:

$ cat /lib/udev/rules.d/71-nvidia.rules 
[...]

# Load and unload nvidia-uvm module
ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe nvidia-uvm"
ACTION=="remove", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe -r nvidia-uvm"

# This will create the device nvidia device nodes
ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/usr/bin/nvidia-smi"

# Create the device node for the nvidia-uvm module
ACTION=="add", DEVPATH=="/module/nvidia_uvm", SUBSYSTEM=="module", RUN+="/sbin/create-uvm-dev-node"

Udev rules make sense to me.

@flx42
sudo'ing the setup script in "4.5. Device Node Verification" is the only thing needed to get rootless nvidia/cuda containers running for us. It created the following devices:
crw-rw-rw-.  1 root root    195,   0 Oct 27 20:38 nvidia0
crw-rw-rw-.  1 root root    195, 255 Oct 27 20:38 nvidiactl
crw-rw-rw-.  1 root root    241,   0 Oct 27 20:38 nvidia-uvm

The udev file only created the first two and was not sufficient by itself.
We'll go with a unit file for the setup script.
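
For anyone following the same route, a minimal sketch of such a unit, assuming the script from section 4.5 of the CUDA guide is saved as /usr/local/sbin/nvidia-device-nodes.sh (a hypothetical path):

# Install a oneshot unit that runs the device-node script at boot (sketch only).
sudo tee /etc/systemd/system/nvidia-device-nodes.service <<'EOF'
[Unit]
Description=Create NVIDIA device nodes for rootless containers
After=systemd-modules-load.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/sbin/nvidia-device-nodes.sh

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now nvidia-device-nodes.service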

Many thanks for your help.

qhaas commented

Thanks guys, with insight from this issue and others, I was able to get podman working with my Quadro in EL7 using sudo podman run --privileged --rm --hooks-dir /usr/share/containers/oci/hooks.d docker.io/nvidia/cudagl:10.1-runtime-centos7 nvidia-smi after installing the 'nvidia-container-toolkit' package.

Once the dust settles on how to get GPU support in rootless podman in EL7, a step-by-step guide would make for a great blog post and/or entry into the podman and/or nvidia documentation.

Hello @nvjmayo and @rhatdan. I'm wondering if there is an update on this issue or this one for how to access NVIDIA GPUs from containers run rootless with podman.

On RHEL8.1, with default /etc/nvidia-container-runtime/config.toml, and running containers with root, GPU access works as expected. Rootless does not work by default, it fails with cgroup related errors (as expected).

After modifying the config.toml file -- setting no-cgroups = true and changing the debug log file -- rootless works. However, these changes make GPU access fail in containers run as root, with error "Failed to initialize NVML: Unknown Error."

Please let me know if there is any recent documentation on how to do this beyond these two issues.

Steps to get it working on RHEL 8.1:

  1. Install Nvidia Drivers, make sure nvidia-smi works on the host
  2. Install nvidia-container-toolkit from repos at
baseurl=https://nvidia.github.io/libnvidia-container/centos7/$basearch
baseurl=https://nvidia.github.io/nvidia-container-runtime/centos7/$basearch
  3. Modify /etc/nvidia-container-runtime/config.toml and change these values:
[nvidia-container-cli]
#no-cgroups = false
no-cgroups = true
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
debug = "~/./local/nvidia-container-runtime.log"
  4. run it rootless as podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ nvidia/cuda:10.2-devel-ubi8 /usr/bin/nvidia-smi
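
For convenience, the same steps consolidated as a shell sketch (the .repo file name, layout and gpgcheck settings are assumptions; the baseurls are the ones quoted above):

# Add the repos and install the toolkit.
sudo tee /etc/yum.repos.d/nvidia-container.repo <<'EOF'
[libnvidia-container]
name=libnvidia-container
baseurl=https://nvidia.github.io/libnvidia-container/centos7/$basearch
gpgcheck=0
enabled=1

[nvidia-container-runtime]
name=nvidia-container-runtime
baseurl=https://nvidia.github.io/nvidia-container-runtime/centos7/$basearch
gpgcheck=0
enabled=1
EOF
sudo dnf install -y nvidia-container-toolkit

# Disable cgroup setup in the container CLI so the hook works rootless
# (optionally also point the debug log at a user-writable path, as in step 3).
sudo sed -E -i 's/^#?no-cgroups *=.*/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml

# Run rootless with the OCI hook directory.
podman run --rm --security-opt=label=disable \
  --hooks-dir=/usr/share/containers/oci/hooks.d/ \
  nvidia/cuda:10.2-devel-ubi8 /usr/bin/nvidia-smi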

Thanks @jamescassell.

I repeated those steps on RHEL8.1, and nvidia-smi works as expected when running rootless. However, once those changes are made, I am unable to run nvidia-smi in a container run as root. Is this behaviour expected, or is there some change in CLI flags needed when running as root? Running as root did work before making these changes.

Is there a way to configure a system so that we can utilize GPUs with podman as root and non-root user?

I can't run rootless podman with the GPU; can someone help me?

docker run --runtime=nvidia --privileged nvidia/cuda nvidia-smi works fine but
podman run --runtime=nvidia --privileged nvidia/cuda nvidia-smi crashes, same for
sudo podman run --runtime=nvidia --privileged nvidia/cuda nvidia-smi

Output:

$ podman run --runtime=nvidia --privileged nvidia/cuda nvidia-smi
2020/04/03 13:34:52 ERROR: /usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
Error: `/usr/bin/nvidia-container-runtime start e3ccb660bf27ce0858ee56476e58b53cd3dc900e8de80f08d10f3f844c0e9f9a` failed: exit status 1

But, runc exists:

$ whereis runc
runc: /usr/bin/runc
$ whereis docker-runc
docker-runc:
$ podman --version
podman version 1.8.2
$ cat ~/.config/containers/libpod.conf
# libpod.conf is the default configuration file for all tools using libpod to
# manage containers

# Default transport method for pulling and pushing for images
image_default_transport = "docker://"

# Paths to look for the conmon container manager binary.
# If the paths are empty or no valid path was found, then the `$PATH`
# environment variable will be used as the fallback.
conmon_path = [
            "/usr/libexec/podman/conmon",
            "/usr/local/libexec/podman/conmon",
            "/usr/local/lib/podman/conmon",
            "/usr/bin/conmon",
            "/usr/sbin/conmon",
            "/usr/local/bin/conmon",
            "/usr/local/sbin/conmon",
            "/run/current-system/sw/bin/conmon",
]

# Environment variables to pass into conmon
conmon_env_vars = [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
]

# CGroup Manager - valid values are "systemd" and "cgroupfs"
#cgroup_manager = "systemd"

# Container init binary
#init_path = "/usr/libexec/podman/catatonit"

# Directory for persistent libpod files (database, etc)
# By default, this will be configured relative to where containers/storage
# stores containers
# Uncomment to change location from this default
#static_dir = "/var/lib/containers/storage/libpod"

# Directory for temporary files. Must be tmpfs (wiped after reboot)
#tmp_dir = "/var/run/libpod"
tmp_dir = "/run/user/1000/libpod/tmp"

# Maximum size of log files (in bytes)
# -1 is unlimited
max_log_size = -1

# Whether to use chroot instead of pivot_root in the runtime
no_pivot_root = false

# Directory containing CNI plugin configuration files
cni_config_dir = "/etc/cni/net.d/"

# Directories where the CNI plugin binaries may be located
cni_plugin_dir = [
               "/usr/libexec/cni",
               "/usr/lib/cni",
               "/usr/local/lib/cni",
               "/opt/cni/bin"
]

# Default CNI network for libpod.
# If multiple CNI network configs are present, libpod will use the network with
# the name given here for containers unless explicitly overridden.
# The default here is set to the name we set in the
# 87-podman-bridge.conflist included in the repository.
# Not setting this, or setting it to the empty string, will use normal CNI
# precedence rules for selecting between multiple networks.
cni_default_network = "podman"

# Default libpod namespace
# If libpod is joined to a namespace, it will see only containers and pods
# that were created in the same namespace, and will create new containers and
# pods in that namespace.
# The default namespace is "", which corresponds to no namespace. When no
# namespace is set, all containers and pods are visible.
#namespace = ""

# Default infra (pause) image name for pod infra containers
infra_image = "k8s.gcr.io/pause:3.1"

# Default command to run the infra container
infra_command = "/pause"

# Determines whether libpod will reserve ports on the host when they are
# forwarded to containers. When enabled, when ports are forwarded to containers,
# they are held open by conmon as long as the container is running, ensuring that
# they cannot be reused by other programs on the host. However, this can cause
# significant memory usage if a container has many ports forwarded to it.
# Disabling this can save memory.
#enable_port_reservation = true

# Default libpod support for container labeling
# label=true

# The locking mechanism to use
lock_type = "shm"

# Number of locks available for containers and pods.
# If this is changed, a lock renumber must be performed (e.g. with the
# 'podman system renumber' command).
num_locks = 2048

# Directory for libpod named volumes.
# By default, this will be configured relative to where containers/storage
# stores containers.
# Uncomment to change location from this default.
#volume_path = "/var/lib/containers/storage/volumes"

# Selects which logging mechanism to use for Podman events.  Valid values
# are `journald` or `file`.
# events_logger = "journald"

# Specify the keys sequence used to detach a container.
# Format is a single character [a-Z] or a comma separated sequence of
# `ctrl-<value>`, where `<value>` is one of:
# `a-z`, `@`, `^`, `[`, `\`, `]`, `^` or `_`
#
# detach_keys = "ctrl-p,ctrl-q"

# Default OCI runtime
runtime = "runc"

# List of the OCI runtimes that support --format=json.  When json is supported
# libpod will use it for reporting nicer errors.
runtime_supports_json = ["crun", "runc"]

# List of all the OCI runtimes that support --cgroup-manager=disable to disable
# creation of CGroups for containers.
runtime_supports_nocgroups = ["crun"]

# Paths to look for a valid OCI runtime (runc, runv, etc)
# If the paths are empty or no valid path was found, then the `$PATH`
# environment variable will be used as the fallback.
[runtimes]
runc = [
            "/usr/bin/runc",
            "/usr/sbin/runc",
            "/usr/local/bin/runc",
            "/usr/local/sbin/runc",
            "/sbin/runc",
            "/bin/runc",
            "/usr/lib/cri-o-runc/sbin/runc",
            "/run/current-system/sw/bin/runc",
]

crun = [
                "/usr/bin/crun",
                "/usr/sbin/crun",
                "/usr/local/bin/crun",
                "/usr/local/sbin/crun",
                "/sbin/crun",
                "/bin/crun",
                "/run/current-system/sw/bin/crun",
]

nvidia = ["/usr/bin/nvidia-container-runtime"]

# Kata Containers is an OCI runtime, where containers are run inside lightweight
# Virtual Machines (VMs). Kata provides additional isolation towards the host,
# minimizing the host attack surface and mitigating the consequences of
# containers breakout.
# Please notes that Kata does not support rootless podman yet, but we can leave
# the paths below blank to let them be discovered by the $PATH environment
# variable.

# Kata Containers with the default configured VMM
kata-runtime = [
    "/usr/bin/kata-runtime",
]

# Kata Containers with the QEMU VMM
kata-qemu = [
    "/usr/bin/kata-qemu",
]

# Kata Containers with the Firecracker VMM
kata-fc = [
    "/usr/bin/kata-fc",
]

# The [runtimes] table MUST be the last thing in this file.
# (Unless another table is added)
# TOML does not provide a way to end a table other than a further table being
# defined, so every key hereafter will be part of [runtimes] and not the main
# config.
$ cat /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
debug = "/tmp/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
debug = "/tmp/nvidia-container-runtime.log
$ cat /tmp/nvidia-container-runtime.log
2020/04/03 13:23:02 Running /usr/bin/nvidia-container-runtime
2020/04/03 13:23:02 Using bundle file: /home/andrews/.local/share/containers/storage/vfs-containers/614cb26f8f4719e3aba56be2e1a6dc29cd91ae760d9fe3bf83d6d1b24becc638/userdata/config.json
2020/04/03 13:23:02 prestart hook path: /usr/bin/nvidia-container-runtime-hook
2020/04/03 13:23:02 Prestart hook added, executing runc
2020/04/03 13:23:02 Looking for "docker-runc" binary
2020/04/03 13:23:02 "docker-runc" binary not found
2020/04/03 13:23:02 Looking for "runc" binary
2020/04/03 13:23:02 Runc path: /usr/bin/runc
2020/04/03 13:23:09 Running /usr/bin/nvidia-container-runtime
2020/04/03 13:23:09 Command is not "create", executing runc doing nothing
2020/04/03 13:23:09 Looking for "docker-runc" binary
2020/04/03 13:23:09 "docker-runc" binary not found
2020/04/03 13:23:09 Looking for "runc" binary
2020/04/03 13:23:09 ERROR: find runc path: exec: "runc": executable file not found in $PATH
2020/04/03 13:31:06 Running nvidia-container-runtime
2020/04/03 13:31:06 Command is not "create", executing runc doing nothing
2020/04/03 13:31:06 Looking for "docker-runc" binary
2020/04/03 13:31:06 "docker-runc" binary not found
2020/04/03 13:31:06 Looking for "runc" binary
2020/04/03 13:31:06 Runc path: /usr/bin/runc
$ nvidia-container-runtime --version
runc version 1.0.0-rc8
commit: 425e105d5a03fabd737a126ad93d62a9eeede87f
spec: 1.0.1-dev
NVRM version:   440.64.00
CUDA version:   10.2

Device Index:   0
Device Minor:   0
Model:          GeForce RTX 2070
Brand:          GeForce
GPU UUID:       GPU-22dfd02e-a668-a6a6-a90a-39d6efe475ee
Bus Location:   00000000:01:00.0
Architecture:   7.5
$ docker version
Client:
 Version:           18.09.7
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        2d0083d
 Built:             Thu Jun 27 17:56:23 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b7f0
  Built:            Wed Mar 11 01:24:19 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

See particularly step 4. #85 (comment)

This looks like the nvidia plugin is searching for a hard coded path to runc?

[updated] Hi @jamescassell, unfortunately it does not work for me.
(same error using sudo)

$ podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ --runtime=nvidia nvidia/cuda nvidia-smi
2020/04/03 17:33:06 ERROR: /usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
2020/04/03 17:33:06 ERROR: /usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
Error: `/usr/bin/nvidia-container-runtime start 060398d97299ee033e8ebd698a11c128bd80ce641dd389976ca43a34b26abab3` failed: exit status 1

Hi @jamescassell, unfortunately it does not work for me.

$ podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ nvidia/cuda nvidia-smi
Error: container_linux.go:345: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": OCI runtime command not found error

Did you make the other changes described? I'd hit the same error until making the config changes.

Not sure if it's relevant but looks like you're missing a quote: debug = "/tmp/nvidia-container-runtime.log

@jamescassell
$ sudo nano /etc/nvidia-container-runtime/config.toml

I think this is a podman issue. Podman is not passing $PATH down to conmon when it executes it.
containers/podman#5712
I am not sure if conmon then passes the PATH environment down to the OCI runtime either.
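
A quick sanity check of that theory (a sketch): confirm that runc is found once a sane PATH is supplied explicitly, which would point at the environment handed to the runtime rather than at the binary itself.

# If this prints /usr/bin/runc, the lookup only fails because of the PATH that
# podman/conmon passes to the nvidia runtime shim, not because runc is missing.
env PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin sh -c 'command -v runc'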

@rhatdan yes , I will check this PR containers/podman#5712
Thanks

I had a major issue with this error message popping up when trying to change my container user ID while adding the hook that was meant to fix the rootless problem.

Error: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\\\\n\\\"\"": OCI runtime error
But I've since learned that this particular behavior is quite quirky: where I thought I had pinpointed it, it now seems to work if there is a call to the container using sudo (the container wouldn't work but the subsequent command did). Eagerly awaiting an update where the root (no pun intended) of this nvidia container problem gets addressed.

Hi @rhatdan , answering your previous question containers/podman#5712 (comment)
I was able to install the new version of podman, and it works fine with my GPU, however, I am getting this strange behavior at the end of the execution, please see:

andrews@deeplearning:~/Projects$ podman run -it --rm --runtime=nvidia --privileged nvidia/cuda:10.0-cudnn7-runtime nvidia-smi 
Mon May 18 21:30:17 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:01:00.0  On |                  N/A |
| 37%   30C    P8     9W / 175W |    166MiB /  7979MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
2020/05/18 23:30:18 ERROR: /usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
ERRO[0003] Error removing container 672a332467da4e91d8ac2fdc8f3c2973a808321341c2d80caa8d0ecad4f0db65: error removing container 672a332467da4e91d8ac2fdc8f3c2973a808321341c2d80caa8d0ecad4f0db65 from runtime: `/usr/bin/nvidia-container-runtime delete --force 672a332467da4e91d8ac2fdc8f3c2973a808321341c2d80caa8d0ecad4f0db65` failed: exit status 1 
andrews@deeplearning:~$ podman --version
podman version 1.9.2
andrews@deeplearning:~$ cat /tmp/nvidia-container-runtime.log
2020/05/18 23:47:47 Running /usr/bin/nvidia-container-runtime
2020/05/18 23:47:47 Using bundle file: /home/andrews/.local/share/containers/storage/vfs-containers/3add1cc2bcb9cecde045877d9a0e4d3ed9f64d304cd5cb07fd0e072bf163a170/userdata/config.json
2020/05/18 23:47:47 prestart hook path: /usr/bin/nvidia-container-runtime-hook
2020/05/18 23:47:47 Prestart hook added, executing runc
2020/05/18 23:47:47 Looking for "docker-runc" binary
2020/05/18 23:47:47 Runc path: /usr/bin/docker-runc
2020/05/18 23:47:48 Running /usr/bin/nvidia-container-runtime
2020/05/18 23:47:48 Command is not "create", executing runc doing nothing
2020/05/18 23:47:48 Looking for "docker-runc" binary
2020/05/18 23:47:48 Runc path: /usr/bin/docker-runc
2020/05/18 23:47:48 Running /usr/bin/nvidia-container-runtime
2020/05/18 23:47:48 Command is not "create", executing runc doing nothing
2020/05/18 23:47:48 Looking for "docker-runc" binary
2020/05/18 23:47:48 "docker-runc" binary not found
2020/05/18 23:47:48 Looking for "runc" binary
2020/05/18 23:47:48 ERROR: find runc path: exec: "runc": executable file not found in $PATH
andrews@deeplearning:~$ nvidia-container-runtime --version
runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev
andrews@deeplearning:~$ whereis runc
runc: /usr/bin/runc
andrews@deeplearning:~$ whereis docker-runc
docker-runc: /usr/bin/docker-runc

do you know what it can be?

The error you are getting looks like the $PATH is not being passed into your OCI runtime?

Yes, it's strange...

qhaas commented
  1. Modify /etc/nvidia-container-runtime/config.toml and change these values:
    ...
  2. run it rootless as podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ nvidia/cuda:10.2-devel-ubi8 /usr/bin/nvidia-smi

This did the trick for me, thanks. I'm pondering the user/process isolation ramifications of these changes on a multi-user system. Hopefully, RH/NVDA can get this as elegant as Docker's --gpus=all without significantly degrading the security benefits of rootless podman over docker...

If you leave SELinux enabled, what AVCs are you seeing?

Amazing work! I was able to run GPU-enabled containers on Fedora 32 using the centos8 repos, only modifying /etc/nvidia-container-runtime/config.toml to set no-cgroups = true. I was wondering what the implications are of not using the hooks-dir?

Thanks

[screenshot]

Update: Checking a tensorflow image, works flawlessly:

[screenshot]

Podman rootless with version 1.9.3

For anyone who is looking to have rootless "nvidia-docker" be more or less seamless with podman I would suggest the following changes:

$ cat ~/.config/containers/libpod.conf 
hooks_dir = ["/usr/share/containers/oci/hooks.d", "/etc/containers/oci/hooks.d"]
label = false
$ grep no-cgroups /etc/nvidia-container-runtime/config.toml 
no-cgroups = true

After the above changes on Fedora 32 I can run nvidia-smi using just:

$ podman run -it --rm nvidia/cuda:10.2-base nvidia-smi
Fri Jun 26 22:49:50 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN RTX           Off  | 00000000:08:00.0  On |                  N/A |
| 41%   35C    P8     5W / 280W |    599MiB / 24186MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

The only annoyance is needing to edit /etc/nvidia-container-runtime/config.toml whenever there is a package update for nvidia-container-toolkit, which fortunately doesn't happen too often. If there were some way to make changes to config.toml persistent across updates, or a user config file (without using some hack like chattr +i), then this process would be really smooth.

Maybe in the future a more targeted approach for disabling SELinux will come along that is more secure than just disabling labeling completely for lazy people like myself. I only run a few GPU-based containers here and there so I'm personally not too concerned.
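
One low-tech workaround for the config.toml annoyance (a sketch, not a packaged solution): keep the desired setting in a tiny script and re-run it after a toolkit update.

#!/bin/sh
# reapply-nvidia-rootless.sh (hypothetical name): re-assert the rootless-friendly
# setting after an nvidia-container-toolkit update overwrites config.toml.
conf=/etc/nvidia-container-runtime/config.toml
sudo sed -E -i 's/^#?no-cgroups *=.*/no-cgroups = true/' "$conf"
grep '^no-cgroups' "$conf"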

The instructions here worked for me on Fedora 32, however the problem reappears if I specify --userns keep-id:

Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error

Is that expected behaviour?

The instructions here worked for me on Fedora 32, however the problem reappears if I specify --userns keep-id:

Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error

Is that expected behaviour?

Make sure you have modified the file at: /etc/nvidia-container-runtime/config.toml

Every time the nvidia-container packages are updated, the default values are reset and you should change the values of:

#no-cgroups=false
no-cgroups = true

@Davidnet Even after the above modification, I am able to reproduce @invexed's error if I try to run the cuda-11 containers. Note the latest tag currently points to cuda 11.

$ podman run --rm --security-opt=label=disable nvidia/cuda:11.0-base-rc /usr/bin/nvidia-smi
Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error

But not when trying to run a cuda 10.2 container or lower

$ podman run --rm --security-opt=label=disable nvidia/cuda:10.2-base /usr/bin/nvidia-smi
Sun Jul 12 15:57:40 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   60C    P0    37W / 230W |    399MiB /  8116MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Make sure you have modified the file at: /etc/nvidia-container-runtime/config.toml

Thanks for the reply. I have indeed modified this file. The container runs with podman run --rm --security-opt label=disable -u 0:0 container, but podman run --rm --security-opt label=disable --userns keep-id -u $(id -u):$(id -g) container results in the above error.

EDIT: I have CUDA 10.2 installed:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   33C    P8    N/A /  N/A |     42MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1565      G   /usr/libexec/Xorg                             20MiB |
|    0      2013      G   /usr/libexec/Xorg                             20MiB |
+-----------------------------------------------------------------------------+

EDIT: I have CUDA 10.2 installed:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   33C    P8    N/A /  N/A |     42MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1565      G   /usr/libexec/Xorg                             20MiB |
|    0      2013      G   /usr/libexec/Xorg                             20MiB |
+-----------------------------------------------------------------------------+

You need a 450 driver to run CUDA 11.0 containers. The host CUDA version (or even none at all) doesn't matter, but the driver version does when running a CUDA container. nvidia-docker makes this error more obvious compared to podman. After updating your driver you should be able to run the container.

You need a 450 driver to run CUDA 11.0 containers. The host CUDA version (or even none at all) doesn't matter, but the driver version does when running a CUDA container. nvidia-docker makes this error more obvious compared to podman. After updating your driver you should be able to run the container.

Apologies for the confusion, but I'm actually trying to run a CUDA 10.0.130 container. Updating the driver may fix @mjlbach's problem though.

To be more precise, I'm installing CUDA via https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux within an image based on archlinux.

podman run --rm --security-opt label=disable -u $(id -u):$(id -g) --userns keep-id container

triggers Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error, but

podman run --rm --security-opt label=disable -u 0:0 container

does not. The problem seems to be related to the specification of --userns keep-id.

qhaas commented

You can add an SELinux policy, see here: https://github.com/mjlbach/podman_ml_containers/blob/master/selinux.sh

Interesting, per the link in that script to the DGX project, it looks like NVIDIA has already solved the SELinux woes on EL7 with nvidia-container. There are plenty of warnings in that project about how it has only been tested on DGX running EL7; it would be great if NVIDIA made this policy available for general use with EL7/EL8 and bundled it inside the nvidia-container-runtime package(s).

That should allow us to use rootless podman with GPU acceleration without --security-opt label=disable, but I don't know the security implications of said policy...

UPDATE: Requested that the DGX selinux update be made part of this package in Issue NVIDIA/nvidia-docker#121

Hi folks, I've hit the same wall as another person: NVIDIA/nvidia-container-toolkit#182. Any idea why that would happen?

@zeroepoch You can add an SELinux policy, see here: https://github.com/mjlbach/podman_ml_containers/blob/master/selinux.sh

I finally got around to trying this SELinux module and it worked. I need to add --security-opt label=type:nvidia_container_t still, but that should be more secure than disabling labels. What prompted this attempt to try again was that libpod.conf was deprecated and I was converting my settings to ~/.config/containers/containers.conf. I don't need anything in there anymore with this additional option. Now I just need to figure out how to make it default since I pretty much just run nvidia GPU containers.
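
For anyone trying the same thing, usage looks roughly like this (a sketch; the module file name nvidia-container.pp is an assumption based on the DGX policy repo, and the hooks-dir flag may be unnecessary depending on packaging):

# Install the SELinux policy module built from the linked repo, then run with
# the custom type instead of disabling labeling entirely.
sudo semodule -i nvidia-container.pp
podman run --rm --security-opt label=type:nvidia_container_t \
  --hooks-dir=/usr/share/containers/oci/hooks.d/ \
  nvidia/cuda:10.2-base nvidia-smi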

For anyone who wants to disable labels still to make the CLI simpler, here are the contents of containers.conf above:

[containers]
label = false

I don't know if this is the right place to ask, and I can open a separate issue if needed.

I'm testing rootless Podman v3.0 with crun v0.17 on our Summit test systems at Oak Ridge (IBM Power 9 with Nvidia Tesla V100 GPUs, RHEL 8.2). We have a restriction that we can't set up and maintain the subuid/subgid mappings for each of our users in the /etc/sub[uid|gid] files. That would be a giant administrative overhead since that mapping would have to be maintained on every node. Currently, pulling or building cuda containers works just fine, but trying to run one fails:

% podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ oci-archive:/ccs/home/subil/subil-containers-oci/simplecuda nvidia-smi
Getting image source signatures
Copying blob 5ef3c0b978d0 done
Copying blob d23be3dac067 done
Copying blob 786d8ed1601c done
Copying blob 6e99435589e0 done
Copying blob 93d25f6f9464 done
Copying blob d1ababb2c734 done
Copying config beba83a3b2 done
Writing manifest to image destination
Storing signatures
Error: OCI runtime error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1)

Here, simplecuda is just an oci-archive of docker.io/nvidia/cuda-ppc64le:10.2-base-centos7 (our HPC system uses IBM PowerPC).

The nvidia-container-toolkit.log looks like this

-- WARNING, the following logs are for debugging purposes only --

I0330 21:24:39.001988 1186667 nvc.c:282] initializing library context (version=1.3.0, build=16315ebdf4b9728e899f615e208b50c41d7a5d15)
I0330 21:24:39.002033 1186667 nvc.c:256] using root /
I0330 21:24:39.002038 1186667 nvc.c:257] using ldcache /etc/ld.so.cache
I0330 21:24:39.002043 1186667 nvc.c:258] using unprivileged user 65534:65534
I0330 21:24:39.002058 1186667 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0330 21:24:39.002241 1186667 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0330 21:24:39.002259 1186667 nvc.c:167] skipping kernel modules load due to user namespace
I0330 21:24:39.002400 1186672 driver.c:101] starting driver service
E0330 21:24:39.002442 1186672 driver.c:161] could not start driver service: privilege change failed: operation not permitted
I0330 21:24:39.003214 1186667 driver.c:196] driver service terminated successfully

I've tried a whole variety of different Podman flag combinations mentioned earlier in this issue thread. None have worked. They all have the same errors above in the output and the log file.

I have the hook json file properly set up

% cat /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json
{
    "version": "1.0.0",
    "hook": {
        "path": "/usr/bin/nvidia-container-toolkit",
        "args": ["nvidia-container-toolkit", "prestart"],
        "env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
        ]
    },
    "when": {
        "always": true,
        "commands": [".*"]
    },
    "stages": ["prestart"]
}

The nvidia-container-runtime config.toml looks like this

[76a@raptor07 ~]$ cat /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
debug = "/tmp/.nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig"

[nvidia-container-runtime]
debug = "/tmp/.nvidia-container-runtime.log"

My storage.conf looks like this

% cat ~/.config/containers/storage.conf
[storage]
driver = "overlay"
graphroot = "/tmp/subil-containers-peak"
rootless_storage_path = "$HOME/.local/share/containers/storage"
#rootless_storage_path = "/tmp/subil-containers-storage-peak"

[storage.options]
additionalimagestores = [
]

[storage.options.overlay]
ignore_chown_errors = "true"
mount_program = "/usr/bin/fuse-overlayfs"
mountopt = "nodev,metacopy=on"

[storage.options.thinpool]

For comparison, I also tested this on a PowerPC workstation (identical to the HPC nodes: IBM Power9 with Nvidia Tesla V100, RHEL 8.2) and it's the exact same errors there too. But once we set up the subuid/subgid mappings on the workstation and did echo "user.max_user_namespaces=28633" > /etc/sysctl.d/userns.conf, Podman was able to run the cuda container without issue.

[76a@raptor07 gpu]$ podman run  --rm docker.io/nvidia/cuda-ppc64le:10.2-base-centos7 nvidia-smi -L
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-4d2aad84-ad3d-430b-998c-6124d28d8e7c)

So I know the issue is that we need both the subuid/subgid mappings and the user.max_user_namespaces. I want to know if it is possible to get the nvidia-container-toolkit working with rootless Podman without needing the subuid/subgid mappings.

For reference, we had a related issue (containers/podman#8580) with MPI not working because of the lack of subuid/subgid mappings. @giuseppe was able to patch crun and Podman to make that work for Podman v3 and crun >=v0.17. I wanted to know if there was something that could be done here to make the nvidia-container-toolkit also work under the same conditions.

I'm happy to provide more details if you need.
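
For reference, the workstation configuration that did work corresponds to something like the sketch below (range values are only illustrative; the whole point of the question is that we cannot maintain these mappings on the cluster):

# Subordinate ID mappings plus the user namespace sysctl (sketch).
echo "$USER:100000:65536" | sudo tee -a /etc/subuid
echo "$USER:100000:65536" | sudo tee -a /etc/subgid
echo "user.max_user_namespaces=28633" | sudo tee /etc/sysctl.d/userns.conf
sudo sysctl --system
podman system migrate   # make podman pick up the new ID mappings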

I have posted this here but it seems this issue is more relevant and is still open, so I am copying it here.
I encountered exactly the same problem with podman 3.0.1 and nvidia-container-runtime 3.4.0-1

/usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH

After some attempts, I found out that --cap-add AUDIT_WRITE solves this problem.

[screenshot: 2021-04-18 12-06-57]

I have totally no idea why this would even work, though.
Here's my podman info, I'm happy to offer any further detailed info if asked.

host:
  arch: amd64
  buildahVersion: 1.19.4
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.0.27-1
    path: /usr/bin/conmon
    version: 'conmon version 2.0.27, commit: 65fad4bfcb250df0435ea668017e643e7f462155'
  cpus: 16
  distribution:
    distribution: manjaro
    version: unknown
  eventLogger: journald
  hostname: manjaro
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.9.16-1-MANJARO
  linkmode: dynamic
  memFree: 26319368192
  memTotal: 33602633728
  ociRuntime:
    name: /usr/bin/nvidia-container-runtime
    package: /usr/bin/nvidia-container-runtime is owned by nvidia-container-runtime-bin 3.4.0-1
    path: /usr/bin/nvidia-container-runtime
    version: |-
      runc version 1.0.0-rc93
      commit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
      spec: 1.0.2-dev
      go: go1.16.2
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    selinuxEnabled: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.1.9-1
    version: |-
      slirp4netns version 1.1.9
      commit: 4e37ea557562e0d7a64dc636eff156f64927335e
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 0
  swapTotal: 0
  uptime: 1h 50m 44.99s (Approximately 0.04 days)
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: hub-mirror.c.163.com
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: docker.io
  search:
  - docker.io
store:
  configFile: /home/wangyize/.config/containers/storage.conf
  containerStore:
    number: 30
    paused: 0
    running: 1
    stopped: 29
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: /usr/bin/fuse-overlayfs is owned by fuse-overlayfs 1.5.0-1
      Version: |-
        fusermount3 version: 3.10.2
        fuse-overlayfs: version 1.5
        FUSE library version 3.10.2
        using FUSE kernel interface version 7.31
  graphRoot: /home/wangyize/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 2
  runRoot: /run/user/1000/containers
  volumePath: /home/wangyize/.local/share/containers/storage/volumes
version:
  APIVersion: 3.0.0
  Built: 1613921386
  BuiltTime: Sun Feb 21 23:29:46 2021
  GitCommit: c640670e85c4aaaff92741691d6a854a90229d8d
  GoVersion: go1.16
  OsArch: linux/amd64
  Version: 3.0.1

Does anyone have any idea what would require the AUDIT_WRITE capability?

qhaas commented

AUDIT_WRITE is a capability I'd rather not add... Looks like runc has it by default?

In the OCI/runc spec they are even more drastic, only retaining audit_write, kill, and net_bind_service.

@zjuwyz @rhatdan

Looking at the error message, the nvidia-container-runtime (a simple shim for runc) is failing to find runc. This is implemented here: https://github.com/NVIDIA/nvidia-container-runtime/blob/v3.4.2/src/main.go#L96 and is due to the result of exec.LookPath failing. Internally, that is checking whether ${P}/runc exists, is not a directory, and is executable for each ${P} in the ${PATH}. This calls os.Stat and I would assume that this query would trigger an entry into the audit log.

Do you have any audit logs to confirm that this is what is causing this?

Note: at this point, no container has been created or started as the runc create command has just been intercepted and the OCI spec patched to insert the NVIDIA hook.

the error looks like an outdated runc that doesn't understand errnoRet: opencontainers/runc#2424

Without support for errnoRet, runc is not able to handle: https://github.com/containers/common/blob/master/pkg/seccomp/seccomp.json#L730-L833 and the only way to disable this chunk is to add CAP_AUDIT_WRITE.

I'd try with an updated runc first and see if it can handle the seccomp configuration generated by Podman when CAP_AUDIT_WRITE is not added
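
A quick way to test that theory before upgrading anything (a sketch; the image tag is just an example):

# Check the low-level runtime version the nvidia shim will exec; errnoRet
# support requires a reasonably recent runc (see opencontainers/runc#2424).
runc --version

# Workaround reported earlier in the thread: adding CAP_AUDIT_WRITE drops the
# seccomp rules that need errnoRet, so this succeeding while the plain
# invocation fails supports the outdated-runc explanation.
podman run --rm --cap-add AUDIT_WRITE --runtime=nvidia nvidia/cuda:10.2-base nvidia-smi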

Following on my previous comment: #85 (comment)

I tested out running different versions (v1.2.0, v1.3.0 and the latest v1.3.3) of nvidia-container-toolkit and libnvidia-container for rootless Podman without subuid/subgid on x86 machines as well, with identical settings and configs as I had in the PowerPC machines. The tests on x86 show the exact same issues for rootless Podman as they did on PowerPC.

@secondspass thanks for confirming that you're able to reproduce on an x86 machine. Do you have a simple way for us to reproduce this internally? This would allow us to better assess what the requirements are for getting this working.

We are in the process of reworking how the NVIDIA Container Stack works and this should address these kinds of issues, as we would make more use of the low-level runtime (crun in this case).

qhaas commented

Do you have a simple way for us to reproduce this internally?

While @secondspass's reported bug was with CentOS 8.3, I can report that it exists on x86-64 CentOS 8 Stream as well. Here is how to reproduce it on centos8-stream (which has an updated podman/crun stack):

  1. Verify nvidia cuda repos and nvidia-container-toolkit repos are enabled
  2. Deploy nvidia proprietary drivers: # dnf module install nvidia-driver:465-dkms, reboot and verify nvidia-smi works
  3. Deploy podman/crun stack: # dnf install crun podman skopeo buildah slirp4netns
  4. Enable use of containers without the need for subuid/subgid (per @secondspass ):
cat ~/.config/containers/storage.conf
[storage]
driver = "overlay"
graphroot = "/tmp/${USER}-containers-peak"
rootless_storage_path = "${HOME}/.local/share/containers/storage"

[storage.options]
additionalimagestores = [
]

[storage.options.overlay]
ignore_chown_errors = "true"
mount_program = "/usr/bin/fuse-overlayfs"
mountopt = "nodev,metacopy=on"

[storage.options.thinpool]
  5. Verify the current user's subuid/subgid is not set, since these get added automatically if one uses certain CLI tools to add users:
$ grep $USER /etc/subuid | wc -l
0
$ grep $USER /etc/subgid | wc -l
0
  6. Verify rootless containers work (without GPU acceleration):
$ podman run --rm docker.io/centos:8 cat /etc/redhat-release
CentOS Linux release 8.3.2011
  7. Deploy libnvidia-container-tools: # dnf install nvidia-container-toolkit
  8. Modify configuration to support podman / rootless (per @secondspass and others above):
cat /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
debug = "/tmp/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig"

[nvidia-container-runtime]
debug = "/tmp/nvidia-container-runtime.log"
  9. Test rootless podman with GPU acceleration and no subuid/subgid; it fails:
$ podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ docker.io/nvidia/cuda:10.2-base-centos8 nvidia-smi -L
...
Error: OCI runtime error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1)
$ cat /tmp/nvidia-container-toolkit.log 

-- WARNING, the following logs are for debugging purposes only --

I0421 13:52:26.487793 6728 nvc.c:372] initializing library context (version=1.3.3, build=bd9fc3f2b642345301cb2e23de07ec5386232317)
I0421 13:52:26.487987 6728 nvc.c:346] using root /
I0421 13:52:26.488002 6728 nvc.c:347] using ldcache /etc/ld.so.cache
I0421 13:52:26.488013 6728 nvc.c:348] using unprivileged user 65534:65534
I0421 13:52:26.488067 6728 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0421 13:52:26.488264 6728 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
W0421 13:52:26.488328 6728 nvc.c:249] skipping kernel modules load due to user namespace
I0421 13:52:26.488877 6733 driver.c:101] starting driver service
E0421 13:52:26.489031 6733 driver.c:161] could not start driver service: privilege change failed: operation not permitted
I0421 13:52:26.498449 6728 driver.c:196] driver service terminated successfully
  10. (sanity check) verify it works WITH sudo:
$ sudo podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ docker.io/nvidia/cuda:10.2-base-centos8 nvidia-smi -L
...
GPU 0: NVIDIA Tesla V100-PCIE-32GB (UUID: GPU-0a55d110-f8ea-4209-baa7-0e5675c7e832)

Version info for my run:

$ cat /etc/redhat-release 
CentOS Stream release 8
$ nvidia-smi | grep Version
NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3
$ nvidia-container-cli --version
version: 1.3.3
$ crun --version
crun version 0.18
$ podman --version
podman version 3.1.0-dev

Update: Spun this issue off into its own issue

I have posted this here but it seems this issue is more relevant and is still open, so I am copying it here.
I encountered exactly the same problem with podman 3.0.1 and nvidia-container-runtime 3.4.0-1

/usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH

After some attempts, I found out that --cap-add AUDIT_WRITE solves this problem.

[screenshot: 2021-04-18 12-06-57]

I have totally no idea why this would even work, though.
Here's my podman info, I'm happy to offer any further detailed info if asked.

host:
  arch: amd64
  buildahVersion: 1.19.4
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: /usr/bin/conmon ็”ฑ conmon 1:2.0.27-1 ๆ‰€ๆ‹ฅๆœ‰
    path: /usr/bin/conmon
    version: 'conmon version 2.0.27, commit: 65fad4bfcb250df0435ea668017e643e7f462155'
  cpus: 16
  distribution:
    distribution: manjaro
    version: unknown
  eventLogger: journald
  hostname: manjaro
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.9.16-1-MANJARO
  linkmode: dynamic
  memFree: 26319368192
  memTotal: 33602633728
  ociRuntime:
    name: /usr/bin/nvidia-container-runtime
    package: /usr/bin/nvidia-container-runtime ็”ฑ nvidia-container-runtime-bin 3.4.0-1 ๆ‰€ๆ‹ฅๆœ‰
    path: /usr/bin/nvidia-container-runtime
    version: |-
      runc version 1.0.0-rc93
      commit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
      spec: 1.0.2-dev
      go: go1.16.2
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    selinuxEnabled: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.1.9-1
    version: |-
      slirp4netns version 1.1.9
      commit: 4e37ea557562e0d7a64dc636eff156f64927335e
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 0
  swapTotal: 0
  uptime: 1h 50m 44.99s (Approximately 0.04 days)
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: hub-mirror.c.163.com
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: docker.io
  search:
  - docker.io
store:
  configFile: /home/wangyize/.config/containers/storage.conf
  containerStore:
    number: 30
    paused: 0
    running: 1
    stopped: 29
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: /usr/bin/fuse-overlayfs is owned by fuse-overlayfs 1.5.0-1
      Version: |-
        fusermount3 version: 3.10.2
        fuse-overlayfs: version 1.5
        FUSE library version 3.10.2
        using FUSE kernel interface version 7.31
  graphRoot: /home/wangyize/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 2
  runRoot: /run/user/1000/containers
  volumePath: /home/wangyize/.local/share/containers/storage/volumes
version:
  APIVersion: 3.0.0
  Built: 1613921386
  BuiltTime: Sun Feb 21 23:29:46 2021
  GitCommit: c640670e85c4aaaff92741691d6a854a90229d8d
  GoVersion: go1.16
  OsArch: linux/amd64
  Version: 3.0.1

The fact that this works, and solved the problem for me as well, tells me this is a race condition.

I am a bit confused about the current state of rootless podman with GPUs.

I am on an Ubuntu 18.04 arm64 host.

I have made the following changes to /etc/nvidia-container-runtime/config.toml:

disable-require = false

[nvidia-container-cli]
environment = []
debug = "/tmp/nvidia-container-toolkit.log"
load-kmods = true
no-cgroups = true
ldconfig = "@/sbin/ldconfig.real"

[nvidia-container-runtime]
debug = "/tmp/nvidia-container-runtime.log"
  1. Is the change above only required on machines that are using cgroups v2?

I am only able to get GPU access if I run podman with sudo and --privileged (I need both; update: see comment below). So far I have found no other way to run podman with GPU access. Even with the above cgroups change, my root setup does not break.

  2. What does it mean if my root is not breaking with the cgroup change?

When I run rootless, I see the following error:

Error: OCI runtime error: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods --debug=/dev/stderr configure --ldconfig=@/sbin/ldconfig.real --device=all --utility --pid=20052 /data/gpu/rootfs]\\\\n\\\\n-- WARNING, the following logs are for debugging purposes only --\\\\n\\\\nI0427 00:54:00.184026 20064 nvc.c:281] initializing library context (version=0.9.0+beta1, build=77c1cbc2f6595c59beda3699ebb9d49a0a8af426)\\\\nI0427 00:54:00.184272 20064 nvc.c:255] using root /\\\\nI0427 00:54:00.184301 20064 nvc.c:256] using ldcache /etc/ld.so.cache\\\\nI0427 00:54:00.184324 20064 nvc.c:257] using unprivileged user 65534:65534\\\\nI0427 00:54:00.184850 20069 driver.c:134] starting driver service\\\\nI0427 00:54:00.192642 20064 driver.c:231] driver service terminated with signal 15\\\\nnvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected\\\\n\\\"\""

I have tried with --security-opt=label=disable and have seen no changes in behavior.

  3. It is unclear to me what runtime people are using. Are they using standard runc or /usr/bin/nvidia-container-runtime? I have tried both; neither works rootless, and both work as root with --privileged.

does it make any difference if you bind mount /dev from the host?

qhaas commented

security-opt=label=disable

I'm not very fluent in Ubuntu IT, but I believe that command targets SELinux. Ubuntu uses Apparmor for mandatory access control (MAC). So wouldn't the equivalent command be --security-opt 'apparmor=unconfined'?

First, a slight update and correction to the above: I don't actually need --privileged. I just need to set -e NVIDIA_VISIBLE_DEVICES=all, which invokes the nvidia hook and works with sudo.
Rootless is still not fixed.
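In other words, something roughly like this works for me under sudo (image name is illustrative):

$ sudo podman run --rm -e NVIDIA_VISIBLE_DEVICES=all docker.io/nvidia/cuda nvidia-smi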

does it make any difference if you bind mount /dev from the host?

@giuseppe It does not fix rootless, but in rootful podman using sudo this makes the hook no longer required, which makes sense.

The issue with rootless is that I can't mount all of /dev/

Error: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"open /dev/console: permission denied\"": OCI permission denied

So I did the next best thing and attempted to mount all of the nv* devices under /dev/.
I tried two ways: one with -v, and the other using the --device flag to add the nvidia devices.
That still does not allow the rootless container to detect the GPUs!
It does work in rootful podman with sudo.
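The two forms I tried looked roughly like this (one nv* device shown; myimage is a placeholder, and in practice each nv* device would get its own flag):

$ podman run --rm -v /dev/nvhost-as-gpu:/dev/nvhost-as-gpu myimage ls -l /dev
$ podman run --rm --device /dev/nvhost-as-gpu myimage ls -l /dev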

The difference is that when they are mapped with sudo, I actually see the devices belong to root:video,
whereas in rootless mode I only see nobody:nogroup.

I am wondering if it is related to the video group? The error I get in rootless mode is the following when trying to run CUDA code:

Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading == false: CUDA: no CUDA-capable device is detected

When I look in the container under /dev in the rootless container:

$ ls -la /dev
...
crw-rw----  1 nobody nogroup 505,  1 Apr 26 18:32 nvhost-as-gpu
...

For ALL of the nv* devices in the rootless container, they don't have a user/group mapped.

In the rootful container that uses sudo:

...
crw-rw----  1 root video   505,   1 Apr 26 18:32 nvhost-as-gpu
...

For ALL of the nv* devices in the rootful container that uses sudo, they have root:video

So I am pretty certain I need the video group mapped into the container, but I am unclear on how to do this.
I have added the video group with --group-add as a test, but I believe I also need to use --gidmap, because even with --group-add it still shows as nogroup.

My understanding of the user/group mapping podman does is a little fuzzy, so I will take suggestions on how to do this 😄

Let me know what you think @giuseppe

security-opt=label=disable

I'm not very fluent in Ubuntu IT, but I believe that command targets SELinux. Ubuntu uses Apparmor for mandatory access control (MAC). So wouldn't the equivalent command be --security-opt 'apparmor=unconfined'?

@qhaas Excellent point. That explains why that flag seems to be a no-op for me.
Also, my current system does not have AppArmor loaded right now, so I shouldn't need either of those flags.
I tried it anyway just for sanity, and confirmed no difference in behavior.
Thank you!

If you have any suggestions on gid mappings please let me know!

First, a slight update and correction to the above: I don't actually need --privileged. I just need to set -e NVIDIA_VISIBLE_DEVICES=all, which invokes the nvidia hook and works with sudo.
Rootless is still not fixed.

does it make any difference if you bind mount /dev from the host?

@giuseppe It does not fix rootless, but in rootful podman using sudo this makes the hook no longer required, which makes sense.

The issue with rootless is that I can't mount all of /dev/

could you use -v /dev:/dev --mount type=devpts,destination=/dev/pts ?

@giuseppe I tried adding: -v /dev:/dev --mount type=devpts,destination=/dev/pts

And got the following error:

DEBU[0004] ExitCode msg: "container create failed (no logs from conmon): eof"
Error: container create failed (no logs from conmon): EOF

Not sure how to enable more logs in conmon

If I switch to use: --runtime /usr/local/bin/crun with -v /dev:/dev --mount type=devpts,destination=/dev/pts

I get the following error:

Error: OCI runtime error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1)

From previous encounters with this error, the way I understand this message is that the video group that is part of the mount location is not being mapped into the container correctly.

Just an FYI, I was able to get rootless podman to access the GPU if I added my user to the video group and used the runtime crun. More details here: containers/podman#10166

I am still interested in a path forward without adding my user to the video group, but this is a good progress step.

Just as an update to what has been posted in containers/podman#10166:

I have been able to access my GPU as a rootless user that belongs to the video group, using the nvidia hook:

cat /data/01-nvhook.json
{
  "version": "1.0.0",
  "hook": {
    "path": "/usr/bin/nvidia-container-toolkit",
    "args": ["nvidia-container-toolkit", "prestart"],
    "env": ["NVIDIA_REQUIRE_CUDA=cuda>=10.1"]
  },
  "when": {
    "always": true
  },
  "stages": ["prestart"]
}

But this one seems to work as well:

cat /data/01-nvhook-runtime-hook.json
{
  "version": "1.0.0",
  "hook": {
    "path": "/usr/bin/nvidia-container-runtime-hook",
    "args": ["/usr/bin/nvidia-container-runtime-hook", "prestart"],
    "env": []
  },
  "when": {
    "always": true
  },
  "stages": ["prestart"]
}

Separately, without hooks I was able to use the --device mounts and access my GPU as well.

The important steps that had to be taken here were:

  1. Rootless user needs to belong to the video group.
  2. Use the Podman flag --group-add keep-groups (this correctly maps the video group into the container).
  3. Use crun and not runc, because crun is the only runtime that supports --group-add keep-groups (see the sketch below).
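Putting those steps together, a minimal sketch (image name is a placeholder, and I'm assuming the hook JSON above is installed under the standard hooks directory):

$ sudo usermod -aG video $USER   # re-login afterwards so the new group membership applies
$ podman run --rm --runtime /usr/bin/crun --group-add keep-groups --hooks-dir /usr/share/containers/oci/hooks.d/ myimage nvidia-smi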

I have a related issue here: containers/podman#10212, about getting this working in C++ with execv, where I am seeing an odd issue.

Hi,
I've been using containers with access to GPUs; however, I've noticed that after each reboot I always need to run the following before starting the first container:
nvidia-smi

Otherwise I get the error:
Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error

After that I also need to run the NVIDIA device node verification script to properly set up /dev/nvidia-uvm for CUDA applications, as described in this post:
tensorflow/tensorflow#32623 (comment)
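For context, that script essentially just loads the nvidia-uvm module and creates its device node by hand; a rough sketch of the pattern (adapted from the approach in NVIDIA's CUDA installation docs, run as root and adjust for your system):

#!/bin/bash
# Load nvidia-uvm and create /dev/nvidia-uvm if the driver did not create it
/sbin/modprobe nvidia-uvm
if [ "$?" -eq 0 ]; then
  # Look up the dynamically assigned major number for nvidia-uvm
  D=$(grep nvidia-uvm /proc/devices | awk '{print $1}')
  mknod -m 666 /dev/nvidia-uvm c "$D" 0
fi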

Just to share my HW configuration, which works (only with the --privileged flag) both as root and rootless:

NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
CentOS Linux release 8.3.2011
CentOS Linux release 8.3.2011

getenforce:
Enforcing

podman info:

  arch: amd64
  buildahVersion: 1.20.1
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.27-1.el8.1.5.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.27, commit: '
  cpus: 80
  distribution:
    distribution: '"centos"'
    version: "8"
  eventLogger: journald
  hostname: turing
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 2002
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 2002
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 4.18.0-240.22.1.el8_3.x86_64
  linkmode: dynamic
  memFree: 781801324544
  memTotal: 809933586432
  ociRuntime:
    name: crun
    package: crun-0.19.1-2.el8.3.1.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.19.1
      commit: 1535fedf0b83fb898d449f9680000f729ba719f5
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/2002/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    selinuxEnabled: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.8-4.el8.7.6.x86_64
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.4.3
  swapFree: 42949668864
  swapTotal: 42949668864
  uptime: 29h 16m 48.14s (Approximately 1.21 days)
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 29
    paused: 0
    running: 0
    stopped: 29
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.5.0-1.el8.5.3.x86_64
      Version: |-
        fusermount3 version: 3.2.1
        fuse-overlayfs: version 1.5
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  graphRoot: /home/user/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 28
  runRoot: /run/user/2002/containers
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 3.1.2
  Built: 1619185402
  BuiltTime: Fri Apr 23 14:43:22 2021
  GitCommit: ""
  GoVersion: go1.14.12
  OsArch: linux/amd64
  Version: 3.1.2
nvidia-smi | grep Version
NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3

cat /etc/nvidia-container-runtime/config.toml

disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig"

Hi @Ru13en, the issue you described above seems to be different from what is being discussed here. Would you mind moving this to a separate GitHub issue? (I would assume this is because the nvidia container toolkit cannot load the kernel modules if it does not have the required permissions. Running nvidia-smi loads the kernel modules and also ensures that the device nodes are created.)

@elezar Thanks, I've opened another issue:
#142

For anybody who has the same issue as me ("nvidia-smi": executable file not found in $PATH: OCI not found, or no NVIDIA GPU device is present: /dev/nvidia0 does not exist), this is how I made it work on Kubuntu 21.04 rootless:

Add your user to group video if not present:
usermod -a -G video $USER

/usr/share/containers/oci/hooks.d/oci-nvidia-hook.json:

{
  "version": "1.0.0",
  "hook": {
    "path": "/usr/bin/nvidia-container-runtime-hook",
    "args": ["/usr/bin/nvidia-container-runtime-hook", "prestart"],
    "env": []
  },
  "when": {
    "always": true
  },
  "stages": ["prestart"]
}

/etc/nvidia-container-runtime/config.toml:

disable-require = false

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-runtime-hook.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"

podman run -it --group-add video docker.io/tensorflow/tensorflow:latest-gpu-jupyter nvidia-smi

Sun Jul 18 11:45:06 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.31       Driver Version: 465.31       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:09:00.0  On |                  N/A |
| 31%   43C    P8     6W / 215W |   2582MiB /  7979MiB |      9%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

@rhatdan @nvjmayo Turns out that getting rootless podman working with nvidia on centos 7 is a bit more complicated, at least for us.

Here is our scenario on a brand new CentOS 7.7 machine:

  1. run nvidia-smi with rootless podman
     result: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused "process_linux.go:413: running prestart hook 0 caused "error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\n""
  2. run podman with user=root
     result: nvidia-smi works
  3. run podman rootless
     result: nvidia-smi works!
  4. reboot machine, run podman rootless
     result: fails again with the same error as step 1

Conclusion: running an nvidia container with podman as root changes the environment so that rootless works. The environment is cleared on reboot.

One other comment: root podman and rootless podman cannot run with the same /etc/nvidia-container-runtime/config.toml; no-cgroups must be false for root and true for rootless.

Hi, have you figured out a solution?
I have exactly the same symptom as yours.

Rootless only works after launching a container as root at least once, and a reboot resets everything.
I am using RHEL 8.4 and can't believe this still happens after one year...

qhaas commented

For those dropping into this issue, nvidia has documented getting GPU acceleration working with podman.

That's awesome! The documentation is almost the same as my fix here in this thread :D

Any chance they can update the version of podman in the example? That one is pretty old.

@fuomag9 Are you using crun as opposed to runc out of curiosity?
Does it work with both in rootless for you? Or just crun?

@fuomag9 Are you using crun as opposed to runc out of curiosity? Does it work with both in rootless for you? Or just crun?

Working for me with both runc and crun set via /etc/containers/containers.conf with runtime = "XXX"
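For anyone looking for that setting, it lives under the [engine] table; checking it looks something like this (rootless users typically have a per-user copy in ~/.config/containers/containers.conf, and the output below is just what it would look like with crun selected):

$ grep -E '^\[engine\]|^runtime' /etc/containers/containers.conf
[engine]
runtime = "crun"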

qhaas commented

--hooks-dir /usr/share/containers/oci/hooks.d/ does not seem to be needed anymore, at least with podman 3.3.1 and nvidia-container-toolkit 1.7.0.

For RHEL8 systems where SELinux is enforcing, is it 'best practice' to add the nvidia SELinux policy module and run podman with --security-opt label=type:nvidia_container_t (per RH documentation, even on non-DGX systems), or just run podman with --security-opt=label=disable (per nvidia documentation)? Unclear if there is any significant benefit to warrant messing with SELinux policy.

For folks finding this issue, especially anyone trying to do this on RHEL8 after following https://www.redhat.com/en/blog/how-use-gpus-containers-bare-metal-rhel-8, here's the current status/known issues that I've encountered. Hopefully this saves someone some time.

As noted in the comments above you can run containers as root without issue, but if you try to use --userns keep-id you're going to have a bad day.

Things that need to be done ahead of time to run rootless containers are documented in https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#step-3-rootless-containers-setup but the cheat sheet version is:

  1. Install nvidia-container-toolkit
  2. Update /etc/nvidia-container-runtime/config.toml and set no-cgroups = true
  3. Use NVIDIA_VISIBLE_DEVICES as part of your podman environment.
  4. Specify --hooks-dir=/usr/share/containers/oci/hooks.d/ (may not strictly be needed).

If you do that, then running podman run -e NVIDIA_VISIBLE_DEVICES=all --hooks-dir=/usr/share/containers/oci/hooks.d/ --rm -ti myimage nvidia-smi should result in the usual nvidia-smi output. But you'll note that the user in the container is root, and that may not be what you want. If you use --userns keep-id, e.g. podman run --userns keep-id -e NVIDIA_VISIBLE_DEVICES=all --hooks-dir=/usr/share/containers/oci/hooks.d/ --rm -ti myimage nvidia-smi, you will get an error that states: Error: OCI runtime error: crun: error executing hook /usr/bin/nvidia-container-toolkit (exit code: 1). From my reading above, the checks that the hook runs require the user in the container to be root.

Now for the workaround. You don't need this hook; you just need the nvidia-container-cli tool. All the hook really does is mount the correct libraries, devices, and binaries from the underlying system into the container. We can use nvidia-container-cli -k list and find to accomplish the same thing. Here's my one-liner below. Note that I'm excluding both -e NVIDIA_VISIBLE_DEVICES=all and --hooks-dir=/usr/share/containers/oci/hooks.d/.

Here's what it looks like:
podman run --userns keep-id $(for file in $(nvidia-container-cli -k list); do find -L $(dirname $file) -xdev -samefile $file; done | awk '{print " -v "$1":"$1}' | xargs) --rm -ti myimage nvidia-smi

This is what the above is doing. We run nvidia-container-cli -k list which on my system produces output like:

$ nvidia-container-cli -k list
/dev/nvidiactl
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvidia-modeset
/dev/nvidia0
/dev/nvidia1
/usr/bin/nvidia-smi
/usr/bin/nvidia-debugdump
/usr/bin/nvidia-persistenced
/usr/bin/nvidia-cuda-mps-control
/usr/bin/nvidia-cuda-mps-server
/usr/lib64/libnvidia-ml.so.470.141.03
/usr/lib64/libnvidia-cfg.so.470.141.03
/usr/lib64/libcuda.so.470.141.03
/usr/lib64/libnvidia-opencl.so.470.141.03
/usr/lib64/libnvidia-ptxjitcompiler.so.470.141.03
/usr/lib64/libnvidia-allocator.so.470.141.03
/usr/lib64/libnvidia-compiler.so.470.141.03
/usr/lib64/libnvidia-ngx.so.470.141.03
/usr/lib64/libnvidia-encode.so.470.141.03
/usr/lib64/libnvidia-opticalflow.so.470.141.03
/usr/lib64/libnvcuvid.so.470.141.03
/usr/lib64/libnvidia-eglcore.so.470.141.03
/usr/lib64/libnvidia-glcore.so.470.141.03
/usr/lib64/libnvidia-tls.so.470.141.03
/usr/lib64/libnvidia-glsi.so.470.141.03
/usr/lib64/libnvidia-fbc.so.470.141.03
/usr/lib64/libnvidia-ifr.so.470.141.03
/usr/lib64/libnvidia-rtcore.so.470.141.03
/usr/lib64/libnvoptix.so.470.141.03
/usr/lib64/libGLX_nvidia.so.470.141.03
/usr/lib64/libEGL_nvidia.so.470.141.03
/usr/lib64/libGLESv2_nvidia.so.470.141.03
/usr/lib64/libGLESv1_CM_nvidia.so.470.141.03
/usr/lib64/libnvidia-glvkspirv.so.470.141.03
/usr/lib64/libnvidia-cbl.so.470.141.03
/lib/firmware/nvidia/470.141.03/gsp.bin

We then loop through each of those files and run find -L $(dirname $file) -xdev -samefile $file. That finds all the symlinks to a given file, e.g.

find -L /usr/lib64 -xdev -samefile /usr/lib64/libnvidia-ml.so.470.141.03
/usr/lib64/libnvidia-ml.so.1
/usr/lib64/libnvidia-ml.so.470.141.03
/usr/lib64/libnvidia-ml.so

We loop through each of those files and use awk and xargs to create the podman cli arguments to bind mount these files into the container; e.g. -v /usr/lib64/libnvidia-ml.so.1:/usr/lib64/libnvidia-ml.so.1 -v /usr/lib64/libnvidia-ml.so.470.141.03:/usr/lib64/libnvidia-ml.so.470.141.03 -v /usr/lib64/libnvidia-ml.so:/usr/lib64/libnvidia-ml.so etc.

This effectively does what the hook does, using the tools the hook provides, but does not require the user running the container to be root, and does not require the user inside of the container to be root.

Hopefully this saves someone else a few hours.

baude commented

@decandia50 Excellent information! Your information really deserves to be highlighted. Would you consider posting it as a blog if we connect you with some people?

Please do not write a blog post with the above information. While the procedure may work on some setups, it is not a supported use of the nvidia-container-cli tool and will only work correctly under a very narrow set of assumptions.

The better solution is to use podman's integrated CDI support to have podman do the work that libnvidia-container would otherwise have done. The future of the nvidia stack (and of device support in container runtimes in general) is CDI, and starting to use this method now will future-proof how you access generic devices.

Please see below for details on CDI:
https://github.com/container-orchestrated-devices/container-device-interface

We have spent the last year rearchitecting the NVIDIA container stack to work together with CDI, and as part of this have a tool coming out with the next release that will be able to generate CDI specs for nvidia devices for use with podman (and any other CDI compatible runtimes).

In the meantime, you can generate a CDI spec manually, or wait for @elezar to comment on a better method to get a CDI spec generated today.

Here is an example of a (fully functional) CDI spec on my DGX-A100 machine (excluding MIG devices):

cdiVersion: 0.4.0
kind: nvidia.com/gpu
containerEdits:
  hooks:
  - hookName: createContainer
    path: /usr/bin/nvidia-ctk
    args:
    - /usr/bin/nvidia-ctk
    - hook
    - update-ldcache
    - --folder
    - /usr/lib/x86_64-linux-gnu
  deviceNodes:
  - path: /dev/nvidia-modeset
  - path: /dev/nvidiactl
  - path: /dev/nvidia-uvm
  - path: /dev/nvidia-uvm-tools
  mounts:
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libcuda.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libcuda.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvcuvid.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvcuvid.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvoptix.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvoptix.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libGL.so.1.0.0
    hostPath: /usr/lib/x86_64-linux-gnu/libGL.so.1.0.0
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libEGL.so.1.0.0
    hostPath: /usr/lib/x86_64-linux-gnu/libEGL.so.1.0.0
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libGLESv1_CM.so.1.0.0
    hostPath: /usr/lib/x86_64-linux-gnu/libGLESv1_CM.so.1.0.0
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libGLESv2.so.2.0.0
    hostPath: /usr/lib/x86_64-linux-gnu/libGLESv2.so.2.0.0
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.460.91.03
    hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.460.91.03
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-smi
    hostPath: /usr/bin/nvidia-smi
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-debugdump
    hostPath: /usr/bin/nvidia-debugdump
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-persistenced
    hostPath: /usr/bin/nvidia-persistenced
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-cuda-mps-control
    hostPath: /usr/bin/nvidia-cuda-mps-control
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-cuda-mps-server
    hostPath: /usr/bin/nvidia-cuda-mps-server
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /var/run/nvidia-persistenced/socket
    hostPath: /var/run/nvidia-persistenced/socket
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /var/run/nvidia-fabricmanager/socket
    hostPath: /var/run/nvidia-fabricmanager/socket
    options:
    - ro
    - nosuid
    - nodev
    - bind
devices:
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
  name: gpu0
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia1
  name: gpu1
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia2
  name: gpu2
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia3
  name: gpu3
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia4
  name: gpu4
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia5
  name: gpu5
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia6
  name: gpu6
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia7
  name: gpu7

@elezar Can you comment on the availability of a tool to generate the CDI spec as proposed by @klueska? I'm happy to use CDI if that's the way forward. Also happy to beta test a tool if you point me towards something.

Maintaining an nvidia.json CDI spec file for multiple machines with different NVIDIA drivers and other libs is a bit painful.
For instance, the NVIDIA driver installer should create the libnvidia-compiler.so symlink to libnvidia-compiler.so.460.91.03, etc...
The CDI nvidia.json would then just reference the symlinks, avoiding the manual setting of all mappings for a particular driver version...
I am already using CDI specs on our machines, but I would like to test a tool that generates the CDI spec for any system...

@Ru13en we have a WIP Merge Request that adds an:

nvidia-ctk info generate-cdi

command to the NVIDIA Container Toolkit. The idea being that this could be run at boot or triggered on a driver installation / upgrade. We are working on getting a v1.12.0-rc.1 out that includes this functionality for early testing and feedback.
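As a sketch of the intended workflow (the exact subcommand and flags may still change before the release): write the generated spec into one of the CDI spec directories, then request devices by the names the spec defines, e.g. the gpu0 name from the example spec above:

$ sudo nvidia-ctk info generate-cdi > /etc/cdi/nvidia.yaml
$ podman run --rm --device nvidia.com/gpu=gpu0 docker.io/nvidia/cuda nvidia-smi -L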

@elezar Any ETA on when we can expect the v1.12.0-rc.1 release?

It will be released next week.

I tried using the WIP version of nvidia-ctk (from the master branch of https://gitlab.com/nvidia/container-toolkit/container-toolkit) and was able to get it working with rootless podman, but not without issues. I have documented them in https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/issues/8.
@rhatdan The upcoming version of the nvidia CDI generator will use CDI version 0.5.0, while the latest podman version, 4.2.0, still uses 0.4.0. Any idea when 4.3.0 might be available? (I see that 4.3.0-rc1 uses 0.5.0.)

Thanks for the confirmation @starry91. The official release of v1.12.0-rc.1 has been delayed a little, but thanks for testing the tooling nonetheless. I will have a look at the issue you created and update the tooling before releasing the rc.

elezar commented

We have recently updated our Podman support and now recommend using CDI -- which is supported natively in more recent Podman versions.

See https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-podman for details.
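For completeness, the flow described in that guide boils down to generating a CDI spec once and then selecting devices by their CDI names; roughly:

$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
$ podman run --rm --device nvidia.com/gpu=all docker.io/nvidia/cuda nvidia-smi -L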