Running nvidia-container-runtime with podman is blowing up.
rhatdan opened this issue · 90 comments
- Issue or feature description
  Rootless and rootful podman do not work with the nvidia plugin.
- Steps to reproduce the issue
  Install the nvidia plugin and configure it to run with podman.
  Execute the podman command and check whether the devices are configured correctly.
- Information to attach (optional if deemed irrelevant)
Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
Kernel version from uname -a
Fedora 30 and later
Any relevant kernel output lines from dmesg
Driver information from nvidia-smi -a
Docker version from docker version
NVIDIA packages version from dpkg -l 'nvidia' or rpm -qa 'nvidia'
NVIDIA container library version from nvidia-container-cli -V
NVIDIA container library logs (see troubleshooting)
Docker command, image and tag used
I am reporting this based on other users' complaints. This is what they said.
We discovered that the ubuntu 18.04 machine needed a configuration change to get rootless working with nvidia:
"no-cgroups = true" was set in /etc/nvidia-container-runtime/config.toml
Unfortunately this config change did not work on Centos 7, but it did change the rootless error to:
nvidia-container-cli: initialization error: cuda error: unknown error
This config change breaks podman running from root, with the error:
Failed to initialize NVML: Unknown Error
Interestingly, root on ubuntu gets the same error even though rootless works.
The Podman team would like to work with you guys to get this working well in both rootful and rootless containers if possible. But we need someone to work with.
Hello!
@rhatdan do you mind filling the following issue template: https://github.com/NVIDIA/nvidia-docker/blob/master/.github/ISSUE_TEMPLATE.md
Thanks!
I can work with the podman team.
@nvjmayo Thanks for the suggestions. Some good news and some less good.
This works rootless:
podman run --rm --hooks-dir /usr/share/containers/oci/hooks.d nvcr.io/nvidia/cuda nvidia-smi
The same command continues to fail with the image: docker.io/nvidia/cuda
In fact rootless works with or without /usr/share/containers/oci/hooks.d/01-nvhook.json installed using the image: nvcr.io/nvidia/cuda
Running as root continues to fail when no-cgroups = true for either container, returning:
Failed to initialize NVML: Unknown Error
Strange, I would not expect podman to run a hook that did not have a JSON file describing it.
@eaepstein I'm still struggling to reproduce the issue you see. Using docker.io/nvidia/cuda also works for me with the hooks dir.
$ podman run --rm --hooks-dir /usr/share/containers/oci/hooks.d/ docker.io/nvidia/cuda nvidia-smi
Tue Oct 22 21:35:44 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GT 710 Off | 00000000:65:00.0 N/A | N/A |
| 50% 38C P0 N/A / N/A | 0MiB / 2001MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
without the hook I would expect to see a failure roughly like:
Error: time="2019-10-22T14:35:14-07:00" level=error msg="container_linux.go:346: starting container process caused \"exec: \\\"nvidia-smi\\\": executable file not found in $PATH\""
container_linux.go:346: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": OCI runtime command not found error
This is because the libraries and tools get installed by the hook in order to match the host drivers. (an unfortunate limitation of tightly coupled driver+library releases)
I think there is a configuration issue and not an issue of the container image (docker.io/nvidia/cuda vs nvcr.io/nvidia/cuda).
Reviewing my earlier posts, I recommend changing my 01-nvhook.json and removing the NVIDIA_REQUIRE_CUDA=cuda>=10.1 from it. My assumption was that everyone has the latest CUDA install, which was kind of a silly assumption on my part. The CUDA version doesn't have to be specified, and you can leave this environment variable out of your setup. It was an artifact of my earlier experiments.
@nvjmayo we started from scratch with a new machine (CentOS Linux release 7.7.1908) and both docker.io and nvcr.io images are working for us now too. And --hooks-dir must now be specified for both to work. Thanks for the help!
@rhatdan @nvjmayo Turns out that getting rootless podman working with nvidia on centos 7 is a bit more complicated, at least for us.
Here is our scenario on a brand new CentOS 7.7 machine:
1. Run nvidia-smi with rootless podman.
   Result: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused "process_linux.go:413: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\""
2. Run podman with user=root.
   Result: nvidia-smi works.
3. Run podman rootless.
   Result: nvidia-smi works!
4. Reboot machine, run podman rootless.
   Result: fails again with the same error as in step 1.
Conclusion: running the nvidia container with podman as root changes the environment so that rootless works. The environment is cleared on reboot.
One other comment: podman as root and rootless podman cannot run with the same /etc/nvidia-container-runtime/config.toml - no-cgroups must be false for root and true for rootless.
If the nvidia hook is doing any privileged operations, like modifying /dev and adding device nodes, then this will not work with rootless. (In rootless, all processes are running with the user's UID. Probably when you run rootful, it is doing the privileged operations, so the next time you run rootless, those activities do not need to be done.)
I would suggest that for rootless systems, the /dev and nvidia setup be done via a systemd unit file, so the system is preconfigured and the rootless jobs will work fine.
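For illustration, a rough sketch of what such a unit could look like (the unit name and the exact commands are my assumptions, not a tested recipe):
sudo tee /etc/systemd/system/nvidia-device-nodes.service <<'EOF'
[Unit]
Description=Pre-create NVIDIA device nodes for rootless containers
After=systemd-modules-load.service

[Service]
Type=oneshot
RemainAfterExit=yes
# running nvidia-smi creates /dev/nvidia0 and /dev/nvidiactl as a side effect;
# /dev/nvidia-uvm may additionally need nvidia-modprobe -u -c=0 or an explicit mknod
ExecStart=/sbin/modprobe nvidia-uvm
ExecStart=/usr/bin/nvidia-smi

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable --now nvidia-device-nodes.service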
After running nvidia/cuda with rootful podman, the following devices exist:
crw-rw-rw-. 1 root root 195, 254 Oct 25 09:11 nvidia-modeset
crw-rw-rw-. 1 root root 195, 255 Oct 25 09:11 nvidiactl
crw-rw-rw-. 1 root root 195, 0 Oct 25 09:11 nvidia0
crw-rw-rw-. 1 root root 241, 1 Oct 25 09:11 nvidia-uvm-tools
crw-rw-rw-. 1 root root 241, 0 Oct 25 09:11 nvidia-uvm
None of these devices exist after boot. Running nvidia-smi rootless (no podman) creates:
crw-rw-rw-. 1 root root 195, 0 Oct 25 13:40 nvidia0
crw-rw-rw-. 1 root root 195, 255 Oct 25 13:40 nvidiactl
I created the other three entries using "sudo mknod -m 666 etc..." but that is not enough to run rootless. Something else is needed in the environment.
Running nvidia/cuda with rootful podman at boot would work, but it's not pretty.
Thanks for the suggestion
This behavior is documented in our installation guide:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications
From a userns you can't mknod or use nvidia-modprobe. But, if this binary is present and if it can be called in a context where setuid works, it's an option.
There is already nvidia-persistenced as a systemd unit file, but it won't load the nvidia_uvm kernel module nor create the device files, IIRC.
Another option is to use udev rules, which is what Ubuntu is doing:
$ cat /lib/udev/rules.d/71-nvidia.rules
[...]
# Load and unload nvidia-uvm module
ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe nvidia-uvm"
ACTION=="remove", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/sbin/modprobe -r nvidia-uvm"
# This will create the device nvidia device nodes
ACTION=="add", DEVPATH=="/bus/pci/drivers/nvidia", RUN+="/usr/bin/nvidia-smi"
# Create the device node for the nvidia-uvm module
ACTION=="add", DEVPATH=="/module/nvidia_uvm", SUBSYSTEM=="module", RUN+="/sbin/create-uvm-dev-node"
Udev rules make sense to me.
@flx42
sudo'ing the setup script in "4.5. Device Node Verification" is the only thing needed to get rootless nvidia/cuda containers running for us. It created the following devices:
crw-rw-rw-. 1 root root 195, 0 Oct 27 20:38 nvidia0
crw-rw-rw-. 1 root root 195, 255 Oct 27 20:38 nvidiactl
crw-rw-rw-. 1 root root 241, 0 Oct 27 20:38 nvidia-uvm
The udev file only created the first two and was not sufficient by itself.
We'll go with a unit file for the setup script.
Many thanks for your help.
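For reference, the gist of that setup script is roughly the following (paraphrased from the linked guide; treat this as a sketch and use the guide's exact version):
#!/bin/bash
# rough paraphrase of the guide's "Device Node Verification" snippet
/sbin/modprobe nvidia && {
  # create /dev/nvidia0..N-1 and /dev/nvidiactl (major number 195)
  N=$(lspci | grep -i nvidia | grep -ci 'vga\|3d controller')
  for i in $(seq 0 $((N - 1))); do mknod -m 666 /dev/nvidia$i c 195 $i; done
  mknod -m 666 /dev/nvidiactl c 195 255
}
/sbin/modprobe nvidia-uvm && {
  # nvidia-uvm gets a dynamic major number; read it from /proc/devices
  D=$(grep nvidia-uvm /proc/devices | awk '{print $1}')
  mknod -m 666 /dev/nvidia-uvm c "$D" 0
}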
Thanks guys, with insight from this issue and others, I was able to get podman working with my Quadro in EL7 using sudo podman run --privileged --rm --hooks-dir /usr/share/containers/oci/hooks.d docker.io/nvidia/cudagl:10.1-runtime-centos7 nvidia-smi
after installing the 'nvidia-container-toolkit' package.
Once the dust settles on how to get GPU support in rootless podman in EL7, a step-by-step guide would make for a great blog post and/or entry into the podman and/or nvidia documentation.
Hello @nvjmayo and @rhatdan. I'm wondering if there is an update on this issue or this one for how to access NVIDIA GPUs from containers run rootless with podman.
On RHEL8.1, with default /etc/nvidia-container-runtime/config.toml, and running containers with root, GPU access works as expected. Rootless does not work by default, it fails with cgroup related errors (as expected).
After modifying the config.toml file -- setting no-cgroups = true and changing the debug log file -- rootless works. However, these changes make GPU access fail in containers run as root, with error "Failed to initialize NVML: Unknown Error."
Please let me know if there is any recent documentation on how to do this beyond these two issues.
Steps to get it working on RHEL 8.1:
- Install the Nvidia drivers and make sure nvidia-smi works on the host.
- Install nvidia-container-toolkit from the repos at (a sketch of a matching .repo file follows below):
  baseurl=https://nvidia.github.io/libnvidia-container/centos7/$basearch
  baseurl=https://nvidia.github.io/nvidia-container-runtime/centos7/$basearch
- Modify /etc/nvidia-container-runtime/config.toml and change these values:
  [nvidia-container-cli]
  #no-cgroups = false
  no-cgroups = true
  [nvidia-container-runtime]
  #debug = "/var/log/nvidia-container-runtime.log"
  debug = "~/./local/nvidia-container-runtime.log"
- Run it rootless as:
  podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ nvidia/cuda:10.2-devel-ubi8 /usr/bin/nvidia-smi
/cc @dagrayvid
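For convenience, a sketch of a .repo file built from those two baseurls (the file name and gpgcheck settings are assumptions; the official install docs ship a ready-made .repo file you should prefer):
sudo tee /etc/yum.repos.d/nvidia-container.repo <<'EOF'
[libnvidia-container]
name=libnvidia-container
baseurl=https://nvidia.github.io/libnvidia-container/centos7/$basearch
enabled=1
gpgcheck=0

[nvidia-container-runtime]
name=nvidia-container-runtime
baseurl=https://nvidia.github.io/nvidia-container-runtime/centos7/$basearch
enabled=1
gpgcheck=0
EOF
sudo dnf install -y nvidia-container-toolkit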
Thanks @jamescassell.
I repeated those steps on RHEL8.1, and nvidia-smi works as expected when running rootless. However, once those changes are made, I am unable to run nvidia-smi in a container run as root. Is this behaviour expected, or is there some change in CLI flags needed when running as root? Running as root did work before making these changes.
Is there a way to configure a system so that we can utilize GPUs with podman as root and non-root user?
I can't run podman rootless with GPU, can someone help me?
docker run --runtime=nvidia --privileged nvidia/cuda nvidia-smi
works fine but
podman run --runtime=nvidia --privileged nvidia/cuda nvidia-smi
crashes, same for
sudo podman run --runtime=nvidia --privileged nvidia/cuda nvidia-smi
Output:
$ podman run --runtime=nvidia --privileged nvidia/cuda nvidia-smi
2020/04/03 13:34:52 ERROR: /usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
Error: `/usr/bin/nvidia-container-runtime start e3ccb660bf27ce0858ee56476e58b53cd3dc900e8de80f08d10f3f844c0e9f9a` failed: exit status 1
But, runc exists:
$ whereis runc
runc: /usr/bin/runc
$ whereis docker-runc
docker-runc:
$ podman --version
podman version 1.8.2
$ cat ~/.config/containers/libpod.conf
# libpod.conf is the default configuration file for all tools using libpod to
# manage containers
# Default transport method for pulling and pushing for images
image_default_transport = "docker://"
# Paths to look for the conmon container manager binary.
# If the paths are empty or no valid path was found, then the `$PATH`
# environment variable will be used as the fallback.
conmon_path = [
"/usr/libexec/podman/conmon",
"/usr/local/libexec/podman/conmon",
"/usr/local/lib/podman/conmon",
"/usr/bin/conmon",
"/usr/sbin/conmon",
"/usr/local/bin/conmon",
"/usr/local/sbin/conmon",
"/run/current-system/sw/bin/conmon",
]
# Environment variables to pass into conmon
conmon_env_vars = [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
]
# CGroup Manager - valid values are "systemd" and "cgroupfs"
#cgroup_manager = "systemd"
# Container init binary
#init_path = "/usr/libexec/podman/catatonit"
# Directory for persistent libpod files (database, etc)
# By default, this will be configured relative to where containers/storage
# stores containers
# Uncomment to change location from this default
#static_dir = "/var/lib/containers/storage/libpod"
# Directory for temporary files. Must be tmpfs (wiped after reboot)
#tmp_dir = "/var/run/libpod"
tmp_dir = "/run/user/1000/libpod/tmp"
# Maximum size of log files (in bytes)
# -1 is unlimited
max_log_size = -1
# Whether to use chroot instead of pivot_root in the runtime
no_pivot_root = false
# Directory containing CNI plugin configuration files
cni_config_dir = "/etc/cni/net.d/"
# Directories where the CNI plugin binaries may be located
cni_plugin_dir = [
"/usr/libexec/cni",
"/usr/lib/cni",
"/usr/local/lib/cni",
"/opt/cni/bin"
]
# Default CNI network for libpod.
# If multiple CNI network configs are present, libpod will use the network with
# the name given here for containers unless explicitly overridden.
# The default here is set to the name we set in the
# 87-podman-bridge.conflist included in the repository.
# Not setting this, or setting it to the empty string, will use normal CNI
# precedence rules for selecting between multiple networks.
cni_default_network = "podman"
# Default libpod namespace
# If libpod is joined to a namespace, it will see only containers and pods
# that were created in the same namespace, and will create new containers and
# pods in that namespace.
# The default namespace is "", which corresponds to no namespace. When no
# namespace is set, all containers and pods are visible.
#namespace = ""
# Default infra (pause) image name for pod infra containers
infra_image = "k8s.gcr.io/pause:3.1"
# Default command to run the infra container
infra_command = "/pause"
# Determines whether libpod will reserve ports on the host when they are
# forwarded to containers. When enabled, when ports are forwarded to containers,
# they are held open by conmon as long as the container is running, ensuring that
# they cannot be reused by other programs on the host. However, this can cause
# significant memory usage if a container has many ports forwarded to it.
# Disabling this can save memory.
#enable_port_reservation = true
# Default libpod support for container labeling
# label=true
# The locking mechanism to use
lock_type = "shm"
# Number of locks available for containers and pods.
# If this is changed, a lock renumber must be performed (e.g. with the
# 'podman system renumber' command).
num_locks = 2048
# Directory for libpod named volumes.
# By default, this will be configured relative to where containers/storage
# stores containers.
# Uncomment to change location from this default.
#volume_path = "/var/lib/containers/storage/volumes"
# Selects which logging mechanism to use for Podman events. Valid values
# are `journald` or `file`.
# events_logger = "journald"
# Specify the keys sequence used to detach a container.
# Format is a single character [a-Z] or a comma separated sequence of
# `ctrl-<value>`, where `<value>` is one of:
# `a-z`, `@`, `^`, `[`, `\`, `]`, `^` or `_`
#
# detach_keys = "ctrl-p,ctrl-q"
# Default OCI runtime
runtime = "runc"
# List of the OCI runtimes that support --format=json. When json is supported
# libpod will use it for reporting nicer errors.
runtime_supports_json = ["crun", "runc"]
# List of all the OCI runtimes that support --cgroup-manager=disable to disable
# creation of CGroups for containers.
runtime_supports_nocgroups = ["crun"]
# Paths to look for a valid OCI runtime (runc, runv, etc)
# If the paths are empty or no valid path was found, then the `$PATH`
# environment variable will be used as the fallback.
[runtimes]
runc = [
"/usr/bin/runc",
"/usr/sbin/runc",
"/usr/local/bin/runc",
"/usr/local/sbin/runc",
"/sbin/runc",
"/bin/runc",
"/usr/lib/cri-o-runc/sbin/runc",
"/run/current-system/sw/bin/runc",
]
crun = [
"/usr/bin/crun",
"/usr/sbin/crun",
"/usr/local/bin/crun",
"/usr/local/sbin/crun",
"/sbin/crun",
"/bin/crun",
"/run/current-system/sw/bin/crun",
]
nvidia = ["/usr/bin/nvidia-container-runtime"]
# Kata Containers is an OCI runtime, where containers are run inside lightweight
# Virtual Machines (VMs). Kata provides additional isolation towards the host,
# minimizing the host attack surface and mitigating the consequences of
# containers breakout.
# Please notes that Kata does not support rootless podman yet, but we can leave
# the paths below blank to let them be discovered by the $PATH environment
# variable.
# Kata Containers with the default configured VMM
kata-runtime = [
"/usr/bin/kata-runtime",
]
# Kata Containers with the QEMU VMM
kata-qemu = [
"/usr/bin/kata-qemu",
]
# Kata Containers with the Firecracker VMM
kata-fc = [
"/usr/bin/kata-fc",
]
# The [runtimes] table MUST be the last thing in this file.
# (Unless another table is added)
# TOML does not provide a way to end a table other than a further table being
# defined, so every key hereafter will be part of [runtimes] and not the main
# config.
$ cat /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
debug = "/tmp/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
#no-cgroups = false
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"
[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
debug = "/tmp/nvidia-container-runtime.log
$ cat /tmp/nvidia-container-runtime.log
2020/04/03 13:23:02 Running /usr/bin/nvidia-container-runtime
2020/04/03 13:23:02 Using bundle file: /home/andrews/.local/share/containers/storage/vfs-containers/614cb26f8f4719e3aba56be2e1a6dc29cd91ae760d9fe3bf83d6d1b24becc638/userdata/config.json
2020/04/03 13:23:02 prestart hook path: /usr/bin/nvidia-container-runtime-hook
2020/04/03 13:23:02 Prestart hook added, executing runc
2020/04/03 13:23:02 Looking for "docker-runc" binary
2020/04/03 13:23:02 "docker-runc" binary not found
2020/04/03 13:23:02 Looking for "runc" binary
2020/04/03 13:23:02 Runc path: /usr/bin/runc
2020/04/03 13:23:09 Running /usr/bin/nvidia-container-runtime
2020/04/03 13:23:09 Command is not "create", executing runc doing nothing
2020/04/03 13:23:09 Looking for "docker-runc" binary
2020/04/03 13:23:09 "docker-runc" binary not found
2020/04/03 13:23:09 Looking for "runc" binary
2020/04/03 13:23:09 ERROR: find runc path: exec: "runc": executable file not found in $PATH
2020/04/03 13:31:06 Running nvidia-container-runtime
2020/04/03 13:31:06 Command is not "create", executing runc doing nothing
2020/04/03 13:31:06 Looking for "docker-runc" binary
2020/04/03 13:31:06 "docker-runc" binary not found
2020/04/03 13:31:06 Looking for "runc" binary
2020/04/03 13:31:06 Runc path: /usr/bin/runc
$ nvidia-container-runtime --version
runc version 1.0.0-rc8
commit: 425e105d5a03fabd737a126ad93d62a9eeede87f
spec: 1.0.1-dev
NVRM version: 440.64.00
CUDA version: 10.2
Device Index: 0
Device Minor: 0
Model: GeForce RTX 2070
Brand: GeForce
GPU UUID: GPU-22dfd02e-a668-a6a6-a90a-39d6efe475ee
Bus Location: 00000000:01:00.0
Architecture: 7.5
$ docker version
Client:
Version: 18.09.7
API version: 1.39
Go version: go1.10.8
Git commit: 2d0083d
Built: Thu Jun 27 17:56:23 2019
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.8
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: afacb8b7f0
Built: Wed Mar 11 01:24:19 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
See particularly step 4. #85 (comment)
This looks like the nvidia plugin is searching for a hard coded path to runc?
[updated] Hi @jamescassell, unfortunately it does not work for me (same error using sudo).
$ podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ --runtime=nvidia nvidia/cuda nvidia-smi
2020/04/03 17:33:06 ERROR: /usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
2020/04/03 17:33:06 ERROR: /usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
Error: `/usr/bin/nvidia-container-runtime start 060398d97299ee033e8ebd698a11c128bd80ce641dd389976ca43a34b26abab3` failed: exit status 1
Hi @jamescassell, unfortunately it does not work for me.
$ podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ nvidia/cuda nvidia-smi Error: container_linux.go:345: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": OCI runtime command not found error
Did you make the other changes described? I'd hit the same error until making the config changes.
@jamescassell yes, see #85 (comment)
Not sure if it's relevant but looks like you're missing a quote: debug = "/tmp/nvidia-container-runtime.log
@jamescassell
$ sudo nano /etc/nvidia-container-runtime/config.toml
I think this is a podman issue. Podman is not passing $PATH down to conmon when it executes it.
containers/podman#5712
I am not sure if conmon then passes the PATH environment down to the OCI runtime either.
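One quick way to see which PATH a running container's conmon actually received (plain /proc inspection, nothing podman-specific):
pid=$(pgrep -x conmon | head -n1)
tr '\0' '\n' < /proc/"$pid"/environ | grep '^PATH='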
@rhatdan yes , I will check this PR containers/podman#5712
Thanks
I had a major issue with this error message popping up when trying to change my container user id while adding the hook that was made to fix the rootless problem.
Error: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\\\\n\\\"\"": OCI runtime error
But I've since learned that this particular behavior is quite quirky: where I thought I had pinpointed it, it now seems to work if there is first a call to the container using sudo (that container wouldn't work, but the subsequent command did). Eagerly awaiting an update where the root (no pun intended) of this nvidia container problem gets addressed.
Hi @rhatdan , answering your previous question containers/podman#5712 (comment)
I was able to install the new version of podman, and it works fine with my GPU, however, I am getting this strange behavior at the end of the execution, please see:
andrews@deeplearning:~/Projects$ podman run -it --rm --runtime=nvidia --privileged nvidia/cuda:10.0-cudnn7-runtime nvidia-smi
Mon May 18 21:30:17 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 Off | 00000000:01:00.0 On | N/A |
| 37% 30C P8 9W / 175W | 166MiB / 7979MiB | 5% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
2020/05/18 23:30:18 ERROR: /usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
ERRO[0003] Error removing container 672a332467da4e91d8ac2fdc8f3c2973a808321341c2d80caa8d0ecad4f0db65: error removing container 672a332467da4e91d8ac2fdc8f3c2973a808321341c2d80caa8d0ecad4f0db65 from runtime: `/usr/bin/nvidia-container-runtime delete --force 672a332467da4e91d8ac2fdc8f3c2973a808321341c2d80caa8d0ecad4f0db65` failed: exit status 1
andrews@deeplearning:~$ podman --version
podman version 1.9.2
andrews@deeplearning:~$ cat /tmp/nvidia-container-runtime.log
2020/05/18 23:47:47 Running /usr/bin/nvidia-container-runtime
2020/05/18 23:47:47 Using bundle file: /home/andrews/.local/share/containers/storage/vfs-containers/3add1cc2bcb9cecde045877d9a0e4d3ed9f64d304cd5cb07fd0e072bf163a170/userdata/config.json
2020/05/18 23:47:47 prestart hook path: /usr/bin/nvidia-container-runtime-hook
2020/05/18 23:47:47 Prestart hook added, executing runc
2020/05/18 23:47:47 Looking for "docker-runc" binary
2020/05/18 23:47:47 Runc path: /usr/bin/docker-runc
2020/05/18 23:47:48 Running /usr/bin/nvidia-container-runtime
2020/05/18 23:47:48 Command is not "create", executing runc doing nothing
2020/05/18 23:47:48 Looking for "docker-runc" binary
2020/05/18 23:47:48 Runc path: /usr/bin/docker-runc
2020/05/18 23:47:48 Running /usr/bin/nvidia-container-runtime
2020/05/18 23:47:48 Command is not "create", executing runc doing nothing
2020/05/18 23:47:48 Looking for "docker-runc" binary
2020/05/18 23:47:48 "docker-runc" binary not found
2020/05/18 23:47:48 Looking for "runc" binary
2020/05/18 23:47:48 ERROR: find runc path: exec: "runc": executable file not found in $PATH
andrews@deeplearning:~$ nvidia-container-runtime --version
runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev
andrews@deeplearning:~$ whereis runc
runc: /usr/bin/runc
andrews@deeplearning:~$ whereis docker-runc
docker-runc: /usr/bin/docker-runc
Do you know what it could be?
The error you are getting looks like the $PATH was not being passed into your OCI runtime?
Yes, it's strange...
- Modify /etc/nvidia-container-runtime/config.toml and change these values: ...
- Run it rootless as: podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ nvidia/cuda:10.2-devel-ubi8 /usr/bin/nvidia-smi
This did the trick for me, thanks. I'm pondering the user/process isolation ramifications of these changes on a multi-user system. Hopefully, RH/NVDA can get this as elegant as Docker's --gpus=all without significantly degrading the security benefits of rootless podman over docker...
If you leave the SELinux enabled, what AVC's are you seeing?
Amazing work! I was able to run GPU-enabled containers on Fedora 32 using the centos8 repos, only modifying /etc/nvidia-container-runtime/config.toml to set no-cgroups = true. I was wondering what the implications are of not using the hooks-dir?
Thanks
Update: checked a tensorflow image, it works flawlessly with rootless Podman 1.9.3.
For anyone who is looking to have rootless "nvidia-docker" be more or less seamless with podman I would suggest the following changes:
$ cat ~/.config/containers/libpod.conf
hooks_dir = ["/usr/share/containers/oci/hooks.d", "/etc/containers/oci/hooks.d"]
label = false
$ grep no-cgroups /etc/nvidia-container-runtime/config.toml
no-cgroups = true
After the above changes on Fedora 32 I can run nvidia-smi using just:
$ podman run -it --rm nvidia/cuda:10.2-base nvidia-smi
Fri Jun 26 22:49:50 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN RTX Off | 00000000:08:00.0 On | N/A |
| 41% 35C P8 5W / 280W | 599MiB / 24186MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
The only annoyance is needing to edit /etc/nvidia-container-runtime/config.toml whenever there is a package update for nvidia-container-toolkit, which fortunately doesn't happen too often. If there were some way to make changes to config.toml persistent across updates, or a user config file (without using some hack like chattr +i), then this process would be really smooth.
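Until something better exists, one low-tech way to reapply the change after a package update is a one-liner along these lines (a sketch; adjust to taste):
sudo sed -i 's/^#\?no-cgroups *=.*/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml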
Maybe in the future a more targeted approach for disabling SELinux will come along that is more secure than just disabling labeling completely for lazy people like myself. I only run a few GPU-based containers here and there so I'm personally not too concerned.
@zeroepoch You can add an SELinux policy, see here: https://github.com/mjlbach/podman_ml_containers/blob/master/selinux.sh
The instructions here worked for me on Fedora 32; however, the problem reappears if I specify --userns keep-id:
Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error
Is that expected behaviour?
Make sure you have modified the file at /etc/nvidia-container-runtime/config.toml. Every time the nvidia-container packages are updated, it will reset to the default values and you should change the values of:
#no-cgroups=false
no-cgroups = true
@Davidnet Even after the above modification, I am able to reproduce @invexed's error if I try to run the cuda-11 containers. Note the latest tag currently points to cuda 11.
$ podman run --rm --security-opt=label=disable nvidia/cuda:11.0-base-rc /usr/bin/nvidia-smi
Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error
But not when trying to run a cuda 10.2 container or lower
$ podman run --rm --security-opt=label=disable nvidia/cuda:10.2-base /usr/bin/nvidia-smi
Sun Jul 12 15:57:40 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 Off | 00000000:01:00.0 On | N/A |
| 0% 60C P0 37W / 230W | 399MiB / 8116MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Make sure you have modified the file at: /etc/nvidia-container-runtime/config.toml
Thanks for the reply. I have indeed modified this file. The container runs with podman run --rm --security-opt label=disable -u 0:0 container, but podman run --rm --security-opt label=disable --userns keep-id -u $(id -u):$(id -g) container results in the above error.
EDIT: I have CUDA 10.2 installed:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 960M Off | 00000000:01:00.0 Off | N/A |
| N/A 33C P8 N/A / N/A | 42MiB / 2004MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1565 G /usr/libexec/Xorg 20MiB |
| 0 2013 G /usr/libexec/Xorg 20MiB |
+-----------------------------------------------------------------------------+
You need a 450 driver to run CUDA 11.0 containers. The host CUDA version (or even none at all) doesn't matter, but the driver version does when running a CUDA container. nvidia-docker makes this error more obvious compared to podman. After updating your driver you should be able to run the container.
Apologies for the confusion, but I'm actually trying to run a CUDA 10.0.130 container. Updating the driver may fix @mjlbach's problem though.
To be more precise, I'm installing CUDA via https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux within an image based on archlinux.
podman run --rm --security-opt label=disable -u $(id -u):$(id -g) --userns keep-id container triggers Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error, but podman run --rm --security-opt label=disable -u 0:0 container does not. The problem seems to be related to the specification of --userns keep-id.
You can add an SELinux policy, see here: https://github.com/mjlbach/podman_ml_containers/blob/master/selinux.sh
Interesting, per the link in that script to the DGX project, looks like nVidia has already solved SELinux woes on EL7 with nvidia-container. There are plenty of warnings in that project about how it has only been tested on DGX running EL7; would be great if nVidia made this policy available for general use with EL7/EL8 and bundled it inside the nvidia-container-runtime package(s).
That should allow us to use rootless podman with GPU acceleration without --security-opt label=disable, but I don't know the security implications of said policy...
UPDATE: Requested that the DGX selinux update be made part of this package in Issue NVIDIA/nvidia-docker#121
Hi folks, I've hit the same wall as another person: NVIDIA/nvidia-container-toolkit#182. Any idea why that would happen?
@zeroepoch You can add an SELinux policy, see here: https://github.com/mjlbach/podman_ml_containers/blob/master/selinux.sh
I finally got around to trying this SELinux module and it worked. I still need to add --security-opt label=type:nvidia_container_t, but that should be more secure than disabling labels. What prompted this attempt to try again was that libpod.conf was deprecated and I was converting my settings to ~/.config/containers/containers.conf. I don't need anything in there anymore with this additional option. Now I just need to figure out how to make it default since I pretty much just run nvidia GPU containers.
For anyone who still wants to disable labels to make the CLI simpler, here are the contents of the containers.conf mentioned above:
[containers]
label = false
I don't know if this is the right place to ask, and I can open a separate issue if needed.
I'm testing rootless Podman v3.0 with crun v0.17 on our Summit test systems at Oak Ridge (IBM Power 9 with Nvidia Tesla V100 GPUs, RHEL 8.2). We have a restriction that we can't setup and maintain the subuid/subgid mappings for each of our users in the /etc/sub[uid|gid] files. That would be a giant administrative overhead since that mapping would have to be maintained on every node. Currently pulling or building cuda containers works just fine. But when trying to run it.
% podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ oci-archive:/ccs/home/subil/subil-containers-oci/simplecuda nvidia-smi
Getting image source signatures
Copying blob 5ef3c0b978d0 done
Copying blob d23be3dac067 done
Copying blob 786d8ed1601c done
Copying blob 6e99435589e0 done
Copying blob 93d25f6f9464 done
Copying blob d1ababb2c734 done
Copying config beba83a3b2 done
Writing manifest to image destination
Storing signatures
Error: OCI runtime error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1)
Here, simplecuda is just an oci-archive of docker.io/nvidia/cuda-ppc64le:10.2-base-centos7 (our HPC system uses IBM PowerPC).
The nvidia-container-toolkit.log looks like this
-- WARNING, the following logs are for debugging purposes only --
I0330 21:24:39.001988 1186667 nvc.c:282] initializing library context (version=1.3.0, build=16315ebdf4b9728e899f615e208b50c41d7a5d15)
I0330 21:24:39.002033 1186667 nvc.c:256] using root /
I0330 21:24:39.002038 1186667 nvc.c:257] using ldcache /etc/ld.so.cache
I0330 21:24:39.002043 1186667 nvc.c:258] using unprivileged user 65534:65534
I0330 21:24:39.002058 1186667 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0330 21:24:39.002241 1186667 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0330 21:24:39.002259 1186667 nvc.c:167] skipping kernel modules load due to user namespace
I0330 21:24:39.002400 1186672 driver.c:101] starting driver service
E0330 21:24:39.002442 1186672 driver.c:161] could not start driver service: privilege change failed: operation not permitted
I0330 21:24:39.003214 1186667 driver.c:196] driver service terminated successfully
I've tried a whole variety of different Podman flag combinations mentioned earlier in this issue thread. None have worked. They all have the same errors above in the output and the log file.
I have the hook json file properly set up
% cat /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json
{
"version": "1.0.0",
"hook": {
"path": "/usr/bin/nvidia-container-toolkit",
"args": ["nvidia-container-toolkit", "prestart"],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
]
},
"when": {
"always": true,
"commands": [".*"]
},
"stages": ["prestart"]
}
The nvidia-container-runtime config.toml looks like this
[76a@raptor07 ~]$ cat /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
debug = "/tmp/.nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
debug = "/tmp/.nvidia-container-runtime.log"
My storage.conf looks like this
% cat ~/.config/containers/storage.conf
[storage]
driver = "overlay"
graphroot = "/tmp/subil-containers-peak"
rootless_storage_path = "$HOME/.local/share/containers/storage"
#rootless_storage_path = "/tmp/subil-containers-storage-peak"
[storage.options]
additionalimagestores = [
]
[storage.options.overlay]
ignore_chown_errors = "true"
mount_program = "/usr/bin/fuse-overlayfs"
mountopt = "nodev,metacopy=on"
[storage.options.thinpool]
For comparison, I also tested this on a PowerPC workstation (identical to the HPC nodes: IBM Power9 with Nvidia Tesla V100, RHEL 8.2) and it's the exact same errors there too. But once we set up the subuid/subgid mappings on the workstation and did echo "user.max_user_namespaces=28633" > /etc/sysctl.d/userns.conf, Podman was able to run the cuda container without issue.
[76a@raptor07 gpu]$ podman run --rm docker.io/nvidia/cuda-ppc64le:10.2-base-centos7 nvidia-smi -L
GPU 0: Tesla V100-PCIE-16GB (UUID: GPU-4d2aad84-ad3d-430b-998c-6124d28d8e7c)
So I know the issue is that we need both the subuid/subgid mappings and the user.max_user_namespaces. I want to know if it is possible to get the nvidia-container-toolkit working with rootless Podman without needing the subuid/subgid mappings.
For reference, we had a related issue (containers/podman#8580) with MPI not working because of the lack of subuid/subgid mappings. @giuseppe was able to patch crun and Podman to make that work for Podman v3 and crun >=v0.17. I wanted to know if there was something that could be done here to make the nvidia-container-toolkit also work under the same conditions.
I'm happy to provide more details if you need.
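For reference, the pieces that made the workstation case work look roughly like this (the subordinate ID start and range here are illustrative, not our exact values):
$ grep 76a /etc/subuid /etc/subgid
/etc/subuid:76a:100000:65536
/etc/subgid:76a:100000:65536
$ sysctl user.max_user_namespaces
user.max_user_namespaces = 28633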
I have posted this here but it seems this issue is more relevant and is still open, so I am copying it here.
I encountered exactly the same problem with podman 3.0.1 and nvidia-container-runtime 3.4.0-1
/usr/bin/nvidia-container-runtime: find runc path: exec: "runc": executable file not found in $PATH
After some attempts, I found out that --cap-add AUDIT_WRITE solves this problem. I have no idea why this would even work, though.
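For example (the image tag here is just a placeholder for whatever you are running):
podman run --rm --cap-add AUDIT_WRITE nvidia/cuda:10.2-base nvidia-smi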
Here's my podman info; I'm happy to offer any further detailed info if asked.
host:
arch: amd64
buildahVersion: 1.19.4
cgroupManager: cgroupfs
cgroupVersion: v1
conmon:
package: /usr/bin/conmon is owned by conmon 1:2.0.27-1
path: /usr/bin/conmon
version: 'conmon version 2.0.27, commit: 65fad4bfcb250df0435ea668017e643e7f462155'
cpus: 16
distribution:
distribution: manjaro
version: unknown
eventLogger: journald
hostname: manjaro
idMappings:
gidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
uidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
kernel: 5.9.16-1-MANJARO
linkmode: dynamic
memFree: 26319368192
memTotal: 33602633728
ociRuntime:
name: /usr/bin/nvidia-container-runtime
package: /usr/bin/nvidia-container-runtime is owned by nvidia-container-runtime-bin 3.4.0-1
path: /usr/bin/nvidia-container-runtime
version: |-
runc version 1.0.0-rc93
commit: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
spec: 1.0.2-dev
go: go1.16.2
libseccomp: 2.5.1
os: linux
remoteSocket:
path: /run/user/1000/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
selinuxEnabled: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: /usr/bin/slirp4netns is owned by slirp4netns 1.1.9-1
version: |-
slirp4netns version 1.1.9
commit: 4e37ea557562e0d7a64dc636eff156f64927335e
libslirp: 4.4.0
SLIRP_CONFIG_VERSION_MAX: 3
libseccomp: 2.5.1
swapFree: 0
swapTotal: 0
uptime: 1h 50m 44.99s (Approximately 0.04 days)
registries:
docker.io:
Blocked: false
Insecure: false
Location: hub-mirror.c.163.com
MirrorByDigestOnly: false
Mirrors: null
Prefix: docker.io
search:
- docker.io
store:
configFile: /home/wangyize/.config/containers/storage.conf
containerStore:
number: 30
paused: 0
running: 1
stopped: 29
graphDriverName: overlay
graphOptions:
overlay.mount_program:
Executable: /usr/bin/fuse-overlayfs
Package: /usr/bin/fuse-overlayfs is owned by fuse-overlayfs 1.5.0-1
Version: |-
fusermount3 version: 3.10.2
fuse-overlayfs: version 1.5
FUSE library version 3.10.2
using FUSE kernel interface version 7.31
graphRoot: /home/wangyize/.local/share/containers/storage
graphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "false"
imageStore:
number: 2
runRoot: /run/user/1000/containers
volumePath: /home/wangyize/.local/share/containers/storage/volumes
version:
APIVersion: 3.0.0
Built: 1613921386
BuiltTime: Sun Feb 21 23:29:46 2021
GitCommit: c640670e85c4aaaff92741691d6a854a90229d8d
GoVersion: go1.16
OsArch: linux/amd64
Version: 3.0.1
Does anyone have any idea what would require the AUDIT_WRITE capability?
AUDIT_WRITE is a capability I'd rather not add... Looks like runc has it by default?
In the OCI/runc spec they are even more drastic, only retaining audit_write, kill, and net_bind_service.
Looking at the error message, the nvidia-container-runtime (a simple shim for runc) is failing to find runc. This is implemented here: https://github.com/NVIDIA/nvidia-container-runtime/blob/v3.4.2/src/main.go#L96 and is due to the result of exec.LookPath failing. Internally, that is checking whether ${P}/runc exists, is not a directory, and is executable for each ${P} in the ${PATH}. This calls os.Stat and I would assume that this query would trigger an entry into the audit log.
Do you have any audit logs to confirm that this is what is causing this?
Note: at this point, no container has been created or started, as the runc create command has just been intercepted and the OCI spec patched to insert the NVIDIA hook.
The error looks like an outdated runc that doesn't understand errnoRet: opencontainers/runc#2424
Without support for errnoRet, runc is not able to handle https://github.com/containers/common/blob/master/pkg/seccomp/seccomp.json#L730-L833 and the only way to disable this chunk is to add CAP_AUDIT_WRITE.
I'd try with an updated runc first and see if it can handle the seccomp configuration generated by Podman when CAP_AUDIT_WRITE is not added.
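In other words (a quick sanity check, not an authoritative procedure): confirm which runc the shim ends up calling and its version, update it if it is too old to know about errnoRet, and then retest without the extra capability:
runc --version
podman run --rm nvidia/cuda:10.2-base nvidia-smi    # retest without --cap-add AUDIT_WRITE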
Following on my previous comment: #85 (comment)
I tested out running different versions (v1.2.0, v1.3.0 and the latest v1.3.3) of nvidia-container-toolkit and libnvidia-container for rootless Podman without subuid/subgid on x86 machines as well, with identical settings and configs as I had in the PowerPC machines. The tests on x86 show the exact same issues for rootless Podman as they did on PowerPC.
@secondspass thanks for confirming that you're able to reproduce on an x86 machine. Do you have a simple way for us to reproduce this internally? This would allow us to better assess what the requirements are for getting this working.
We are in the process of reworking how the NVIDIA Container Stack works and this should address these kinds of issues, as we would make more use of the low-level runtime (crun in this case).
Do you have a simple way for us to reproduce this internally?
While @secondspass reported bug was with CentOS 8.3, I can report that it exists in x86-64 CentOS8 Streams as well. Here is how to reproduce in centos8-streams (which has an updated podman/crun stack):
- Verify nvidia cuda repos and nvidia-container-toolkit repos are enabled
- Deploy the nvidia proprietary drivers with # dnf module install nvidia-driver:465-dkms, reboot, and verify nvidia-smi works
- Deploy the podman/crun stack: # dnf install crun podman skopeo buildah slirp4netns
- Enable use of containers without the need for subuid/subgid (per @secondspass):
cat ~/.config/containers/storage.conf
[storage]
driver = "overlay"
graphroot = "/tmp/${USER}-containers-peak"
rootless_storage_path = "${HOME}/.local/share/containers/storage"
[storage.options]
additionalimagestores = [
]
[storage.options.overlay]
ignore_chown_errors = "true"
mount_program = "/usr/bin/fuse-overlayfs"
mountopt = "nodev,metacopy=on"
[storage.options.thinpool]
- Verify the current user's subuid/subgid is not set, since they get automatically added if one uses certain CLI tools to add users:
$ grep $USER /etc/subuid | wc -l
0
$ grep $USER /etc/subgid | wc -l
0
- Verify rootless containers work (without GPU acceleration):
$ podman run --rm docker.io/centos:8 cat /etc/redhat-release
CentOS Linux release 8.3.2011
- Deploy libnvidia-container-tools:
# dnf install nvidia-container-toolkit
- Modify configuration to support podman / rootless (per @secondspass and others above ):
cat /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
debug = "/tmp/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig"
[nvidia-container-runtime]
debug = "/tmp/nvidia-container-runtime.log"
- Test rootless podman with gpu acceleration and no subuid/subgid, it fails:
$ podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ docker.io/nvidia/cuda:10.2-base-centos8 nvidia-smi -L
...
Error: OCI runtime error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1)
$ cat /tmp/nvidia-container-toolkit.log
-- WARNING, the following logs are for debugging purposes only --
I0421 13:52:26.487793 6728 nvc.c:372] initializing library context (version=1.3.3, build=bd9fc3f2b642345301cb2e23de07ec5386232317)
I0421 13:52:26.487987 6728 nvc.c:346] using root /
I0421 13:52:26.488002 6728 nvc.c:347] using ldcache /etc/ld.so.cache
I0421 13:52:26.488013 6728 nvc.c:348] using unprivileged user 65534:65534
I0421 13:52:26.488067 6728 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0421 13:52:26.488264 6728 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
W0421 13:52:26.488328 6728 nvc.c:249] skipping kernel modules load due to user namespace
I0421 13:52:26.488877 6733 driver.c:101] starting driver service
E0421 13:52:26.489031 6733 driver.c:161] could not start driver service: privilege change failed: operation not permitted
I0421 13:52:26.498449 6728 driver.c:196] driver service terminated successfully
- (sanity check) verify it works WITH sudo:
$ sudo podman run --rm --security-opt=label=disable --hooks-dir=/usr/share/containers/oci/hooks.d/ docker.io/nvidia/cuda:10.2-base-centos8 nvidia-smi -L
...
GPU 0: NVIDIA Tesla V100-PCIE-32GB (UUID: GPU-0a55d110-f8ea-4209-baa7-0e5675c7e832)
Version info for my run:
$ cat /etc/redhat-release
CentOS Stream release 8
$ nvidia-smi | grep Version
NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3
$ nvidia-container-cli --version
version: 1.3.3
$ crun --version
crun version 0.18
$ podman --version
podman version 3.1.0-dev
Update: Spun this issue off into its own issue
I encountered exactly the same problem with podman 3.0.1 and nvidia-container-runtime 3.4.0-1. After some attempts, I find out that --cap-add AUDIT_WRITE solves this problem.
The fact that this works and solved this problem for me as well, tells me this is a race condition.
I am a bit confused about the current state of podman rootless with GPUs.
I am on ubuntu 18.04 arm64 host.
I have made the changes to:
disable-require = false
[nvidia-container-cli]
environment = []
debug = "/tmp/nvidia-container-toolkit.log"
load-kmods = true
no-cgroups = true
ldconfig = "@/sbin/ldconfig.real"
[nvidia-container-runtime]
debug = "/tmp/nvidia-container-runtime.log"
- Is the change above only required on machines that are using cgroups v2 ?
I am only able to get GPU access if I run podman with sudo and --privileged (I need both; update: see comment below). So far I have found no other way to run podman with GPU access, and even with the above cgroups change, my root does not break.
- What does this mean if my root is not breaking with the cgroup change?
When I run rootless, I see the following error:
Error: OCI runtime error: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods --debug=/dev/stderr configure --ldconfig=@/sbin/ldconfig.real --device=all --utility --pid=20052 /data/gpu/rootfs]\\\\n\\\\n-- WARNING, the following logs are for debugging purposes only --\\\\n\\\\nI0427 00:54:00.184026 20064 nvc.c:281] initializing library context (version=0.9.0+beta1, build=77c1cbc2f6595c59beda3699ebb9d49a0a8af426)\\\\nI0427 00:54:00.184272 20064 nvc.c:255] using root /\\\\nI0427 00:54:00.184301 20064 nvc.c:256] using ldcache /etc/ld.so.cache\\\\nI0427 00:54:00.184324 20064 nvc.c:257] using unprivileged user 65534:65534\\\\nI0427 00:54:00.184850 20069 driver.c:134] starting driver service\\\\nI0427 00:54:00.192642 20064 driver.c:231] driver service terminated with signal 15\\\\nnvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected\\\\n\\\"\""
I have tried with --security-opt=label=disable
and have seen no changes in behavior.
- It is unclear to me what runtime people are using. Are they using the standard runc or /usr/bin/nvidia-container-runtime? I have tried both; neither works rootless, and both work as root with --privileged.
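For reference, the runtime can be selected per invocation; below is a minimal sketch of the two variants being compared, with the image name and binary paths as assumptions that depend on the install:
# default runtime (runc or crun) plus the OCI hook
podman run --rm --hooks-dir /usr/share/containers/oci/hooks.d -e NVIDIA_VISIBLE_DEVICES=all docker.io/nvidia/cuda nvidia-smi
# explicitly selecting the nvidia runtime wrapper instead
podman run --rm --runtime /usr/bin/nvidia-container-runtime -e NVIDIA_VISIBLE_DEVICES=all docker.io/nvidia/cuda nvidia-smi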
does it make any difference if you bind mount /dev
from the host?
security-opt=label=disable
I'm not very fluent in Ubuntu, but I believe that option targets SELinux. Ubuntu uses AppArmor for mandatory access control (MAC), so wouldn't the equivalent option be --security-opt 'apparmor=unconfined'?
First a slight update and correction to the above: I don't actually need --privileged.
I just need to define -e NVIDIA_VISIBLE_DEVICES=all, and this invokes the nvidia hook, which works with sudo.
Rootless is still not fixed.
does it make any difference if you bind mount
/dev
from the host?
@giuseppe It does not fix rootless, but in rootful podman using sudo this makes the hook no longer required, which makes sense.
The issue with rootless is that I can't mount all of /dev/
Error: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"open /dev/console: permission denied\"": OCI permission denied
So I did the next best thing and attempted to mount all of the nv* devices under /dev/.
I tried two ways: one with -v and the other using the --device flag, adding in the nvidia components.
That still does not allow the rootless container to detect the GPUs!
It does work with rootful podman using sudo.
The difference is that when the devices are mapped with sudo, I actually see them belong to root:video, whereas in rootless mode I only see nobody:nogroup.
I am wondering if it is related to the video
group? The error I get in rootless mode is the following when trying to run CUDA code:
Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading == false: CUDA: no CUDA-capable device is detected
When I look in the container under /dev in the rootless container:
$ls -la /dev
...
crw-rw---- 1 nobody nogroup 505, 1 Apr 26 18:32 nvhost-as-gpu
...
For ALL of the nv* devices in the rootless container, they don't have a user/group mapped.
In the rootful container that uses sudo:
...
crw-rw---- 1 root video 505, 1 Apr 26 18:32 nvhost-as-gpu
...
For ALL of the nv* devices in the rootful container that uses sudo, they have root:video
So I am pretty certain I need the video group mapped into the container, but I am unclear on how to do this.
I have mapped in the video group with --group-add as a test, but I believe I also need to use --gidmap, because even with --group-add it still shows as nogroup.
My understanding of the user/group mapping podman does is a little fuzzy, so I will take suggestions on how to do this.
Let me know what you think @giuseppe
security-opt=label=disable
I'm not very fluent in Ubuntu, but I believe that option targets SELinux. Ubuntu uses AppArmor for mandatory access control (MAC), so wouldn't the equivalent option be --security-opt 'apparmor=unconfined'?
@qhaas Excellent point. That explains why that flag seems to be a no-op for me.
Also my current system, at least right now does not have apparmor loaded so I shouldn't need either of those flags.
I tried it anyway just for sanity, and confirmed no difference in behavior.
Thank you!
If you have any suggestions on gid mappings please let me know!
First a slight update and correction to the above: I don't actually need --privileged. I just need to define -e NVIDIA_VISIBLE_DEVICES=all and this invokes the nvidia hook, which works with sudo. Rootless is still not fixed.
does it make any difference if you bind mount /dev from the host?
@giuseppe It does not fix rootless, but in rootful podman using sudo this makes the hook no longer required, which makes sense.
The issue with rootless is that I can't mount all of /dev/
could you use -v /dev:/dev --mount type=devpts,destination=/dev/pts
?
@giuseppe I tried adding: -v /dev:/dev --mount type=devpts,destination=/dev/pts
And got the following error:
DEBU[0004] ExitCode msg: "container create failed (no logs from conmon): eof"
Error: container create failed (no logs from conmon): EOF
Not sure how to enable more logs in conmon
If I switch to use: --runtime /usr/local/bin/crun
with -v /dev:/dev --mount type=devpts,destination=/dev/pts
I get the following error:
Error: OCI runtime error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1)
From previous encounters with this error, the way I understand this message is that the video group that is a part of the mount location is not being mapped into the container correctly.
Just an FYI, I was able to get rootless podman to access the GPU if I added my user to the video
group and used the runtime crun
. More details here: containers/podman#10166
I am still interested in a path forward without adding my user to the video
group, but this is a good progress step.
Just as an update to what has been posted in containers/podman#10166: I have been able to access my GPU as a rootless user that belongs to the video group, using the nvidia hook:
cat /data/01-nvhook.json
{
"version": "1.0.0",
"hook": {
"path": "/usr/bin/nvidia-container-toolkit",
"args": ["nvidia-container-toolkit", "prestart"],
"env": ["NVIDIA_REQUIRE_CUDA=cuda>=10.1"]
},
"when": {
"always": true
},
"stages": ["prestart"]
}
But also this one seems to work as well:
cat /data/01-nvhook-runtime-hook.json
{
"version": "1.0.0",
"hook": {
"path": "/usr/bin/nvidia-container-runtime-hook",
"args": ["/usr/bin/nvidia-container-runtime-hook", "prestart"],
"env": []
},
"when": {
"always": true
},
"stages": ["prestart"]
}
Separately, without hooks I was able to use the --device mounts and access my GPU as well.
The important steps that had to be taken here were (see the example command after this list):
- The rootless user needs to belong to the video group.
- Use the Podman flag --group-add keep-groups (this correctly maps the video group into the container).
- Use crun and not runc, because crun is the only runtime that supports --group-add keep-groups.
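A minimal sketch of the resulting invocation, assuming the hook JSON above lives in /data, crun is installed at /usr/bin/crun, and docker.io/nvidia/cuda is used as the test image (all of these are assumptions; adjust the paths and image for your system):
podman run --rm \
  --runtime /usr/bin/crun \
  --group-add keep-groups \
  --hooks-dir /data \
  docker.io/nvidia/cuda nvidia-smi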
I have a related issue here: containers/podman#10212 to get this working in C++ with execv, and I am seeing an odd issue there.
Hi,
I've been using containers with access to GPUs; however, I've noticed that after each reboot I always need to run the following before starting the first container:
nvidia-smi
otherwise I get the error:
Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error
After that I also need to run the NVIDIA Device Node Verification script to properly set up /dev/nvidia-uvm for CUDA applications, as described in this post:
tensorflow/tensorflow#32623 (comment)
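For context, a rough sketch of what that per-boot workaround amounts to, assuming the driver's setuid nvidia-modprobe helper is installed (the linked script does essentially the same thing more carefully):
nvidia-smi               # loads the core nvidia modules and creates the /dev/nvidia* device nodes
nvidia-modprobe -u -c=0  # loads nvidia-uvm and creates /dev/nvidia-uvm* for CUDA applications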
Just to share my HW configuration that works (only with the --privileged flag) as root and rootless:
NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
CentOS Linux release 8.3.2011
CentOS Linux release 8.3.2011
getenforce:
Enforcing
podman info:
arch: amd64
buildahVersion: 1.20.1
cgroupManager: cgroupfs
cgroupVersion: v1
conmon:
package: conmon-2.0.27-1.el8.1.5.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.0.27, commit: '
cpus: 80
distribution:
distribution: '"centos"'
version: "8"
eventLogger: journald
hostname: turing
idMappings:
gidmap:
- container_id: 0
host_id: 2002
size: 1
- container_id: 1
host_id: 100000
size: 65536
uidmap:
- container_id: 0
host_id: 2002
size: 1
- container_id: 1
host_id: 100000
size: 65536
kernel: 4.18.0-240.22.1.el8_3.x86_64
linkmode: dynamic
memFree: 781801324544
memTotal: 809933586432
ociRuntime:
name: crun
package: crun-0.19.1-2.el8.3.1.x86_64
path: /usr/bin/crun
version: |-
crun version 0.19.1
commit: 1535fedf0b83fb898d449f9680000f729ba719f5
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
os: linux
remoteSocket:
path: /run/user/2002/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
selinuxEnabled: true
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns-1.1.8-4.el8.7.6.x86_64
version: |-
slirp4netns version 1.1.8
commit: d361001f495417b880f20329121e3aa431a8f90f
libslirp: 4.3.1
SLIRP_CONFIG_VERSION_MAX: 3
libseccomp: 2.4.3
swapFree: 42949668864
swapTotal: 42949668864
uptime: 29h 16m 48.14s (Approximately 1.21 days)
registries:
search:
- docker.io
- quay.io
store:
configFile: /home/user/.config/containers/storage.conf
containerStore:
number: 29
paused: 0
running: 0
stopped: 29
graphDriverName: overlay
graphOptions:
overlay.mount_program:
Executable: /usr/bin/fuse-overlayfs
Package: fuse-overlayfs-1.5.0-1.el8.5.3.x86_64
Version: |-
fusermount3 version: 3.2.1
fuse-overlayfs: version 1.5
FUSE library version 3.2.1
using FUSE kernel interface version 7.26
graphRoot: /home/user/.local/share/containers/storage
graphStatus:
Backing Filesystem: xfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "false"
imageStore:
number: 28
runRoot: /run/user/2002/containers
volumePath: /home/user/.local/share/containers/storage/volumes
version:
APIVersion: 3.1.2
Built: 1619185402
BuiltTime: Fri Apr 23 14:43:22 2021
GitCommit: ""
GoVersion: go1.14.12
OsArch: linux/amd64
Version: 3.1.2
nvidia-smi | grep Version
NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3
cat /etc/nvidia-container-runtime/config.toml
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig"
Hi @Ru13en, the issue you described above seems to be different from what is being discussed here. Would you mind moving this to a separate GitHub issue? (I would assume this is because the nvidia container toolkit cannot load the kernel modules if it does not have the required permissions. Running nvidia-smi
loads the kernel modules and also ensures that the device nodes are created).
For anybody who has the same issue as me ("nvidia-smi": executable file not found in $PATH: OCI not found
or no NVIDIA GPU device is present: /dev/nvidia0 does not exist), this is how I made it work on Kubuntu 21.04 rootless:
Add your user to group video if not present:
usermod -a -G video $USER
/usr/share/containers/oci/hooks.d/oci-nvidia-hook.json
:
{
"version": "1.0.0",
"hook": {
"path": "/usr/bin/nvidia-container-runtime-hook",
"args": ["/usr/bin/nvidia-container-runtime-hook", "prestart"],
"env": []
},
"when": {
"always": true
},
"stages": ["prestart"]
}
/etc/nvidia-container-runtime/config.toml
:
disable-require = false
[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-runtime-hook.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig.real"
podman run -it --group-add video docker.io/tensorflow/tensorflow:latest-gpu-jupyter nvidia-smi
Sun Jul 18 11:45:06 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.31 Driver Version: 465.31 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:09:00.0 On | N/A |
| 31% 43C P8 6W / 215W | 2582MiB / 7979MiB | 9% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
@rhatdan @nvjmayo Turns out that getting rootless podman working with nvidia on centos 7 is a bit more complicated, at least for us.
Here is our scenario on a brand new CentOS 7.7 machine:
1. Run nvidia-smi with rootless podman.
   Result: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused "process_linux.go:413: running prestart hook 0 caused "error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\n""
2. Run podman as root (user=root).
   Result: nvidia-smi works
3. Run podman rootless.
   Result: nvidia-smi works!
4. Reboot the machine, run podman rootless.
   Result: fails again with the same error as in #1.
Conclusion: running the nvidia container with podman as root changes the environment so that rootless works. The environment is cleared on reboot.
One other comment: podman as root and rootless podman cannot run with the same /etc/nvidia-container-runtime/config.toml; no-cgroups must be false for root and true for rootless.
Hi, have you figured out a solution?
I have exactly the same symptom as yours.
Rootless only works after launching a container as root at least once, and a reboot resets everything.
I am using RHEL 8.4 and can't believe this still happens after one year ...
For those dropping into this issue, nvidia has documented getting GPU acceleration working with podman.
For those dropping into this issue, nvidia has documented getting GPU acceleration working with podman.
That's awesome! The documentation is almost the same as my fix here in this thread :D
Any chance they can update the version of podman in the example? That one is pretty old.
@fuomag9 Are you using crun
as opposed to runc
out of curiosity?
Does it work with both in rootless for you? Or just crun
?
@fuomag9 Are you using
crun
as opposed torunc
out of curiosity? Does it work with both in rootless for you? Or justcrun
?
Working for me with both runc
and crun
set via /etc/containers/containers.conf
with runtime = "XXX"
--hooks-dir /usr/share/containers/oci/hooks.d/
does not seem to be needed anymore, at least with podman 3.3.1 and nvidia-container-toolkit 1.7.0.
For RHEL8 systems where selinux is enforcing, is it 'best practice' to add the nvidia selinux policy module and run podman with --security-opt label=type:nvidia_container_t (per RH documentation, even on non-DGX systems), or to just run podman with --security-opt=label=disable (per nvidia documentation)? Unclear if there is any significant benefit to warrant messing with SELinux policy.
For folks finding this issue, especially anyone trying to do this on RHEL8 after following https://www.redhat.com/en/blog/how-use-gpus-containers-bare-metal-rhel-8, here's the current status/known issues that I've encountered. Hopefully this saves someone some time.
As noted in the comments above you can run containers as root
without issue, but if you try to use --userns keep-id
you're going to have a bad day.
Things that need to be done ahead of time to run rootless containers are documented in https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#step-3-rootless-containers-setup but the cheat sheet version is:
- Install nvidia-container-toolkit
- Update /etc/nvidia-container-runtime/config.toml and set no-cgroups = true
- Use NVIDIA_VISIBLE_DEVICES as part of your podman environment.
- Specify --hooks-dir=/usr/share/containers/oci/hooks.d/ (may not strictly be needed).
If you do that, then running: podman run -e NVIDIA_VISIBLE_DEVICES=all --hooks-dir=/usr/share/containers/oci/hooks.d/ --rm -ti myimage nvidia-smi
should result in the usual nvidia-smi
output. But, you'll note that the user in the container is root
and that may not be what you want. If you use --userns keep-id
; e.g. podman run --userns keep-id -e NVIDIA_VISIBLE_DEVICES=all --hooks-dir=/usr/share/containers/oci/hooks.d/ --rm -ti myimage nvidia-smi
you will get an error that states: Error: OCI runtime error: crun: error executing hook /usr/bin/nvidia-container-toolkit (exit code: 1)
. From my reading above the checks that are run require the user to be root
in the container.
Now for the workaround. You don't need this hook, you just need the nvidia-container-cli
tool. All the hook really does is mount the correct libraries, devices, and binaries from the underlying system into the container. We can use nvidia-container-cli -k list
and find
to accomplish the same thing. Here's my one-liner below. Note that I'm excluding both -e NVIDIA_VISIBLE_DEVICES=all
and --hooks-dir=/usr/share/containers/oci/hooks.d/
.
Here's what it looks like:
podman run --userns keep-id $(for file in $(nvidia-container-cli -k list); do find -L $(dirname $file) -xdev -samefile $file; done | awk '{print " -v "$1":"$1}' | xargs) --rm -ti myimage nvidia-smi
This is what the above is doing. We run nvidia-container-cli -k list
which on my system produces output like:
$ nvidia-container-cli -k list
/dev/nvidiactl
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvidia-modeset
/dev/nvidia0
/dev/nvidia1
/usr/bin/nvidia-smi
/usr/bin/nvidia-debugdump
/usr/bin/nvidia-persistenced
/usr/bin/nvidia-cuda-mps-control
/usr/bin/nvidia-cuda-mps-server
/usr/lib64/libnvidia-ml.so.470.141.03
/usr/lib64/libnvidia-cfg.so.470.141.03
/usr/lib64/libcuda.so.470.141.03
/usr/lib64/libnvidia-opencl.so.470.141.03
/usr/lib64/libnvidia-ptxjitcompiler.so.470.141.03
/usr/lib64/libnvidia-allocator.so.470.141.03
/usr/lib64/libnvidia-compiler.so.470.141.03
/usr/lib64/libnvidia-ngx.so.470.141.03
/usr/lib64/libnvidia-encode.so.470.141.03
/usr/lib64/libnvidia-opticalflow.so.470.141.03
/usr/lib64/libnvcuvid.so.470.141.03
/usr/lib64/libnvidia-eglcore.so.470.141.03
/usr/lib64/libnvidia-glcore.so.470.141.03
/usr/lib64/libnvidia-tls.so.470.141.03
/usr/lib64/libnvidia-glsi.so.470.141.03
/usr/lib64/libnvidia-fbc.so.470.141.03
/usr/lib64/libnvidia-ifr.so.470.141.03
/usr/lib64/libnvidia-rtcore.so.470.141.03
/usr/lib64/libnvoptix.so.470.141.03
/usr/lib64/libGLX_nvidia.so.470.141.03
/usr/lib64/libEGL_nvidia.so.470.141.03
/usr/lib64/libGLESv2_nvidia.so.470.141.03
/usr/lib64/libGLESv1_CM_nvidia.so.470.141.03
/usr/lib64/libnvidia-glvkspirv.so.470.141.03
/usr/lib64/libnvidia-cbl.so.470.141.03
/lib/firmware/nvidia/470.141.03/gsp.bin
We then loop through each of those files and run find -L $(dirname $file) -xdev -samefile $file
That finds all the symlinks to a given file, e.g.
find -L /usr/lib64 -xdev -samefile /usr/lib64/libnvidia-ml.so.470.141.03
/usr/lib64/libnvidia-ml.so.1
/usr/lib64/libnvidia-ml.so.470.141.03
/usr/lib64/libnvidia-ml.so
We loop through each of those files and use awk
and xargs
to create the podman cli arguments to bind mount these files into the container; e.g. -v /usr/lib64/libnvidia-ml.so.1:/usr/lib64/libnvidia-ml.so.1 -v /usr/lib64/libnvidia-ml.so.470.141.03:/usr/lib64/libnvidia-ml.so.470.141.03 -v /usr/lib64/libnvidia-ml.so:/usr/lib64/libnvidia-ml.so
etc.
This effectively does what the hook does, using the tools the hook provides, but does not require the user running the container to be root
, and does not require the user inside of the container to be root
.
Hopefully this saves someone else a few hours.
@decandia50 Excellent information! Your write-up really deserves to be highlighted. Would you consider posting it as a blog if we connect you with some people?
Please do not write a blog post with the above information. While the procedure may work on some setups, it is not a supported use of the nvidia-container-cli
tool and will only work correctly under a very narrow set of assumptions.
The better solution is to use podman's integrated CDI support to have podman do the work that libnvidia-container would have otherwise done. The future of the nvidia stack (and device support in container runtimes in general) is CDI, and starting to use this method now will future-proof how you access generic devices.
Please see below for details on CDI:
https://github.com/container-orchestrated-devices/container-device-interface
We have spent the last year rearchitecting the NVIDIA container stack to work together with CDI, and as part of this have a tool coming out with the next release that will be able to generate CDI specs for nvidia devices for use with podman (and any other CDI compatible runtimes).
In the meantime, you can generate a CDI spec manually, or wait for @elezar to comment on a better method to get a CDI spec generated today.
Here is an example of a (fully functional) CDI spec on my DGX-A100 machine (excluding MIG devices):
cdiVersion: 0.4.0
kind: nvidia.com/gpu
containerEdits:
hooks:
- hookName: createContainer
path: /usr/bin/nvidia-ctk
args:
- /usr/bin/nvidia-ctk
- hook
- update-ldcache
- --folder
- /usr/lib/x86_64-linux-gnu
deviceNodes:
- path: /dev/nvidia-modeset
- path: /dev/nvidiactl
- path: /dev/nvidia-uvm
- path: /dev/nvidia-uvm-tools
mounts:
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libcuda.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libcuda.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvcuvid.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvcuvid.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvoptix.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvoptix.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libGL.so.1.0.0
hostPath: /usr/lib/x86_64-linux-gnu/libGL.so.1.0.0
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libEGL.so.1.0.0
hostPath: /usr/lib/x86_64-linux-gnu/libEGL.so.1.0.0
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libGLESv1_CM.so.1.0.0
hostPath: /usr/lib/x86_64-linux-gnu/libGLESv1_CM.so.1.0.0
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libGLESv2.so.2.0.0
hostPath: /usr/lib/x86_64-linux-gnu/libGLESv2.so.2.0.0
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.460.91.03
hostPath: /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.460.91.03
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/bin/nvidia-smi
hostPath: /usr/bin/nvidia-smi
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/bin/nvidia-debugdump
hostPath: /usr/bin/nvidia-debugdump
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/bin/nvidia-persistenced
hostPath: /usr/bin/nvidia-persistenced
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/bin/nvidia-cuda-mps-control
hostPath: /usr/bin/nvidia-cuda-mps-control
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /usr/bin/nvidia-cuda-mps-server
hostPath: /usr/bin/nvidia-cuda-mps-server
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /var/run/nvidia-persistenced/socket
hostPath: /var/run/nvidia-persistenced/socket
options:
- ro
- nosuid
- nodev
- bind
- containerPath: /var/run/nvidia-fabricmanager/socket
hostPath: /var/run/nvidia-fabricmanager/socket
options:
- ro
- nosuid
- nodev
- bind
devices:
- containerEdits:
deviceNodes:
- path: /dev/nvidia0
name: gpu0
- containerEdits:
deviceNodes:
- path: /dev/nvidia1
name: gpu1
- containerEdits:
deviceNodes:
- path: /dev/nvidia2
name: gpu2
- containerEdits:
deviceNodes:
- path: /dev/nvidia3
name: gpu3
- containerEdits:
deviceNodes:
- path: /dev/nvidia4
name: gpu4
- containerEdits:
deviceNodes:
- path: /dev/nvidia5
name: gpu5
- containerEdits:
deviceNodes:
- path: /dev/nvidia6
name: gpu6
- containerEdits:
deviceNodes:
- path: /dev/nvidia7
name: gpu7
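With a spec like this placed where podman looks for CDI specs (for example /etc/cdi/nvidia.yaml; the exact location is an assumption that depends on your configuration), the named devices can be requested directly. A minimal sketch, using docker.io/nvidia/cuda as a stand-in image:
podman run --rm --device nvidia.com/gpu=gpu0 docker.io/nvidia/cuda nvidia-smi
# additional devices can be requested by repeating the flag, e.g. --device nvidia.com/gpu=gpu1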
Maintaining an nvidia.json CDI spec file for multiple machines with different NVIDIA drivers and other libs is a bit painful.
For instance, the NVIDIA driver installer should create the libnvidia-compiler.so symlink to libnvidia-compiler.so.460.91.03, etc.
The CDI nvidia.json could then just reference the symlinks, avoiding the manual setting of all mappings for a particular driver version.
I am already using CDI specs on our machines, but I would like to test a tool that generates the CDI spec for any system.
@Ru13en we have a WIP Merge Request that adds an:
nvidia-ctk info generate-cdi
command to the NVIDIA Container Toolkit. The idea being that this could be run at boot or triggered on a driver installation / upgrade. We are working on getting a v1.12.0-rc.1
out that includes this functionality for early testing and feedback.
It will be released next week.
I tried using the WIP version of nvidia-ctk
(from the master branch of https://gitlab.com/nvidia/container-toolkit/container-toolkit) and was able to get it working with rootless podman, but not without issues. I have documented them in https://gitlab.com/nvidia/container-toolkit/container-toolkit/-/issues/8.
@rhatdan The upcoming version of nvidia cdi generator will be using cdi version 0.5.0 while the latest podman version 4.2.0 still uses 0.4.0. Any idea when 4.3.0 might be available? (I see that 4.3.0-rc1 uses 0.5.0)
Thanks for the confirmation @starry91. The official release of v1.12.0-rc.1 has been delayed a little, but thanks for testing the tooling nonetheless. I will have a look at the issue you created and update the tooling before releasing the rc.
We have recently updated our Podman support and now recommend using CDI -- which is supported natively in more recent Podman versions.
See https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-podman for details.
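For anyone landing here today, the CDI flow with a released toolkit looks roughly like this (a sketch based on the linked guide; verify the exact commands and paths against your installed versions):
# generate a CDI spec describing the GPUs on this host
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# run a container against all GPUs (or a specific one, e.g. nvidia.com/gpu=0)
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable docker.io/nvidia/cuda nvidia-smi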