NVIDIA/nvidia-container-runtime

Permissions of nvidia-container-runtime with podman not working

Ru13en opened this issue · 5 comments

1. Issue or feature description

After each system boot/reboot, rootless podman does not work with the NVIDIA plugin.
I must run nvidia-smi first, otherwise I get the error:
Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error

After that I also need to run the NVIDIA Device Node Verification script so that /dev/nvidia-uvm is properly set up for CUDA applications, as described in this comment:
tensorflow/tensorflow#32623 (comment)
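For context, the verification script from the CUDA installation guide creates the missing device nodes by hand. A minimal sketch of its nvidia-uvm portion, wrapped in a function (the function name is my own) and guarded so it degrades gracefully on machines without the driver:

```shell
#!/bin/bash
# Sketch of the nvidia-uvm half of NVIDIA's device node verification script.
# Creating the node requires root and a loaded NVIDIA driver.
create_uvm_node() {
  if /sbin/modprobe nvidia-uvm 2>/dev/null; then
    # nvidia-uvm gets a dynamically assigned major number; look it up.
    local major
    major=$(grep nvidia-uvm /proc/devices | awk '{print $2}')
    mknod -m 666 /dev/nvidia-uvm c "$major" 0
    echo "created /dev/nvidia-uvm (major $major)"
  else
    # No NVIDIA driver available (or not root): nothing to do.
    echo "nvidia-uvm module unavailable; skipped"
  fi
}
create_uvm_node
```

The key point is that the major number for nvidia-uvm is not fixed, which is why the node cannot simply be created statically at boot.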

2. Steps to reproduce the issue

Install CentOS 8 with SELinux enabled + NVIDIA Linux drivers.
Install podman and nvidia-container-runtime
Configure /etc/nvidia-container-runtime/config.toml (attached below)
Reboot the machine

Run the commands (they will fail after each reboot unless nvidia-smi and the device node verification script are run first):

podman run --privileged -it nvidia/cuda:11.3.1-base-centos8 nvidia-smi
podman run --privileged -it --rm tensorflow/tensorflow:latest-gpu python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

Run the commands (they will work):

nvidia-smi
podman run --privileged -it nvidia/cuda:11.3.1-base-centos8 nvidia-smi
sh nvidia-device-node-verification.sh #(from https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-verifications)
podman run --privileged -it --rm tensorflow/tensorflow:latest-gpu python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

3. Information to attach (optional if deemed irrelevant)

cat /etc/os-release:

NAME="CentOS Linux"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
CentOS Linux release 8.3.2011

getenforce:
Enforcing

podman info:

  arch: amd64
  buildahVersion: 1.20.1
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.27-1.el8.1.5.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.27, commit: '
  cpus: 80
  distribution:
    distribution: '"centos"'
    version: "8"
  eventLogger: journald
  hostname: turing
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 2002
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 2002
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 4.18.0-240.22.1.el8_3.x86_64
  linkmode: dynamic
  memFree: 781801324544
  memTotal: 809933586432
  ociRuntime:
    name: crun
    package: crun-0.19.1-2.el8.3.1.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 0.19.1
      commit: 1535fedf0b83fb898d449f9680000f729ba719f5
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/2002/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    selinuxEnabled: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.8-4.el8.7.6.x86_64
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.3.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.4.3
  swapFree: 42949668864
  swapTotal: 42949668864
  uptime: 29h 16m 48.14s (Approximately 1.21 days)
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 29
    paused: 0
    running: 0
    stopped: 29
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.5.0-1.el8.5.3.x86_64
      Version: |-
        fusermount3 version: 3.2.1
        fuse-overlayfs: version 1.5
        FUSE library version 3.2.1
        using FUSE kernel interface version 7.26
  graphRoot: /home/user/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 28
  runRoot: /run/user/2002/containers
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 3.1.2
  Built: 1619185402
  BuiltTime: Fri Apr 23 14:43:22 2021
  GitCommit: ""
  GoVersion: go1.14.12
  OsArch: linux/amd64
  Version: 3.1.2
nvidia-smi | grep Version
NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3

cat /etc/nvidia-container-runtime/config.toml

disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
#accept-nvidia-visible-devices-as-volume-mounts = false

[nvidia-container-cli]
#root = "/run/nvidia/driver"
#path = "/usr/bin/nvidia-container-cli"
environment = []
#debug = "/var/log/nvidia-container-toolkit.log"
#ldcache = "/etc/ld.so.cache"
load-kmods = true
no-cgroups = true
#user = "root:video"
ldconfig = "@/sbin/ldconfig"

Thanks for creating the new issue @Ru13en

Here I would assume that the kernel modules cannot be loaded by the NVIDIA container runtime hook. This also prevents the device nodes from being created. nvidia-smi ends up loading the kernel modules and creating the device nodes, but seems to skip the creation of nvidia-uvm and nvidia-uvm-tools -- which is handled by the "Device Node Verification" script that you mentioned.

Is it possible to run the script on startup of the system?

@elezar Yes, I fixed it by creating a script that runs both commands at startup. However, it is not a user-friendly approach...
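For anyone hitting the same issue, one way to automate that workaround is a oneshot systemd unit. This is a sketch, not an official recommendation; the unit name and the script path /usr/local/sbin/nvidia-device-node-verification.sh are placeholders to adjust for your setup:

```ini
# /etc/systemd/system/nvidia-device-nodes.service (hypothetical unit name)
[Unit]
Description=Load NVIDIA kernel modules and create device nodes

[Service]
Type=oneshot
# nvidia-smi loads the nvidia module and creates the /dev/nvidia* nodes
ExecStart=/usr/bin/nvidia-smi
# the verification script additionally creates /dev/nvidia-uvm for CUDA
ExecStart=/usr/local/sbin/nvidia-device-node-verification.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now nvidia-device-nodes.service` so both commands run on every boot before any rootless container starts.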

I don't know whether there is a way around this for rootless podman (I would have to check), but I would expect this to work in the rootful case since the NVIDIA container toolkit DOES load the kernel modules and create the device nodes on the host as part of creating the container. Could you uncomment the debug option in the toolkit config (#debug = "/var/log/nvidia-container-toolkit.log") and attach the contents of the file when launching a rootful container that fails?

Testing with:

podman run --privileged -it --rm tensorflow/tensorflow:latest-gpu python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

@elezar For some reason I can no longer replicate the issue for rootful runs, but the behavior persists for rootless runs (perhaps an update fixed it, since I made the previous post in May).
For rootless, unless the root user starts a container first, it triggers:

Error: error executing hook `/usr/bin/nvidia-container-toolkit` (exit code: 1): OCI runtime error

If I run the command with sudo first and then without it, it runs normally (the NVIDIA container toolkit loads the kernel modules and creates the device nodes).

elezar commented

Please see the updated instructions for running the NVIDIA Container Runtime with Podman.

If you're still having problems, please open a new issue against https://github.com/NVIDIA/nvidia-container-toolkit.