NVIDIA/nvidia-container-runtime

nvidia-smi mapped into a container as a blank file when using k8s+containerd+nvidia-container-runtime

drtpotter opened this issue · 3 comments

Hi there,

I'm trying to use nvidia-container-runtime with containerd 1.4.4 under Kubernetes 1.21, but /usr/bin/nvidia-smi seems to be mapped into the container as a blank file. I'm running OpenSUSE Tumbleweed.

If I use CRI-O with the nvidia container runtime under k8s, my target container can see my RTX 3060 and I can run nvidia-smi, so I'm pretty sure the container itself is set up correctly. Unfortunately, other applications don't seem to work with CRI-O, so I'm trying to use k8s+containerd instead.

If I drive containerd directly with ctr (no k8s involved), I can run this command fine:

sudo containerd-ctr -a /var/run/docker/containerd/containerd.sock run --rm --gpus 0 docker.io/nvidia/cuda:11.0-base nvidia-smi nvidia-smi

However, when using k8s+containerd, it looks like /usr/bin/nvidia-smi is mapped through as a blank file inside the container. I must stress that the runtime works fine under k8s+CRI-O, so it appears that something goes wrong specifically under k8s+containerd when making nvidia-smi available inside the container. Here is my /etc/containerd/config.toml; I'm pretty sure I have followed the NVIDIA directions for patching the file to use the nvidia container runtime.

version = 2
root = "/var/lib/docker/containerd/daemon"
state = "/var/run/docker/containerd/daemon"
plugin_dir = ""
disabled_plugins = []
required_plugins = []
oom_score = 0

[grpc]
  address = "/var/run/docker/containerd/containerd.sock"
  tcp_address = ""
  tcp_tls_cert = ""
  tcp_tls_key = ""
  uid = 0
  gid = 0
  max_recv_message_size = 16777216
  max_send_message_size = 16777216

[ttrpc]
  address = ""
  uid = 0
  gid = 0

[debug]
  address = ""
  uid = 0
  gid = 0
  level = ""

[metrics]
  address = ""
  grpc_histogram = false

[cgroup]
  path = ""

[timeouts]
  "io.containerd.timeout.shim.cleanup" = "5s"
  "io.containerd.timeout.shim.load" = "5s"
  "io.containerd.timeout.shim.shutdown" = "3s"
  "io.containerd.timeout.task.state" = "2s"

[plugins]
  [plugins."io.containerd.gc.v1.scheduler"]
    pause_threshold = 0.02
    deletion_threshold = 0
    mutation_threshold = 100
    schedule_delay = "0s"
    startup_delay = "100ms"
  [plugins."io.containerd.grpc.v1.cri"]
    disable_tcp_service = true
    stream_server_address = "127.0.0.1"
    stream_server_port = "0"
    stream_idle_timeout = "4h0m0s"
    enable_selinux = false
    selinux_category_range = 1024
    sandbox_image = "k8s.gcr.io/pause:3.2"
    stats_collect_period = 10
    systemd_cgroup = false
    enable_tls_streaming = false
    max_container_log_line_size = 16384
    disable_cgroup = false
    disable_apparmor = false
    restrict_oom_score_adj = false
    max_concurrent_downloads = 3
    disable_proc_mount = false
    unset_seccomp_profile = ""
    tolerate_missing_hugetlb_controller = true
    disable_hugetlb_controller = true
    ignore_image_defined_volumes = false
    [plugins."io.containerd.grpc.v1.cri".containerd]
      snapshotter = "overlayfs"
      default_runtime_name = "runc"
      no_pivot = false
      disable_snapshot_annotations = true
      discard_unpacked_layers = false
      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        runtime_type = ""
        runtime_engine = ""
        runtime_root = ""
        privileged_without_host_devices = false
        base_runtime_spec = ""
      [plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
        runtime_type = ""
        runtime_engine = ""
        runtime_root = ""
        privileged_without_host_devices = false
        base_runtime_spec = ""
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          runtime_engine = ""
          runtime_root = ""
          privileged_without_host_devices = false
          base_runtime_spec = ""
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v1"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
            SystemdCgroup = true
    [plugins."io.containerd.grpc.v1.cri".cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"
      max_conf_num = 1
      conf_template = ""
    [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
          endpoint = ["https://registry-1.docker.io"]
    [plugins."io.containerd.grpc.v1.cri".image_decryption]
      key_model = ""
    [plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
      tls_cert_file = ""
      tls_key_file = ""
  [plugins."io.containerd.internal.v1.opt"]
    path = "/opt/containerd"
  [plugins."io.containerd.internal.v1.restart"]
    interval = "10s"
  [plugins."io.containerd.metadata.v1.bolt"]
    content_sharing_policy = "shared"
  [plugins."io.containerd.monitor.v1.cgroups"]
    no_prometheus = false
  [plugins."io.containerd.runtime.v1.linux"]
    shim = "containerd-shim"
    runtime = "runc"
    runtime_root = ""
    no_shim = false
    shim_debug = false
  [plugins."io.containerd.runtime.v2.task"]
    platforms = ["linux/amd64"]
  [plugins."io.containerd.service.v1.diff-service"]
    default = ["walking"]
  [plugins."io.containerd.snapshotter.v1.devmapper"]
    root_path = ""
    pool_name = ""
    base_image_size = ""
    async_remove = false

Any help or suggestions as to why /usr/bin/nvidia-smi is mapped through as a blank file in the container would be most appreciated!

Kind regards,
Toby

Hi @drtpotter, when launching a container on k8s+containerd, do you specify a runtime class to ensure that the nvidia runtime is selected? Note that the --gpus flag on the containerd-ctr command line works differently from how k8s runs a container through containerd.

Some suggestions:

  • Try to get the container started using ctr, specifying the nvidia runtime explicitly instead of relying on the --gpus flag.
  • Check whether it works as expected when nvidia is set as the default_runtime_name in the containerd config.
  • Ensure that the pod spec for your GPU-enabled pods includes a RuntimeClass of nvidia (matching the runtime name in the containerd config); see the sketch after this list.
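For reference, here is a rough sketch of that last approach, assuming the NVIDIA runtime is registered as nvidia in your containerd config (as it is in the config.toml above); the pod name is just a placeholder, and the image reuses the CUDA image from your ctr test:

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia            # name that pods reference via runtimeClassName
handler: nvidia           # must match the runtime name under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
---
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi-test   # placeholder pod name
spec:
  runtimeClassName: nvidia   # routes this pod through the nvidia runtime
  restartPolicy: Never
  containers:
  - name: cuda
    image: docker.io/nvidia/cuda:11.0-base
    command: ["nvidia-smi"]

If nvidia-smi shows your RTX 3060 from inside that pod, the runtime wiring on the containerd side is correct.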

Hi @elezer, yes, changing this line in /etc/containerd/config.toml

default_runtime_name = "runc"

to

default_runtime_name = "nvidia"
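For anyone else hitting this, here is the relevant part of my /etc/containerd/config.toml after the change, trimmed to just the runtime-related keys (everything else is unchanged from the config posted above):

[plugins."io.containerd.grpc.v1.cri".containerd]
  # Pods that don't request a specific RuntimeClass now go through the nvidia runtime
  default_runtime_name = "nvidia"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
    runtime_type = "io.containerd.runc.v1"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
      BinaryName = "/usr/bin/nvidia-container-runtime"
      SystemdCgroup = true

Note that containerd only reads config.toml at startup, so it needs to be restarted for the change to take effect.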

fixed the problem. I'd recommend integrating this change into the containerd section of the nvidia-container-runtime documentation at

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html

Thanks for the suggestions, happy to close this issue!

Thanks @drtpotter. I have added a task to update the docs. Glad that we were able to resolve the issue for you.