Permission denied with sudo command
Choiuijin1125 opened this issue · 5 comments
1. Issue or feature description
When I run nvidia-container-cli commands (info, list) with sudo, I get the error below.
As a result, I can't use nvidia-docker, e.g. docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
2. Steps to reproduce the issue
not working
sudo nvidia-container-cli list
nvidia-container-cli: initialization error: open failed: /proc/sys/kernel/overflowuid: permission denied
working
nvidia-container-cli list
/dev/nvidiactl
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvidia-modeset
/dev/nvidia0
/dev/nvidia1
/dev/nvidia2
/dev/nvidia3
/usr/bin/nvidia-smi
/usr/bin/nvidia-debugdump
/usr/bin/nvidia-persistenced
/usr/bin/nvidia-cuda-mps-control
/usr/bin/nvidia-cuda-mps-server
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.455.23.05
/usr/lib/x86_64-linux-gnu/libcuda.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.455.23.05
/usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvcuvid.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvoptix.so.455.23.05
/usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.455.23.05
/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.455.23.05
/usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.455.23.05
/usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.455.23.05
/usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.455.23.05
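One way to narrow down the error above is to attempt the same read that fails, once as the normal user and once under sudo. The sketch below assumes the failure is a plain read-permission problem on that procfs file, which the thread has not confirmed; the helper name can_read is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current process can read a file.
can_read() {
    if cat "$1" >/dev/null 2>&1; then
        echo "readable: $1"
    else
        echo "NOT readable: $1"
    fi
}

# The file named in the nvidia-container-cli error message.
can_read /proc/sys/kernel/overflowuid
```

Saved under a name such as check_read.sh (hypothetical), it could be run as `sh check_read.sh` and again as `sudo sh check_read.sh`. If only the sudo run reports the file as not readable, that would suggest the sudo session itself is restricted, rather than the NVIDIA tooling.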
Could you give more information on your setup including distribution and NVIDIA Container Toolkit package versions?
This sounds like an issue with user namespaces, independent of the nvidia-container-runtime. What OS are you running, and what peculiarities might you have configured on your system beyond a stock distribution?
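The details asked for here could be gathered with something like the following sketch. The helper show_setting is a made-up name. /proc/sys/kernel/unprivileged_userns_clone is assumed to exist only on Debian/Ubuntu-patched kernels, so the sketch prints <unavailable> when a file is missing.

```shell
#!/bin/sh
# Hypothetical helper: print a procfs/sysctl file and its value,
# or mark it as unavailable if it cannot be read.
show_setting() {
    if [ -r "$1" ]; then
        printf '%s = %s\n' "$1" "$(cat "$1")"
    else
        printf '%s = <unavailable>\n' "$1"
    fi
}

# Distribution name, if /etc/os-release is present.
if [ -r /etc/os-release ]; then
    grep '^PRETTY_NAME=' /etc/os-release || echo "PRETTY_NAME not set"
fi

# Settings related to user namespaces and the file from the error message.
show_setting /proc/sys/user/max_user_namespaces
show_setting /proc/sys/kernel/unprivileged_userns_clone
show_setting /proc/sys/kernel/overflowuid
```

Running it both with and without sudo and comparing the output may show whether the two sessions see different values.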
Here is the server information:
3. Information to attach (optional if deemed irrelevant)
- Some nvidia-container information: nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I0323 07:31:52.052389 30761 nvc.c:376] initializing library context (version=1.9.0, build=5e135c17d6dbae861ec343e9a8d3a0d2af758a4f)
I0323 07:31:52.052632 30761 nvc.c:350] using root /
I0323 07:31:52.052649 30761 nvc.c:351] using ldcache /etc/ld.so.cache
I0323 07:31:52.052658 30761 nvc.c:352] using unprivileged user 1000:1002
I0323 07:31:52.052714 30761 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0323 07:31:52.052897 30761 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
W0323 07:31:52.059423 30762 nvc.c:273] failed to set inheritable capabilities
W0323 07:31:52.059488 30762 nvc.c:274] skipping kernel modules load due to failure
I0323 07:31:52.060095 30763 rpc.c:71] starting driver rpc service
I0323 07:31:52.065595 30764 rpc.c:71] starting nvcgo rpc service
I0323 07:31:52.070177 30761 nvc_info.c:765] requesting driver information with ''
I0323 07:31:52.072263 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.455.23.05
I0323 07:31:52.072526 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.455.23.05
I0323 07:31:52.072683 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.455.23.05
I0323 07:31:52.073086 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.455.23.05
I0323 07:31:52.073480 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.455.23.05
I0323 07:31:52.073675 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.455.23.05
I0323 07:31:52.073789 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.455.23.05
I0323 07:31:52.073913 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.455.23.05
I0323 07:31:52.073973 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.455.23.05
I0323 07:31:52.074100 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.455.23.05
I0323 07:31:52.074318 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.455.23.05
I0323 07:31:52.074403 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.455.23.05
I0323 07:31:52.074492 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.455.23.05
I0323 07:31:52.074630 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.455.23.05
I0323 07:31:52.075154 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.455.23.05
I0323 07:31:52.075542 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.455.23.05
I0323 07:31:52.075770 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.455.23.05
I0323 07:31:52.075858 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.455.23.05
I0323 07:31:52.076018 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.455.23.05
I0323 07:31:52.076225 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.455.23.05
I0323 07:31:52.076309 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.455.23.05
I0323 07:31:52.076614 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.455.23.05
I0323 07:31:52.076949 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.455.23.05
I0323 07:31:52.077066 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.455.23.05
I0323 07:31:52.077133 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.455.23.05
I0323 07:31:52.077370 30761 nvc_info.c:172] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.455.23.05
W0323 07:31:52.077418 30761 nvc_info.c:398] missing library libnvidia-nscq.so
W0323 07:31:52.077427 30761 nvc_info.c:398] missing library libnvidia-fatbinaryloader.so
W0323 07:31:52.077439 30761 nvc_info.c:398] missing library libnvidia-pkcs11.so
W0323 07:31:52.077444 30761 nvc_info.c:402] missing compat32 library libnvidia-ml.so
W0323 07:31:52.077456 30761 nvc_info.c:402] missing compat32 library libnvidia-cfg.so
W0323 07:31:52.077468 30761 nvc_info.c:402] missing compat32 library libnvidia-nscq.so
W0323 07:31:52.077477 30761 nvc_info.c:402] missing compat32 library libcuda.so
W0323 07:31:52.077488 30761 nvc_info.c:402] missing compat32 library libnvidia-opencl.so
W0323 07:31:52.077499 30761 nvc_info.c:402] missing compat32 library libnvidia-ptxjitcompiler.so
W0323 07:31:52.077510 30761 nvc_info.c:402] missing compat32 library libnvidia-fatbinaryloader.so
W0323 07:31:52.077524 30761 nvc_info.c:402] missing compat32 library libnvidia-allocator.so
W0323 07:31:52.077532 30761 nvc_info.c:402] missing compat32 library libnvidia-compiler.so
W0323 07:31:52.077545 30761 nvc_info.c:402] missing compat32 library libnvidia-pkcs11.so
W0323 07:31:52.077554 30761 nvc_info.c:402] missing compat32 library libnvidia-ngx.so
W0323 07:31:52.077563 30761 nvc_info.c:402] missing compat32 library libvdpau_nvidia.so
W0323 07:31:52.077572 30761 nvc_info.c:402] missing compat32 library libnvidia-encode.so
W0323 07:31:52.077578 30761 nvc_info.c:402] missing compat32 library libnvidia-opticalflow.so
W0323 07:31:52.077587 30761 nvc_info.c:402] missing compat32 library libnvcuvid.so
W0323 07:31:52.077594 30761 nvc_info.c:402] missing compat32 library libnvidia-eglcore.so
W0323 07:31:52.077602 30761 nvc_info.c:402] missing compat32 library libnvidia-glcore.so
W0323 07:31:52.077609 30761 nvc_info.c:402] missing compat32 library libnvidia-tls.so
W0323 07:31:52.077617 30761 nvc_info.c:402] missing compat32 library libnvidia-glsi.so
W0323 07:31:52.077627 30761 nvc_info.c:402] missing compat32 library libnvidia-fbc.so
W0323 07:31:52.077639 30761 nvc_info.c:402] missing compat32 library libnvidia-ifr.so
W0323 07:31:52.077648 30761 nvc_info.c:402] missing compat32 library libnvidia-rtcore.so
W0323 07:31:52.077656 30761 nvc_info.c:402] missing compat32 library libnvoptix.so
W0323 07:31:52.077668 30761 nvc_info.c:402] missing compat32 library libGLX_nvidia.so
W0323 07:31:52.077678 30761 nvc_info.c:402] missing compat32 library libEGL_nvidia.so
W0323 07:31:52.077687 30761 nvc_info.c:402] missing compat32 library libGLESv2_nvidia.so
W0323 07:31:52.077698 30761 nvc_info.c:402] missing compat32 library libGLESv1_CM_nvidia.so
W0323 07:31:52.077704 30761 nvc_info.c:402] missing compat32 library libnvidia-glvkspirv.so
W0323 07:31:52.077715 30761 nvc_info.c:402] missing compat32 library libnvidia-cbl.so
I0323 07:31:52.078680 30761 nvc_info.c:298] selecting /usr/bin/nvidia-smi
I0323 07:31:52.078726 30761 nvc_info.c:298] selecting /usr/bin/nvidia-debugdump
I0323 07:31:52.078758 30761 nvc_info.c:298] selecting /usr/bin/nvidia-persistenced
I0323 07:31:52.078867 30761 nvc_info.c:298] selecting /usr/bin/nvidia-cuda-mps-control
I0323 07:31:52.078902 30761 nvc_info.c:298] selecting /usr/bin/nvidia-cuda-mps-server
W0323 07:31:52.078967 30761 nvc_info.c:424] missing binary nv-fabricmanager
W0323 07:31:52.079162 30761 nvc_info.c:348] missing firmware path /lib/firmware/nvidia/455.23.05/gsp.bin
I0323 07:31:52.079194 30761 nvc_info.c:528] listing device /dev/nvidiactl
I0323 07:31:52.079201 30761 nvc_info.c:528] listing device /dev/nvidia-uvm
I0323 07:31:52.079212 30761 nvc_info.c:528] listing device /dev/nvidia-uvm-tools
I0323 07:31:52.079222 30761 nvc_info.c:528] listing device /dev/nvidia-modeset
W0323 07:31:52.079260 30761 nvc_info.c:348] missing ipc path /var/run/nvidia-persistenced/socket
W0323 07:31:52.079288 30761 nvc_info.c:348] missing ipc path /var/run/nvidia-fabricmanager/socket
W0323 07:31:52.079308 30761 nvc_info.c:348] missing ipc path /tmp/nvidia-mps
I0323 07:31:52.079315 30761 nvc_info.c:821] requesting device information with ''
I0323 07:31:52.089701 30761 nvc_info.c:712] listing device /dev/nvidia0 (GPU-f26f1091-107b-7f7e-ccc2-c3a6c7da082c at 00000000:00:06.0)
I0323 07:31:52.105924 30761 nvc_info.c:712] listing device /dev/nvidia1 (GPU-73bd9d4d-1037-f06a-c1d5-00e580457554 at 00000000:00:07.0)
I0323 07:31:52.117356 30761 nvc_info.c:712] listing device /dev/nvidia2 (GPU-12127740-3804-2777-66a2-d51623c3d17b at 00000000:00:08.0)
I0323 07:31:52.129291 30761 nvc_info.c:712] listing device /dev/nvidia3 (GPU-d2cc3c0c-c504-a135-ec1a-e712af5b8380 at 00000000:00:09.0)
NVRM version: 455.23.05
CUDA version: 11.1
Device Index: 0
Device Minor: 0
Model: Tesla T4
Brand: Tesla
GPU UUID: GPU-f26f1091-107b-7f7e-ccc2-c3a6c7da082c
Bus Location: 00000000:00:06.0
Architecture: 7.5
Device Index: 1
Device Minor: 1
Model: Tesla T4
Brand: Tesla
GPU UUID: GPU-73bd9d4d-1037-f06a-c1d5-00e580457554
Bus Location: 00000000:00:07.0
Architecture: 7.5
Device Index: 2
Device Minor: 2
Model: Tesla T4
Brand: Tesla
GPU UUID: GPU-12127740-3804-2777-66a2-d51623c3d17b
Bus Location: 00000000:00:08.0
Architecture: 7.5
Device Index: 3
Device Minor: 3
Model: Tesla T4
Brand: Tesla
GPU UUID: GPU-d2cc3c0c-c504-a135-ec1a-e712af5b8380
Bus Location: 00000000:00:09.0
Architecture: 7.5
I0323 07:31:52.129507 30761 nvc.c:430] shutting down library context
I0323 07:31:52.129662 30764 rpc.c:95] terminating nvcgo rpc service
I0323 07:31:52.130498 30761 rpc.c:135] nvcgo rpc service terminated successfully
I0323 07:31:52.134364 30763 rpc.c:95] terminating driver rpc service
I0323 07:31:52.134608 30761 rpc.c:135] driver rpc service terminated successfully
- Kernel version from uname -a
Linux gpu-1 4.15.0-159-generic #167-Ubuntu SMP Tue Sep 21 08:55:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
- Docker version from docker version
Client: Docker Engine - Community
Version: 20.10.12
API version: 1.41
Go version: go1.16.12
Git commit: e91ed57
Built: Mon Dec 13 11:45:27 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.12
API version: 1.41 (minimum version 1.12)
Go version: go1.16.12
Git commit: 459d0df
Built: Mon Dec 13 11:43:36 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.12
GitCommit: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
runc:
Version: 1.0.2
GitCommit: v1.0.2-0-g52b36a2
docker-init:
Version: 0.19.0
GitCommit: de40ad0
- NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
un libgldispatch0-nvidia <none> <none> (no description available)
rc libnvidia-compute-450:amd64 450.142.00-0ubuntu1 amd64 NVIDIA libcompute package
ii libnvidia-container-tools 1.9.0-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.9.0-1 amd64 NVIDIA container runtime library
un libnvidia-ml1 <none> <none> (no description available)
un nvidia-304 <none> <none> (no description available)
un nvidia-340 <none> <none> (no description available)
un nvidia-384 <none> <none> (no description available)
un nvidia-common <none> <none> (no description available)
un nvidia-container-runtime <none> <none> (no description available)
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.9.0-1 amd64 NVIDIA container runtime hook
un nvidia-docker <none> <none> (no description available)
ii nvidia-docker2 2.10.0-1 all nvidia-docker CLI wrapper
un nvidia-opencl-icd <none> <none> (no description available)
un nvidia-prime <none> <none> (no description available)
- NVIDIA container library version from nvidia-container-cli -V
cli-version: 1.9.0
lib-version: 1.9.0
build date: 2022-03-18T13:46+00:00
build revision: 5e135c17d6dbae861ec343e9a8d3a0d2af758a4f
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
@klueska I also suspect a user-namespace issue. I'm using Ubuntu 18.04 and have only changed the docker group membership, as below:
sudo usermod -aG docker $USER
One thing I suspect is that I didn't reboot the server after installing nvidia-docker2.
I'll reboot and test again soon.
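Because usermod -aG only affects sessions started after the change, a quick check of whether the current shell already carries the docker group might look like the sketch below. has_group is a hypothetical helper. The sketch only inspects group membership; whether that relates to the sudo error is not established in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the space-separated group list in $1
# contains the exact group name in $2.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Groups of the current process (not the entries in /etc/group).
current_groups=$(id -nG 2>/dev/null || true)

if has_group "$current_groups" docker; then
    echo "docker group is active in this session"
else
    echo "docker group is not active in this session (log out and back in, or reboot)"
fi
```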
Please see NVIDIA/nvidia-container-toolkit#102 which presents similar behaviour and a possible resolution.