tkestack/gpu-manager

fail to get response from manager, error rpc error: code = Unknown desc = can't find kubepods-besteffort-pod6f1c7606_fb47_4c34_82aa_9b9966435a65.slice from docker


When I run nvidia-smi in a pod, it fails with this error:
fail to get response from manager, error rpc error: code = Unknown desc = can't find kubepods-besteffort-pod6f1c7606_fb47_4c34_82aa_9b9966435a65.slice from docker

Same problem here. I have already set '--container-runtime-endpoint=/var/run/containerd/containerd.sock'.

image: tkestack/gpu-manager:v1.1.5
runtime: containerd
K8s: v1.24.17

Maybe it is a ctr (containerd) namespace problem, but I don't know how to debug it.

Is the host machine using cgroup v1 or v2? The gpu-manager code uses cgroup v1 paths to read the host-side PID of the container process. If the host is running cgroup v2, gpu-manager will not be able to read it.
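
To answer the v1/v2 question, here is a minimal sketch you can run on the host. It assumes the cgroup filesystem is mounted at /sys/fs/cgroup and relies on the fact that a unified (v2) hierarchy exposes a cgroup.controllers file at its root, while v1 has one directory per controller. The function name detect_cgroup_version is only illustrative and is not part of gpu-manager.

```python
from pathlib import Path


def detect_cgroup_version(root: str = "/sys/fs/cgroup") -> str:
    """Return "v2", "v1", or "unknown" for the cgroup hierarchy mounted at root."""
    base = Path(root)
    # A unified (v2) hierarchy has a cgroup.controllers file at its root.
    if (base / "cgroup.controllers").is_file():
        return "v2"
    # On v1 (or hybrid) each controller has its own directory, e.g. "memory".
    if (base / "memory").is_dir():
        return "v1"
    return "unknown"


if __name__ == "__main__":
    print(detect_cgroup_version())
```

If this prints v2, the mismatch described above is a likely cause of the "can't find ... .slice" error.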