fail to get response from manager, error rpc error: code = Unknown desc = can't find kubepods-besteffort-pod6f1c7606_fb47_4c34_82aa_9b9966435a65.slice from docker
Opened this issue · 2 comments
hyc-yuchen commented
When I run nvidia-smi in a pod, it fails with this error:
fail to get response from manager, error rpc error: code = Unknown desc = can't find kubepods-besteffort-pod6f1c7606_fb47_4c34_82aa_9b9966435a65.slice from docker
pandaoknight commented
Same problem here. I have already set '--container-runtime-endpoint=/var/run/containerd/containerd.sock'.
image: tkestack/gpu-manager:v1.1.5
runtime: containerd
K8s: v1.24.17
Maybe it is a ctr namespace problem, but I don't know how to debug it.
xxsoul commented
Is the host machine using cgroup v1 or v2? The gpu-manager code uses cgroup v1 paths to look up the host-side PIDs of the container's processes. If the host is running cgroup v2, gpu-manager will not be able to read them.
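If it helps, here is a rough way to check which cgroup layout the node is using. This is only a sketch and not part of gpu-manager: the function name `detect_cgroup_version` is made up for illustration, and it assumes the usual mount point `/sys/fs/cgroup`. It relies on the fact that a cgroup v2 (unified) hierarchy exposes a `cgroup.controllers` file at its root, while a v1 mount does not.

```python
from pathlib import Path


def detect_cgroup_version(root="/sys/fs/cgroup"):
    """Guess the cgroup layout mounted at `root` (hypothetical helper)."""
    root = Path(root)
    # A pure cgroup v2 (unified) hierarchy has cgroup.controllers at its root.
    if (root / "cgroup.controllers").is_file():
        return "v2"
    # systemd's hybrid mode mounts a v2 tree under <root>/unified next to v1.
    if (root / "unified" / "cgroup.controllers").is_file():
        return "hybrid"
    # The directory exists but has no v2 marker: assume a v1 layout.
    if root.is_dir():
        return "v1"
    return "unknown"


if __name__ == "__main__":
    # Run this on the node itself, not inside the pod.
    print(detect_cgroup_version())
```

Run it on the host rather than inside the pod. If it prints `v2`, that would be consistent with the explanation above.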