intel/intel-device-plugins-for-kubernetes

gpu_plugin failing on k3s with kubelet.crt error

LarryGF opened this issue · 7 comments

I am running k3s:

❯ k3s -v
k3s version v1.26.3+k3s1 (01ea3ff2)
go version go1.19.7

and I have installed the gpu-plugin using the Helm charts (I am using an umbrella Helm chart to install both the operator and the plugin). Here is what I have:

apiVersion: v2
name: intel-gpu-umbrella
version: 0.0.1
dependencies:
  - name: intel-device-plugins-operator
    repository: https://intel.github.io/helm-charts
    version: 0.28.0
  - name: intel-device-plugins-gpu
    repository: https://intel.github.io/helm-charts
    version: 0.28.0

and these are my values:

intel-device-plugins-operator:
  nodeSelector:
    kubernetes.io/arch: amd64

  manager:
    image:
      hub: intel
      tag: ""
      pullPolicy: IfNotPresent

  kubeRbacProxy:
    image:
      hub: gcr.io
      hubRepo: kubebuilder
      tag: v0.14.1
      pullPolicy: IfNotPresent

  privateRegistry:
    registryUrl: ""
    registryUser: ""
    registrySecret: ""

intel-device-plugins-gpu:
  name: gpudeviceplugin

  image:
    hub: intel
    tag: ""

  initImage:
    enable: true
    hub: intel
    tag: ""

  sharedDevNum: 2
  logLevel: 2
  resourceManager: true
  enableMonitoring: true
  allocationPolicy: "none"

  nodeSelector:
    intel.feature.node.kubernetes.io/gpu: 'true'

  nodeFeatureRule: true

When I try to deploy it with that config, the intel-gpu-plugin-* pod gets stuck in "Init" because it's trying to mount kubelet.crt, which does not exist on any of my nodes (not even the /var/lib/kubelet/pki/ folder is present):

>  kubectl describe pods -n kube-system intel-gpu-plugin-6hdmx
(...)
 Mounts:
      /dev/dri from devfs (ro)
      /etc/kubernetes/node-feature-discovery/features.d/ from nfd-features (rw)
      /sys/class/drm from sysfsdrm (ro)
      /sys/devices from sysfsdevices (ro)
      /var/lib/kubelet/device-plugins from kubeletsockets (rw)
      /var/lib/kubelet/pki/kubelet.crt from kubeletcrt (ro)
      /var/lib/kubelet/pod-resources from podresources (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2fkvm (ro)
(...)
Volumes:
  kubeletcrt:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pki/kubelet.crt
    HostPathType:  FileOrCreate
(...)
Events:
  Type     Reason       Age                 From               Message
  ----     ------       ----                ----               -------
  Normal   Scheduled    108s                default-scheduler  Successfully assigned kube-system/intel-gpu-plugin-6hdmx to kilvin
  Warning  FailedMount  45s (x8 over 108s)  kubelet            MountVolume.SetUp failed for volume "kubeletcrt" : open /var/lib/kubelet/pki/kubelet.crt: no such file or directory

When I change sharedDevNum to 1 and disable resourceManager, it doesn't try to mount kubelet.crt and is able to get scheduled, but it still fails with:

❯ kubectl logs -n kube-system intel-gpu-plugin-b4q9b
Defaulted container "intel-gpu-plugin" out of: intel-gpu-plugin, intel-gpu-initcontainer (init)
I1205 11:04:31.178253       1 gpu_plugin.go:532] GPU device plugin started with none preferred allocation policy
I1205 11:04:31.179027       1 gpu_plugin.go:348] GPU 'i915' resource share count = 1
I1205 11:04:31.179897       1 gpu_plugin.go:363] GPU scan update: 0->1 'i915_monitoring' resources found
I1205 11:04:31.179926       1 gpu_plugin.go:363] GPU scan update: 0->1 'i915' resources found
I1205 11:04:32.180643       1 server.go:267] Start server for i915_monitoring at: /var/lib/kubelet/device-plugins/gpu.intel.com-i915_monitoring.sock
I1205 11:04:32.181445       1 server.go:267] Start server for i915 at: /var/lib/kubelet/device-plugins/gpu.intel.com-i915.sock
I1205 11:04:32.185269       1 server.go:285] Device plugin for i915_monitoring registered
E1205 11:04:32.185391       1 manager.go:146] Failed to serve gpu.intel.com/i915_monitoring: too many open files
Failed to create watcher for /var/lib/kubelet/device-plugins/gpu.intel.com-i915_monitoring.sock
github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin.watchFile
        /go/src/github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin/server.go:307
github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin.(*server).setupAndServe
        /go/src/github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin/server.go:289
github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin.(*server).Serve
        /go/src/github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin/server.go:207
github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin.(*Manager).handleUpdate.func1
        /go/src/github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin/manager.go:144
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1598

Any idea what might be happening there?

Unless you plan to use GPU Aware Scheduling (GAS), you shouldn't enable resourceManager in the GPU plugin CR. In most cases, GAS is not needed. That said, a missing kubelet.crt shouldn't prevent the plugin from running; the file should get created if it doesn't exist.

For the panic, I believe your host's open-file limits are too tight. You should be able to increase the limit with ulimit -n <value>: check the initial value and then double it. That should fix the issue.
EDIT: You can also use sysctl -w fs.file-max=<value> to achieve the same.
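
For reference, a quick sketch of checking and raising these limits (the 4096 value is only illustrative):

# Per-process soft and hard limits for open file descriptors
ulimit -Sn
ulimit -Hn
# Raise the soft limit for the current shell (and anything started from it)
ulimit -n 4096
# System-wide ceiling on open files (raise with sysctl -w as in the EDIT above)
sysctl fs.file-max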

I also remembered that for 0.28.0 there's no need for the initImage anymore. You can set intel-device-plugins-gpu.initImage.enable to false. The default should be false, I think.

Thanks for the input @tkatila. I thought that it was supposed to create the kubelet.crt but I don't see it being created. I wanted to use resourceManager in case I wanted to run an additional pod on that node and give it access to the GPU.
You were right about the host's settings: I had a limit of 1024 and increased it in steps (2048, 4096, 9000) until the pod was able to start. I also checked sysctl and I have fs.file-max = 9223372036854775807. I then thought 9000 might be a little too high and reduced it, and now the pod is unable to start again; it keeps crashing. I will have to check further into this, but once again, thanks for your help.

I thought that it was supposed to create the kubelet.crt but I don't see it being created.

Kubelet.crt should be on the host if the host has kubelet running. That's common for vanilla k8s installations. It might be that k3s doesn't have kubelet or its functionality is part of some other entity.

I wanted to use resourceManager in case I wanted to run an additional pod on that node and give it access to the GPU.

Increase sharedDevNum to however many containers you want to be able to share one GPU. Your values set it to 2, so two containers can access the same GPU. Resource Manager is not needed for basic sharing.
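
For illustration, a minimal sketch of a pod requesting one share of the i915 resource the plugin advertises (pod name and image are placeholders); with sharedDevNum: 2, two such containers can end up on the same GPU:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-consumer            # placeholder name
spec:
  containers:
    - name: app
      image: busybox            # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          gpu.intel.com/i915: 1   # one share of a GPU advertised by the plugin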

Kubelet.crt should be on the host if the host has kubelet running. That's common for vanilla k8s installations. It might be that k3s doesn't have kubelet or its functionality is part of some other entity.

If I remember correctly, k3s runs its own kubelet and handles it internally.
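
A quick way to check this on a node (the /var/lib/rancher/k3s/agent path is my assumption about where k3s keeps its own certificates, not something confirmed here):

# Standard kubelet PKI path, missing on my k3s nodes
ls -l /var/lib/kubelet/pki/ 2>/dev/null || echo "no /var/lib/kubelet/pki"
# k3s presumably keeps its kubelet/agent certs under its own data dir instead
ls -l /var/lib/rancher/k3s/agent/*.crt 2>/dev/null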

Increase sharedDevNum to however many containers you want to be able to share one GPU. Your values set it to 2, so two containers can access the same GPU. Resource Manager is not needed for basic sharing.

Good to know, I hadn't tried that. I received an error when trying to run resourceManager with sharedDevNum: 1, so I assumed it wouldn't work the other way around.

I had a chance to test it all, and after I removed some stuff that was running on the node (I wasn't able to solve the "too many open files" error without moving pods off the node) it's working now. This is the config I used:

intel-device-plugins-gpu:

  initImage:
    enable: false
    hub: intel
    tag: ""

  sharedDevNum: 2
  logLevel: 2
  resourceManager: false
  enableMonitoring: false
  allocationPolicy: "none"

  nodeSelector:
    intel.feature.node.kubernetes.io/gpu: 'true'

  nodeFeatureRule: true
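
For anyone following along, once the plugin pod is up the node should advertise the i915 resources; a quick sanity check (kilvin is the node from the events above, and with sharedDevNum: 2 I'd expect 2 shares per GPU):

# Capacity/Allocatable should list gpu.intel.com/i915
kubectl describe node kilvin | grep -i gpu.intel.com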

I am going to close the issue now, thanks again for your help @tkatila

eero-t commented

I had a chance to test it all, and after I removed some stuff that was running on the node (I wasn't able to solve the "too many open files" error without moving pods off the node)

This gives a rough idea of which processes are using the most FDs:

awk '
# For each /proc/PID/status file, remember the process name and PID,
# then print "FDSize [pid] name" and move on to the next file.
/^Name/ {name=$2}
/^Pid/ {pid=$2}
/^FDSize/ {printf("%5d [%d] %s\n", $2, pid, name); nextfile}
' /proc/*/status | sort -nr | head -20

You can get more detail with this (much slower):

for i in /proc/*/fd/; do
    count=$(ls $i | wc -l);                   # number of open fds for this PID
    pid=$(echo $i | cut -d/ -f3);             # PID taken from the /proc path
    cmd=$(tr '\0' ' ' < ${i%/fd/}/cmdline);   # command line, NULs turned into spaces
    echo "$count [$pid] $cmd";
done | sort -nr | head -20

(You can just copy-paste the above into a shell. You need root to see info for all processes.)