kubernetes-csi/csi-driver-nfs

Single-node (WIP) cluster can't schedule controller

IngwiePhoenix opened this issue · 6 comments

(Yep, I did read the template; but for some odd reason I am not seeing the signup verification email. I am pretty sure it's a layer 8 problem... so, apologies in advance!)

Hello! I am trying to bootstrap the NFS CSI driver from the Helm chart on a k3s cluster - only one node for now; I intend to grow it to a few more once I have my base config figured out. Since there is only that one node, this message:

kube-system   0s                     Warning   FailedScheduling                 Pod/csi-nfs-controller-59b87c6c7c-ktfh7    0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

isn't helping a whole lot. I have tried to get rid of it, but no matter what I set controller.tolerations to, I keep getting that warning.

First, here's my HelmChart and values, as applied with kubectl apply to the k3s node:

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: nfs-csi-chart
  namespace: kube-system
spec:
  repo: https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
  chart: csi-driver-nfs
  #version: latest
  targetNamespace: kube-system
  valuesContent: |-
    serviceAccount:
      create: true # When true, service accounts will be created for you. Set to false if you want to use your own.
      # controller: csi-nfs-controller-sa # Name of Service Account to be created or used
      # node: csi-nfs-node-sa # Name of Service Account to be created or used

    rbac:
      create: true
      name: nfs

    driver:
      name: nfs.csi.k8s.io
      mountPermissions: 0

    feature:
      enableFSGroupPolicy: true
      enableInlineVolume: false
      propagateHostMountOptions: false

    # do I have to change this? k3s lives on /mnt/usb/k3s, but there is no kubelet dir in there
    kubeletDir: /var/lib/kubelet

    controller:
      # TODO: do I need to set these to true?
      runOnControlPlane: true
      runOnMaster: true
      logLevel: 5
      workingMountDir: /tmp
      defaultOnDeletePolicy: retain  # available values: delete, retain
      priorityClassName: system-cluster-critical
      # FIXME: better solution???
      tolerations: []
    node:
      name: csi-nfs-node

    # TODO: sync to backup
    externalSnapshotter:
      enabled: false
      name: snapshot-controller
      priorityClassName: system-cluster-critical
      # Create volume snapshot CRDs.
      customResourceDefinitions:
        enabled: true   # if set to true, the VolumeSnapshot, VolumeSnapshotContent and VolumeSnapshotClass CRDs will be created; set to false if they already exist in the cluster

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-bunker
provisioner: nfs.csi.k8s.io
parameters:
  # alt. use tailscale IP
  server: 192.168.1.2
  share: /mnt/vol1/Services/k3s
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
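
For completeness, a PersistentVolumeClaim consuming that class would look roughly like this (the claim name, namespace and size are just placeholders, not part of my actual setup):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim       # placeholder
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi         # placeholder
  storageClassName: nfs-bunker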

When I look at the generated pod that throws the error, I can see the tolerations right there:

  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/controlplane
    operator: Exists
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
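
(For reference, that block can be read straight off the pod; something like this works, using the pod name from the warning above:)

kubectl -n kube-system get pod csi-nfs-controller-59b87c6c7c-ktfh7 -o yaml
# the tolerations are under spec.tolerations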

Is there something I overlooked to make the controller schedule properly onto my node? Here is the full node object, which shows the relevant labels (and that no taints are set at all):

Node spec
# kubectl get node/routerboi -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 192.168.1.3
    csi.volume.kubernetes.io/nodeid: '{"nfs.csi.k8s.io":"routerboi"}'
    etcd.k3s.cattle.io/local-snapshots-timestamp: "2024-04-21T04:19:08+02:00"
    etcd.k3s.cattle.io/node-address: 192.168.1.3
    etcd.k3s.cattle.io/node-name: routerboi-a33ea14d
    flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"de:b0:64:00:55:cf"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 100.64.0.2
    flannel.alpha.coreos.com/public-ip-overwrite: 100.64.0.2
    k3s.io/encryption-config-hash: start-70fb6f5afe422f096fc74aa91ff0998185377373139914e3aeaa9d20999adf8f
    k3s.io/external-ip: 100.64.0.2
    k3s.io/hostname: cluserboi
    k3s.io/internal-ip: 192.168.1.3
    k3s.io/node-args: '["server","--log","/var/log/k3s.log","--token","********","--write-kubeconfig-mode","600","--cluster-init","true","--cluster-domain","kube.birb.it","--flannel-external-ip","true","--etcd-snapshot-compress","true","--secrets-encryption","true","--data-dir","/mnt/usb/k3s","--node-external-ip","100.64.0.2","--node-label","node-location=home","--node-name","routerboi","--default-local-storage-path","/mnt/usb/k3s-data"]'
    k3s.io/node-config-hash: 7FJHCLEHT5LLPFFY5MHTC4FNIGPUD3EZI2YWWAVNCRX4UCF2TZZA====
    k3s.io/node-env: '{"K3S_DATA_DIR":"/mnt/usb/k3s/data/7ddd49d3724e00d95d2af069d3247eaeb6635abe80397c8d94d4053dd02ab88d"}'
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2024-04-20T20:07:06Z"
  finalizers:
  - wrangler.cattle.io/node
  - wrangler.cattle.io/managed-etcd-controller
  labels:
    beta.kubernetes.io/arch: arm64
    beta.kubernetes.io/instance-type: k3s
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: arm64
    kubernetes.io/hostname: routerboi
    kubernetes.io/os: linux
    node-location: home
    node-role.kubernetes.io/control-plane: "true"
    node-role.kubernetes.io/etcd: "true"
    node-role.kubernetes.io/master: "true"
    node.kubernetes.io/instance-type: k3s
  name: routerboi
  resourceVersion: "72651"
  uid: b4e6ff71-c631-4f20-a61f-ef578cf2749d
spec:
  podCIDR: 10.42.0.0/24
  podCIDRs:
  - 10.42.0.0/24
  providerID: k3s://routerboi
status:
  addresses:
  - address: 192.168.1.3
    type: InternalIP
  - address: 100.64.0.2
    type: ExternalIP
  - address: cluserboi
    type: Hostname
  allocatable:
    cpu: "8"
    ephemeral-storage: "28447967825"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 8131288Ki
    pods: "110"
  capacity:
    cpu: "8"
    ephemeral-storage: 29243388Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 8131288Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2024-04-21T03:12:33Z"
    lastTransitionTime: "2024-04-20T20:07:16Z"
    message: Node is a voting member of the etcd cluster
    reason: MemberNotLearner
    status: "True"
    type: EtcdIsVoter
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T20:07:06Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T20:07:06Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T20:07:06Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T22:19:01Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - docker.io/rancher/klipper-helm@sha256:87db3ad354905e6d31e420476467aefcd8f37d071a8f1c8a904f4743162ae546
    - docker.io/rancher/klipper-helm:v0.8.3-build20240228
    sizeBytes: 84105730
  - names:
    - docker.io/vaultwarden/server@sha256:edb8e2bab9cbca22e555638294db9b3657ffbb6e5d149a29d7ccdb243e3c71e0
    - docker.io/vaultwarden/server:latest
    sizeBytes: 66190948
  - names:
    - registry.k8s.io/sig-storage/nfsplugin@sha256:54b97b7ec30ca185c16e8c40e84fc527a7fc5cc8e9f7ea6b857a7a67655fff54
    - registry.k8s.io/sig-storage/nfsplugin:v4.6.0
    sizeBytes: 63690685
  - names:
    - docker.io/rancher/mirrored-library-traefik@sha256:ca9c8fbe001070c546a75184e3fd7f08c3e47dfc1e89bff6fe2edd302accfaec
    - docker.io/rancher/mirrored-library-traefik:2.10.5
    sizeBytes: 40129288
  - names:
    - docker.io/rancher/mirrored-metrics-server@sha256:20b8b36f8cac9e25aa2a0ff35147b13643bfec603e7e7480886632330a3bbc59
    - docker.io/rancher/mirrored-metrics-server:v0.7.0
    sizeBytes: 17809919
  - names:
    - docker.io/rancher/local-path-provisioner@sha256:aee53cadc62bd023911e7f077877d047c5b3c269f9bba25724d558654f43cea0
    - docker.io/rancher/local-path-provisioner:v0.0.26
    sizeBytes: 15933947
  - names:
    - docker.io/rancher/mirrored-coredns-coredns@sha256:a11fafae1f8037cbbd66c5afa40ba2423936b72b4fd50a7034a7e8b955163594
    - docker.io/rancher/mirrored-coredns-coredns:1.10.1
    sizeBytes: 14556850
  - names:
    - registry.k8s.io/sig-storage/livenessprobe@sha256:5baeb4a6d7d517434292758928bb33efc6397368cbb48c8a4cf29496abf4e987
    - registry.k8s.io/sig-storage/livenessprobe:v2.12.0
    sizeBytes: 12635307
  - names:
    - registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:c53535af8a7f7e3164609838c4b191b42b2d81238d75c1b2a2b582ada62a9780
    - registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.10.0
    sizeBytes: 10291112
  - names:
    - docker.io/rancher/klipper-lb@sha256:558dcf96bf0800d9977ef46dca18411752618cd9dd06daeb99460c0a301d0a60
    - docker.io/rancher/klipper-lb:v0.4.7
    sizeBytes: 4939041
  - names:
    - docker.io/library/busybox@sha256:c3839dd800b9eb7603340509769c43e146a74c63dca3045a8e7dc8ee07e53966
    - docker.io/rancher/mirrored-library-busybox@sha256:0d2d5aa0a465e06264b1e68a78b6d2af5df564504bde485ae995f8e73430bca2
    - docker.io/library/busybox:latest
    - docker.io/rancher/mirrored-library-busybox:1.36.1
    sizeBytes: 1848702
  - names:
    - docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
    - docker.io/rancher/mirrored-pause:3.6
    sizeBytes: 253243
  nodeInfo:
    architecture: arm64
    bootID: 198115b5-8292-4d8d-91ef-5faf2ea60504
    containerRuntimeVersion: containerd://1.7.11-k3s2
    kernelVersion: 6.8.7-edge-rockchip-rk3588
    kubeProxyVersion: v1.29.3+k3s1
    kubeletVersion: v1.29.3+k3s1
    machineID: 28b5d8681b21493b87f17ffeb6fcb5b7
    operatingSystem: linux
    osImage: Armbian 24.5.0-trunk.446 bookworm
    systemUUID: 28b5d8681b21493b87f17ffeb6fcb5b7

Do you perhaps see something that I missed?

Thank you and kind regards,
Ingwie

Have you resolved this issue?

Ran into the same issue today when setting up version 4.7.0 on my k3s cluster.
I also had both controller.runOnMaster and controller.runOnControlPlane set to true.

When running kubectl describe pod -l app=csi-nfs-controller, this was the Node-Selectors part of the output:

Node-Selectors:              kubernetes.io/os=linux
                             node-role.kubernetes.io/control-plane=
                             node-role.kubernetes.io/master=

That seems to be the correct behavior according to the chart template.

However, my master node has the following labels:

node-role.kubernetes.io/control-plane=true
node-role.kubernetes.io/master=true
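
If you want to compare against your own cluster, something like this shows the labels (replace <node-name> with your control-plane node's name):

kubectl get node <node-name> --show-labels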

Setting controller.runOnMaster and controller.runOnControlPlane to false and then specifying controller.nodeSelector manually like this works:

nodeSelector:
  node-role.kubernetes.io/control-plane: "true"
  node-role.kubernetes.io/master: "true"

There's already PR #603 addressing this, but as was already mentioned in the comments there, using nodeSelector is not the best solution, since master nodes may be labeled with either "true" or an empty string.
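
A more label-value-agnostic workaround might be to leave runOnMaster/runOnControlPlane off and use node affinity with an Exists match instead, since Exists matches the label key regardless of whether its value is "" or "true". Rough sketch, assuming your chart version exposes controller.affinity (check the chart's values.yaml first):

controller:
  runOnControlPlane: false
  runOnMaster: false
  # assumption: controller.affinity is exposed by the chart and passed through to the Deployment
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role.kubernetes.io/control-plane
                operator: Exists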