Single-node (WIP) cluster can't schedule controller
IngwiePhoenix opened this issue · 6 comments
(Yep, I did read the template; but for some odd reason I am not seeing the signup verification email. I am pretty sure it's a layer 8 problem... so, apologies in advance!)
Hello! I am trying to bootstrap the NFS CSI driver from the Helm chart in a k3s cluster - only one node for now; I intend to grow it to a few more once I have my base config figured out. But this means that this message:
```
kube-system 0s Warning FailedScheduling Pod/csi-nfs-controller-59b87c6c7c-ktfh7 0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
```
isn't helping a whole lot. I have tried to get rid of it, but no matter what I set `controller.tolerations` to, I keep getting that warning.
First, here's my HelmChart and values as `kubectl apply`'d to the k3s node:
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: nfs-csi-chart
  namespace: kube-system
spec:
  repo: https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
  chart: csi-driver-nfs
  #version: latest
  targetNamespace: kube-system
  valuesContent: |-
    serviceAccount:
      create: true # When true, service accounts will be created for you. Set to false if you want to use your own.
      # controller: csi-nfs-controller-sa # Name of Service Account to be created or used
      # node: csi-nfs-node-sa # Name of Service Account to be created or used
    rbac:
      create: true
      name: nfs
    driver:
      name: nfs.csi.k8s.io
      mountPermissions: 0
    feature:
      enableFSGroupPolicy: true
      enableInlineVolume: false
      propagateHostMountOptions: false
    # do I have to change that?; k3s on /mnt/usb/k3s but no kubelet dir
    kubeletDir: /var/lib/kubelet
    controller:
      # TODO: do i need to true them?
      runOnControlPlane: true
      runOnMaster: true
      logLevel: 5
      workingMountDir: /tmp
      defaultOnDeletePolicy: retain # available values: delete, retain
      priorityClassName: system-cluster-critical
      # FIXME: better solution???
      tolerations: []
    node:
      name: csi-nfs-node
    # TODO: sync to backup
    externalSnapshotter:
      enabled: false
      name: snapshot-controller
      priorityClassName: system-cluster-critical
      # Create volume snapshot CRDs.
      customResourceDefinitions:
        enabled: true #if set true, VolumeSnapshot, VolumeSnapshotContent and VolumeSnapshotClass CRDs will be created. Set it false, If they already exist in cluster.
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-bunker
provisioner: nfs.csi.k8s.io
parameters:
  # alt. use tailscale IP
  server: 192.168.1.2
  share: /mnt/vol1/Services/k3s
reclaimPolicy: Retain
volumeBindingMode: Immediate
mountOptions:
- nfsvers=4.1
```
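(For reference, a claim against this class would look roughly like the following sketch; the name, namespace, and size are just placeholders and not part of my actual setup:)

```yaml
# Hypothetical PVC, only to illustrate how the nfs-bunker class would be consumed.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim      # placeholder
  namespace: default       # placeholder
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: nfs-bunker
  resources:
    requests:
      storage: 1Gi         # placeholder
```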
When I look at the generated pod that throws the error, I can see the tolerations right there:
```yaml
tolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
  operator: Exists
- effect: NoSchedule
  key: node-role.kubernetes.io/controlplane
  operator: Exists
- effect: NoSchedule
  key: node-role.kubernetes.io/control-plane
  operator: Exists
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 300
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 300
```
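(As far as I understand, the warning complains about the Pod's node affinity/selector rather than taints, so the nodeSelector the chart renders for the controller is probably the part to look at. With `runOnControlPlane`/`runOnMaster` enabled, I would expect it to come out roughly as:)

```yaml
# Presumed nodeSelector on the csi-nfs-controller Deployment when
# runOnControlPlane/runOnMaster are true -- note the empty label values.
nodeSelector:
  kubernetes.io/os: linux
  node-role.kubernetes.io/control-plane: ""
  node-role.kubernetes.io/master: ""
```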
Is there something I overlooked to make the controller schedule properly onto my node? Looking at the node itself shows its labels (and that it carries no taints):
Node spec
```yaml
# kubectl get node/routerboi -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    alpha.kubernetes.io/provided-node-ip: 192.168.1.3
    csi.volume.kubernetes.io/nodeid: '{"nfs.csi.k8s.io":"routerboi"}'
    etcd.k3s.cattle.io/local-snapshots-timestamp: "2024-04-21T04:19:08+02:00"
    etcd.k3s.cattle.io/node-address: 192.168.1.3
    etcd.k3s.cattle.io/node-name: routerboi-a33ea14d
    flannel.alpha.coreos.com/backend-data: '{"VNI":1,"VtepMAC":"de:b0:64:00:55:cf"}'
    flannel.alpha.coreos.com/backend-type: vxlan
    flannel.alpha.coreos.com/kube-subnet-manager: "true"
    flannel.alpha.coreos.com/public-ip: 100.64.0.2
    flannel.alpha.coreos.com/public-ip-overwrite: 100.64.0.2
    k3s.io/encryption-config-hash: start-70fb6f5afe422f096fc74aa91ff0998185377373139914e3aeaa9d20999adf8f
    k3s.io/external-ip: 100.64.0.2
    k3s.io/hostname: cluserboi
    k3s.io/internal-ip: 192.168.1.3
    k3s.io/node-args: '["server","--log","/var/log/k3s.log","--token","********","--write-kubeconfig-mode","600","--cluster-init","true","--cluster-domain","kube.birb.it","--flannel-external-ip","true","--etcd-snapshot-compress","true","--secrets-encryption","true","--data-dir","/mnt/usb/k3s","--node-external-ip","100.64.0.2","--node-label","node-location=home","--node-name","routerboi","--default-local-storage-path","/mnt/usb/k3s-data"]'
    k3s.io/node-config-hash: 7FJHCLEHT5LLPFFY5MHTC4FNIGPUD3EZI2YWWAVNCRX4UCF2TZZA====
    k3s.io/node-env: '{"K3S_DATA_DIR":"/mnt/usb/k3s/data/7ddd49d3724e00d95d2af069d3247eaeb6635abe80397c8d94d4053dd02ab88d"}'
    node.alpha.kubernetes.io/ttl: "0"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2024-04-20T20:07:06Z"
  finalizers:
  - wrangler.cattle.io/node
  - wrangler.cattle.io/managed-etcd-controller
  labels:
    beta.kubernetes.io/arch: arm64
    beta.kubernetes.io/instance-type: k3s
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: arm64
    kubernetes.io/hostname: routerboi
    kubernetes.io/os: linux
    node-location: home
    node-role.kubernetes.io/control-plane: "true"
    node-role.kubernetes.io/etcd: "true"
    node-role.kubernetes.io/master: "true"
    node.kubernetes.io/instance-type: k3s
  name: routerboi
  resourceVersion: "72651"
  uid: b4e6ff71-c631-4f20-a61f-ef578cf2749d
spec:
  podCIDR: 10.42.0.0/24
  podCIDRs:
  - 10.42.0.0/24
  providerID: k3s://routerboi
status:
  addresses:
  - address: 192.168.1.3
    type: InternalIP
  - address: 100.64.0.2
    type: ExternalIP
  - address: cluserboi
    type: Hostname
  allocatable:
    cpu: "8"
    ephemeral-storage: "28447967825"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 8131288Ki
    pods: "110"
  capacity:
    cpu: "8"
    ephemeral-storage: 29243388Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    hugepages-32Mi: "0"
    hugepages-64Ki: "0"
    memory: 8131288Ki
    pods: "110"
  conditions:
  - lastHeartbeatTime: "2024-04-21T03:12:33Z"
    lastTransitionTime: "2024-04-20T20:07:16Z"
    message: Node is a voting member of the etcd cluster
    reason: MemberNotLearner
    status: "True"
    type: EtcdIsVoter
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T20:07:06Z"
    message: kubelet has sufficient memory available
    reason: KubeletHasSufficientMemory
    status: "False"
    type: MemoryPressure
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T20:07:06Z"
    message: kubelet has no disk pressure
    reason: KubeletHasNoDiskPressure
    status: "False"
    type: DiskPressure
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T20:07:06Z"
    message: kubelet has sufficient PID available
    reason: KubeletHasSufficientPID
    status: "False"
    type: PIDPressure
  - lastHeartbeatTime: "2024-04-21T03:13:06Z"
    lastTransitionTime: "2024-04-20T22:19:01Z"
    message: kubelet is posting ready status. AppArmor enabled
    reason: KubeletReady
    status: "True"
    type: Ready
  daemonEndpoints:
    kubeletEndpoint:
      Port: 10250
  images:
  - names:
    - docker.io/rancher/klipper-helm@sha256:87db3ad354905e6d31e420476467aefcd8f37d071a8f1c8a904f4743162ae546
    - docker.io/rancher/klipper-helm:v0.8.3-build20240228
    sizeBytes: 84105730
  - names:
    - docker.io/vaultwarden/server@sha256:edb8e2bab9cbca22e555638294db9b3657ffbb6e5d149a29d7ccdb243e3c71e0
    - docker.io/vaultwarden/server:latest
    sizeBytes: 66190948
  - names:
    - registry.k8s.io/sig-storage/nfsplugin@sha256:54b97b7ec30ca185c16e8c40e84fc527a7fc5cc8e9f7ea6b857a7a67655fff54
    - registry.k8s.io/sig-storage/nfsplugin:v4.6.0
    sizeBytes: 63690685
  - names:
    - docker.io/rancher/mirrored-library-traefik@sha256:ca9c8fbe001070c546a75184e3fd7f08c3e47dfc1e89bff6fe2edd302accfaec
    - docker.io/rancher/mirrored-library-traefik:2.10.5
    sizeBytes: 40129288
  - names:
    - docker.io/rancher/mirrored-metrics-server@sha256:20b8b36f8cac9e25aa2a0ff35147b13643bfec603e7e7480886632330a3bbc59
    - docker.io/rancher/mirrored-metrics-server:v0.7.0
    sizeBytes: 17809919
  - names:
    - docker.io/rancher/local-path-provisioner@sha256:aee53cadc62bd023911e7f077877d047c5b3c269f9bba25724d558654f43cea0
    - docker.io/rancher/local-path-provisioner:v0.0.26
    sizeBytes: 15933947
  - names:
    - docker.io/rancher/mirrored-coredns-coredns@sha256:a11fafae1f8037cbbd66c5afa40ba2423936b72b4fd50a7034a7e8b955163594
    - docker.io/rancher/mirrored-coredns-coredns:1.10.1
    sizeBytes: 14556850
  - names:
    - registry.k8s.io/sig-storage/livenessprobe@sha256:5baeb4a6d7d517434292758928bb33efc6397368cbb48c8a4cf29496abf4e987
    - registry.k8s.io/sig-storage/livenessprobe:v2.12.0
    sizeBytes: 12635307
  - names:
    - registry.k8s.io/sig-storage/csi-node-driver-registrar@sha256:c53535af8a7f7e3164609838c4b191b42b2d81238d75c1b2a2b582ada62a9780
    - registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.10.0
    sizeBytes: 10291112
  - names:
    - docker.io/rancher/klipper-lb@sha256:558dcf96bf0800d9977ef46dca18411752618cd9dd06daeb99460c0a301d0a60
    - docker.io/rancher/klipper-lb:v0.4.7
    sizeBytes: 4939041
  - names:
    - docker.io/library/busybox@sha256:c3839dd800b9eb7603340509769c43e146a74c63dca3045a8e7dc8ee07e53966
    - docker.io/rancher/mirrored-library-busybox@sha256:0d2d5aa0a465e06264b1e68a78b6d2af5df564504bde485ae995f8e73430bca2
    - docker.io/library/busybox:latest
    - docker.io/rancher/mirrored-library-busybox:1.36.1
    sizeBytes: 1848702
  - names:
    - docker.io/rancher/mirrored-pause@sha256:74c4244427b7312c5b901fe0f67cbc53683d06f4f24c6faee65d4182bf0fa893
    - docker.io/rancher/mirrored-pause:3.6
    sizeBytes: 253243
  nodeInfo:
    architecture: arm64
    bootID: 198115b5-8292-4d8d-91ef-5faf2ea60504
    containerRuntimeVersion: containerd://1.7.11-k3s2
    kernelVersion: 6.8.7-edge-rockchip-rk3588
    kubeProxyVersion: v1.29.3+k3s1
    kubeletVersion: v1.29.3+k3s1
    machineID: 28b5d8681b21493b87f17ffeb6fcb5b7
    operatingSystem: linux
    osImage: Armbian 24.5.0-trunk.446 bookworm
    systemUUID: 28b5d8681b21493b87f17ffeb6fcb5b7
```
Do you perhaps see something that I missed?
Thank you and kind regards,
Ingwie
Have you resolved this issue?
Ran into the same issue today when setting up version 4.7.0 on my k3s cluster. I also had both `controller.runOnMaster` and `controller.runOnControlPlane` set to `true`.
When doing `kubectl describe pod -l app=csi-nfs-controller`, this was the `Node-Selectors` part of it:

```
Node-Selectors:  kubernetes.io/os=linux
                 node-role.kubernetes.io/control-plane=
                 node-role.kubernetes.io/master=
```
Which seems to be the correct behavior according to the template. However, my master node has the following labels:

```
node-role.kubernetes.io/control-plane=true
node-role.kubernetes.io/master=true
```
Setting `controller.runOnMaster` and `controller.runOnControlPlane` to `false` and then specifying `controller.nodeSelector` manually like this works:

```yaml
nodeSelector:
  node-role.kubernetes.io/control-plane: "true"
  node-role.kubernetes.io/master: "true"
```
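In the HelmChart from the original post, that would presumably translate to something like this (all other values omitted; the nesting just follows the `controller.*` value names above):

```yaml
valuesContent: |-
  controller:
    runOnControlPlane: false
    runOnMaster: false
    nodeSelector:
      node-role.kubernetes.io/control-plane: "true"
      node-role.kubernetes.io/master: "true"
```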
There's already a PR #603 addressing this, but as was already mentioned in the comment there, using `nodeSelector` is not the best solution, as master nodes may be labeled with either `"true"` or `""`.
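A more label-value-agnostic option at the plain Kubernetes level would be a node affinity using the `Exists` operator, which matches the label whether its value is `"true"` (as on k3s) or empty. Whether the chart exposes an affinity override for the controller is a separate question; this is only a sketch of the pod-spec fragment that would be needed:

```yaml
# Sketch: match the control-plane role label regardless of its value.
# Assumes something (chart value or patch) renders this into the
# csi-nfs-controller Deployment's pod spec.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
```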