PVCs pending with WaitForFirstConsumer on fresh install
jtackaberry opened this issue · 5 comments
Not sure if this is a bug report or a support request, but in any case I can't spot what's going awry.
Fresh install of microk8s 1.23 and csi-driver-lvm v0.4.1 via the Helm chart at https://github.com/metal-stack/helm-charts/tree/master/charts/csi-driver-lvm (which supports StorageClass under storage.k8s.io/v1).
# Deploy CSI driver
$ cat values.yaml
lvm:
  devicePattern: /dev/sdb
rbac:
  pspEnabled: false
$ helm upgrade --install --create-namespace -n storage -f values.yaml csi-driver-lvm ./helm-charts/charts/csi-driver-lvm/
# Storage classes created
$ kubectl get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
csi-driver-lvm-striped lvm.csi.metal-stack.io Delete WaitForFirstConsumer true 27m
csi-driver-lvm-mirror lvm.csi.metal-stack.io Delete WaitForFirstConsumer true 27m
csi-driver-lvm-linear (default) lvm.csi.metal-stack.io Delete WaitForFirstConsumer true 27m
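For reference, the default class reduces to roughly this manifest (only the fields visible in the listing above are certain; anything else the chart renders is omitted):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-driver-lvm-linear
provisioner: lvm.csi.metal-stack.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
Note the volumeBindingMode of WaitForFirstConsumer, which is what the PVC events below keep referring to.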
# Create a test PVC
$ cat pvc-test.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test
  namespace: default
spec:
  storageClassName: csi-driver-lvm-linear
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "2Gi"
$ kubectl apply -f pvc-test.yaml
$ kubectl describe -n default pvc/test
Name: test
Namespace: default
StorageClass: csi-driver-lvm-linear
Status: Pending
Volume:
Labels: <none>
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 4s (x4 over 42s) persistentvolume-controller waiting for first consumer to be created before binding
The first sign of trouble comes from the plugin pod, which logs a couple of errors:
$ kubectl -n storage logs csi-driver-lvm-plugin-9bqb4 -c csi-driver-lvm-plugin
2022/02/05 20:02:01 unable to configure logging to stdout:no such flag -logtostderr
I0205 20:02:01.834133 1 lvm.go:108] pullpolicy: IfNotPresent
I0205 20:02:01.834139 1 lvm.go:112] Driver: lvm.csi.metal-stack.io
I0205 20:02:01.834142 1 lvm.go:113] Version: dev
I0205 20:02:01.873219 1 lvm.go:411] unable to list existing volumegroups:exit status 5
I0205 20:02:01.873250 1 nodeserver.go:51] volumegroup: csi-lvm not found
I0205 20:02:02.119070 1 nodeserver.go:58] unable to activate logical volumes: Volume group "csi-lvm" not found
Cannot process volume group csi-lvm
exit status 5
I0205 20:02:02.120111 1 controllerserver.go:259] Enabling controller service capability: CREATE_DELETE_VOLUME
I0205 20:02:02.120295 1 server.go:95] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
Over on the k8s node, /dev/sdb does exist per lvm.devicePattern:
$ blockdev --getsize64 /dev/sdb
32212254720
While the documentation doesn't say this is necessary, I didn't see any indication in the code that pvcreate is called. So I figured perhaps that was the problem, and created the physical volume explicitly (which also demonstrates that the LVM command-line tools are functional on the host):
# On k8s host
$ pvcreate /dev/sdb
Physical volume "/dev/sdb" successfully created.
# On client
$ kubectl -n storage rollout restart ds/csi-driver-lvm-plugin
No change. Still the same Volume group "csi-lvm" not found errors in the plugin pod logs. OK, this ostensibly shouldn't be necessary either, but let's create the volume group manually:
# On k8s host
$ vgcreate csi-lvm /dev/sdb
Volume group "csi-lvm" successfully created
$ vgs
VG #PV #LV #SN Attr VSize VFree
csi-lvm 1 0 0 wz--n- <30.00g <30.00g
# On client
$ kubectl -n storage rollout restart ds/csi-driver-lvm-plugin
This has addressed the errors from the plugin logs:
INFO: defaulting to container "csi-driver-lvm-plugin" (has: node-driver-registrar, csi-driver-lvm-plugin, liveness-probe)
2022/02/05 20:23:53 unable to configure logging to stdout:no such flag -logtostderr
I0205 20:23:53.656589 1 lvm.go:108] pullpolicy: IfNotPresent
I0205 20:23:53.656596 1 lvm.go:112] Driver: lvm.csi.metal-stack.io
I0205 20:23:53.656598 1 lvm.go:113] Version: dev
I0205 20:23:53.738596 1 controllerserver.go:259] Enabling controller service capability: CREATE_DELETE_VOLUME
I0205 20:23:53.738891 1 server.go:95] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
But that didn't fix the pending PVC, even after recreating it:
$ kubectl describe -n default pvc/test
Name: test
Namespace: default
StorageClass: csi-driver-lvm-linear
Status: Pending
Volume:
Labels: <none>
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 4s (x2 over 16s) persistentvolume-controller waiting for first consumer to be created before binding
Hopefully it's clear where things have gone wrong. :)
Thanks!
Hi, at first glance everything was done right.
PVs don't need to be created beforehand; the driver can simply create a VG from a given block device or a list of block devices.
I guess your pod will mount the PVC once you create it.
What OS is your worker node running?
I guess your pod will mount the PVC once you create it.
This is actually the revelation, and what's missing from my reproduction steps above: the PV isn't actually provisioned until a pod mounts the PVC. I tried creating a pod while the PVC was Pending, and everything works: the VG is created, the PV is provisioned and bound, and the pod starts.
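For completeness, a minimal consumer along these lines is enough to trigger provisioning (the pod name and image here are placeholders, not literally what I ran):
apiVersion: v1
kind: Pod
metadata:
  name: test-consumer          # placeholder name
  namespace: default
spec:
  containers:
    - name: app
      image: busybox:1.35      # placeholder image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test        # the PVC created above
As soon as the pod is scheduled, the PVC goes from Pending to Bound.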
I never got as far as creating a pod, because I figured there was no point while the PVC was stuck in Pending. Every other CSI driver I have experience with immediately provisions and binds a PV when a PVC is created, so I'm embarrassed to say I expected csi-driver-lvm to work the same way and never bothered.
Can I humbly suggest this as an improvement? IMO it's surprising behavior to defer PV creation until some pod mounts the PVC.
What OS is your worker node running?
Apologies for not mentioning it: Ubuntu 20.04.3.
No, it cannot create the PV unless the pod is created, because this CSI driver is a local-storage provider and therefore it needs to know on which node the pod gets scheduled.
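Concretely, once the pod has been scheduled, the provisioned PV is pinned to that node via node affinity, roughly like this (the topology key and volume handle below are placeholders; the exact values are driver-specific):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-xxxxxxxx                        # generated name
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: csi-driver-lvm-linear
  csi:
    driver: lvm.csi.metal-stack.io
    volumeHandle: <driver-specific-handle>  # placeholder
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname   # placeholder; the real topology key is driver-specific
              operator: In
              values:
                - <node-name>
That node pinning is why the provisioner has to wait for the scheduler, and why WaitForFirstConsumer is the appropriate binding mode for local storage.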
No, it cannot create the PV unless the pod is created, because this CSI driver is a local-storage provider and therefore it needs to know on which node the pod gets scheduled.
Hah. You're completely right, of course; I have no explanation for my momentary demonstration of stupidity. :)
Perhaps a quick note in the README would help the absentminded like me, as a reminder that local-storage providers behave differently from network-storage providers in this regard?
Thanks for your patience @majst01. Will close, as this isn't a bug and I'm up and running.
No problem.