PV delete does not apply mount options and fails
diarmuidie opened this issue · 1 comment
What happened:
The NFS controller gets into a loop trying to delete a PV, but the delete keeps failing because the `nolock` mount option is not applied, even though this option is set on the PV.
What you expected to happen:
The PV should be deleted successfully.
How to reproduce it:
I have a PV with the `nolock` option set:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: nfs.csi.k8s.io
    volume.kubernetes.io/provisioner-deletion-secret-name: ""
    volume.kubernetes.io/provisioner-deletion-secret-namespace: ""
  creationTimestamp: "2023-10-19T09:44:21Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: 799874bf-0a13-495c-a34c-cf0f5b37813a
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 1953125Ki
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: kpack-build-yjevrbapzgjwz0dzbnz3ci2m-pvc
    namespace: mynamespace
  csi:
    driver: nfs.csi.k8s.io
    volumeAttributes:
      csi.storage.k8s.io/pv/name: 799874bf-0a13-495c-a34c-cf0f5b37813a
      csi.storage.k8s.io/pvc/name: kpack-build-yjevrbapzgjwz0dzbnz3ci2m-pvc
      csi.storage.k8s.io/pvc/namespace: kpack-customer
      server: 10.108.226.82
      share: /mymount
      storage.kubernetes.io/csiProvisionerIdentity: 1696336188172-6518-nfs.csi.k8s.io
      subdir: 799874bf-0a13-495c-a34c-cf0f5b37813a
    volumeHandle: 10.10.10.10#mymount#799874bf-0a13-495c-a34c-cf0f5b37813a##
  mountOptions:
  - nolock
  persistentVolumeReclaimPolicy: Delete
  storageClassName: csi-nfs
  volumeMode: Filesystem
status:
  phase: Released
```
When the `csi-nfs-controller` starts up, it detects that the PV should be deleted:
```
csi-nfs-controller-75ff648cfd-tntc8 csi-provisioner I1027 07:59:45.345273 1 controller.go:1502] delete "799874bf-0a13-495c-a34c-cf0f5b37813a": started
```
However, it quickly fails:
```
csi-nfs-controller-75ff648cfd-tntc8 csi-provisioner E1027 07:59:55.345798 1 controller.go:1512] delete "799874bf-0a13-495c-a34c-cf0f5b37813a": volume deletion failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
csi-nfs-controller-75ff648cfd-tntc8 csi-provisioner W1027 07:59:55.347356 1 controller.go:989] Retrying syncing volume "799874bf-0a13-495c-a34c-cf0f5b37813a", failure 0
csi-nfs-controller-75ff648cfd-tntc8 csi-provisioner E1027 07:59:55.347387 1 controller.go:1007] error syncing volume "799874bf-0a13-495c-a34c-cf0f5b37813a": rpc error: code = DeadlineExceeded desc = context deadline exceeded
csi-nfs-controller-75ff648cfd-tntc8 csi-provisioner I1027 07:59:55.347888 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"799874bf-0a13-495c-a34c-cf0f5b37813a", UID:"9ae4f7ee-00a8-43c4-a9b4-bd2a8f6fbc08", APIVersion:"v1", ResourceVersion:"823518166", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' rpc error: code = DeadlineExceeded desc = context deadline exceeded
csi-nfs-controller-75ff648cfd-tntc8 nfs Mounting command: mount
csi-nfs-controller-75ff648cfd-tntc8 nfs Mounting arguments: -t nfs 10.10.10.10:/mymount /tmp/799874bf-0a13-495c-a34c-cf0f5b37813a
csi-nfs-controller-75ff648cfd-tntc8 nfs Output: mount.nfs: rpc.statd is not running but is required for remote locking.
csi-nfs-controller-75ff648cfd-tntc8 nfs mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
csi-nfs-controller-75ff648cfd-tntc8 nfs
csi-nfs-controller-75ff648cfd-tntc8 nfs E1027 07:59:58.429802 1 utils.go:111] GRPC error: rpc error: code = Internal desc = failed to mount nfs server: rpc error: code = Internal desc = mount failed: exit status 32
csi-nfs-controller-75ff648cfd-tntc8 nfs Mounting command: mount
csi-nfs-controller-75ff648cfd-tntc8 nfs Mounting arguments: -t nfs 10.10.10.10:/mymount /tmp/799874bf-0a13-495c-a34c-cf0f5b37813a
csi-nfs-controller-75ff648cfd-tntc8 nfs Output: mount.nfs: rpc.statd is not running but is required for remote locking.
csi-nfs-controller-75ff648cfd-tntc8 nfs mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
csi-nfs-controller-75ff648cfd-tntc8 nfs E1027 07:59:58.528652 1 mount_linux.go:232] Mount failed: exit status 32
```
It looks like the `nolock` mount option is not being applied (the mount helper prints `Either use '-o nolock' to keep locks local, or start statd.`), and because of this the delete is failing?
The controller tries to delete the PV a number of times and fails.
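For context on why a mount happens during delete at all: the driver removes the provisioned subdirectory by mounting the NFS share inside the controller pod itself. Below is a minimal sketch of such a delete-time mount using `k8s.io/mount-utils`; the function name and option plumbing are illustrative assumptions, not the driver's actual code. If nothing equivalent to the PV's `nolock` ends up in the options slice, `mount.nfs` insists on `rpc.statd`, which matches the log output above.

```go
// Illustrative sketch only, not the driver's actual code: how a controller
// might mount the NFS share internally while handling a delete, and where
// mount options would have to be forwarded for `nolock` to take effect.
package main

import (
	"fmt"
	"os"
	"path/filepath"

	mount "k8s.io/mount-utils"
)

// mountForDelete (hypothetical helper) mounts server:share under a temp dir
// so the volume's subdirectory can be removed afterwards.
func mountForDelete(server, share, subdir string, options []string) (string, error) {
	target := filepath.Join(os.TempDir(), subdir)
	if err := os.MkdirAll(target, 0o750); err != nil {
		return "", err
	}
	mounter := mount.New("")
	source := fmt.Sprintf("%s:%s", server, share)
	// Passing []string{"nolock"} keeps locking local and avoids the
	// rpc.statd dependency; passing nil reproduces the error in this issue.
	if err := mounter.Mount(source, target, "nfs", options); err != nil {
		return "", fmt.Errorf("mount failed: %w", err)
	}
	return target, nil
}

func main() {
	target, err := mountForDelete("10.10.10.10", "/mymount",
		"799874bf-0a13-495c-a34c-cf0f5b37813a", []string{"nolock"})
	if err != nil {
		fmt.Println("delete-time mount failed:", err)
		return
	}
	defer mount.CleanupMountPoint(target, mount.New(""), true)
	// The driver would now remove the subdirectory under target.
	fmt.Println("mounted at", target)
}
```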
Anything else we need to know?:
The `nolock` option is also set on the StorageClass:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-nfs
mountOptions:
- nolock
parameters:
  server: 10.10.10.10
  share: /mymount
provisioner: nfs.csi.k8s.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
```
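Note that StorageClass `mountOptions` are copied to `spec.mountOptions` on provisioned PVs and applied when kubelet mounts the volume on a node, but they are not forwarded into the CSI delete call. As a minimal sketch using the standard CSI Go bindings (the volume ID below is just the one from this issue), a `DeleteVolumeRequest` carries only the volume ID and optional secrets, so neither the PV's nor the StorageClass's mount options can reach the driver's delete path directly:

```go
// Minimal sketch, assuming the standard CSI Go bindings: this is all the
// information a driver receives for a delete. There is no field for mount
// options, so the driver must derive or hard-code any options it needs.
package main

import (
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

func main() {
	req := &csi.DeleteVolumeRequest{
		// The NFS driver packs server, share, and subdir into the volume ID.
		VolumeId: "10.10.10.10#mymount#799874bf-0a13-495c-a34c-cf0f5b37813a##",
		Secrets:  map[string]string{}, // from the deletion-secret annotations, if set
	}
	fmt.Println(req.GetVolumeId())
}
```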
I came across this while investigating why the controller suddenly started using a lot of CPU and memory after working seamlessly for two months, so it may or may not be related. This error appears over and over in the logs, so my theory is that the controller is stuck in a loop of trying and failing to delete a bunch of PVs.
Environment:
- CSI Driver version: v4.4.0
- Kubernetes version (use `kubectl version`): v1.27.4-gke.900
- OS (e.g. from /etc/os-release): cos-105-17412-156-4
- Kernel (e.g. `uname -a`): 5.15.120+
- Install tools:
- Others:
Closing as this looks to be covered in #260