kubernetes-csi/csi-driver-nfs

PV delete does not apply options and fails

diarmuidie opened this issue · 1 comment

What happened:
The NFS controller gets into a loop of trying to delete a PV. Each attempt fails because the controller does not apply the nolock option when it mounts the share for the deletion, even though that option is set on the PV.
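For anyone hitting the same thing, the stuck volume and the failed-delete events can be checked directly (a quick sanity check, assuming kubectl access; the PV name is the one from the manifest below):

kubectl get pv 799874bf-0a13-495c-a34c-cf0f5b37813a
kubectl describe pv 799874bf-0a13-495c-a34c-cf0f5b37813a
# events emitted by the provisioner for failed deletions (reason taken from the logs below)
kubectl get events -A --field-selector reason=VolumeFailedDelete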

What you expected to happen:
The PV should be deleted successfully.

How to reproduce it:

I have a PV with the nolock option set:

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: nfs.csi.k8s.io
    volume.kubernetes.io/provisioner-deletion-secret-name: ""
    volume.kubernetes.io/provisioner-deletion-secret-namespace: ""
  creationTimestamp: "2023-10-19T09:44:21Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: 799874bf-0a13-495c-a34c-cf0f5b37813a
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 1953125Ki
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: kpack-build-yjevrbapzgjwz0dzbnz3ci2m-pvc
    namespace: mynamespace
  csi:
    driver: nfs.csi.k8s.io
    volumeAttributes:
      csi.storage.k8s.io/pv/name: 799874bf-0a13-495c-a34c-cf0f5b37813a
      csi.storage.k8s.io/pvc/name: kpack-build-yjevrbapzgjwz0dzbnz3ci2m-pvc
      csi.storage.k8s.io/pvc/namespace: kpack-customer
      server: 10.108.226.82
      share: /mymount
      storage.kubernetes.io/csiProvisionerIdentity: 1696336188172-6518-nfs.csi.k8s.io
      subdir: 799874bf-0a13-495c-a34c-cf0f5b37813a
    volumeHandle: 10.10.10.10#mymount#799874bf-0a13-495c-a34c-cf0f5b37813a##
  mountOptions:
  - nolock
  persistentVolumeReclaimPolicy: Delete
  storageClassName: csi-nfs
  volumeMode: Filesystem
status:
  phase: Released

When the csi-nfs-controller starts up, it detects that the PV should be deleted:

csi-nfs-controller-75ff648cfd-tntc8 csi-provisioner I1027 07:59:45.345273       1 controller.go:1502] delete "799874bf-0a13-495c-a34c-cf0f5b37813a": started

However, it quickly fails:

csi-nfs-controller-75ff648cfd-tntc8 csi-provisioner E1027 07:59:55.345798       1 controller.go:1512] delete "799874bf-0a13-495c-a34c-cf0f5b37813a": volume deletion failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
csi-nfs-controller-75ff648cfd-tntc8 csi-provisioner W1027 07:59:55.347356       1 controller.go:989] Retrying syncing volume "799874bf-0a13-495c-a34c-cf0f5b37813a", failure 0
csi-nfs-controller-75ff648cfd-tntc8 csi-provisioner E1027 07:59:55.347387       1 controller.go:1007] error syncing volume "799874bf-0a13-495c-a34c-cf0f5b37813a": rpc error: code = DeadlineExceeded desc = context deadline exceeded

csi-nfs-controller-75ff648cfd-tntc8 csi-provisioner I1027 07:59:55.347888       1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"799874bf-0a13-495c-a34c-cf0f5b37813a", UID:"9ae4f7ee-00a8-43c4-a9b4-bd2a8f6fbc08", APIVersion:"v1", ResourceVersion:"823518166", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' rpc error: code = DeadlineExceeded desc = context deadline exceeded

csi-nfs-controller-75ff648cfd-tntc8 nfs Mounting command: mount
csi-nfs-controller-75ff648cfd-tntc8 nfs Mounting arguments: -t nfs 10.10.10.10:/mymount /tmp/799874bf-0a13-495c-a34c-cf0f5b37813a
csi-nfs-controller-75ff648cfd-tntc8 nfs Output: mount.nfs: rpc.statd is not running but is required for remote locking.
csi-nfs-controller-75ff648cfd-tntc8 nfs mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
csi-nfs-controller-75ff648cfd-tntc8 nfs
csi-nfs-controller-75ff648cfd-tntc8 nfs E1027 07:59:58.429802       1 utils.go:111] GRPC error: rpc error: code = Internal desc = failed to mount nfs server: rpc error: code = Internal desc = mount failed: exit status 32
csi-nfs-controller-75ff648cfd-tntc8 nfs Mounting command: mount
csi-nfs-controller-75ff648cfd-tntc8 nfs Mounting arguments: -t nfs 10.10.10.10:/mymount /tmp/799874bf-0a13-495c-a34c-cf0f5b37813a
csi-nfs-controller-75ff648cfd-tntc8 nfs Output: mount.nfs: rpc.statd is not running but is required for remote locking.
csi-nfs-controller-75ff648cfd-tntc8 nfs mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
csi-nfs-controller-75ff648cfd-tntc8 nfs E1027 07:59:58.528652       1 mount_linux.go:232] Mount failed: exit status 32

It looks like the nolock mount option is not being applied (the error says "Either use '-o nolock' to keep locks local, or start statd."), and because of this the delete is failing.
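Putting the log lines and the PV spec side by side, the internal mount the controller performs for the delete does not include the PV's mountOptions. A rough comparison (the second command is only an illustration of what a mount honouring the nolock option would look like, not something the driver exposes directly):

# what the controller runs, per the logs above
mount -t nfs 10.10.10.10:/mymount /tmp/799874bf-0a13-495c-a34c-cf0f5b37813a
# the same mount with the PV's nolock option applied, as the error message itself suggests
mount -t nfs -o nolock 10.10.10.10:/mymount /tmp/799874bf-0a13-495c-a34c-cf0f5b37813a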

The controller tries to delete the PV a number of times and fails.

Anything else we need to know?:

The nolock option is also set on the storageclass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-nfs
mountOptions:
- nolock
parameters:
  server: 10.10.10.10
  share: /mymount
provisioner: nfs.csi.k8s.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
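So the option is present on both objects, which can be double-checked with (assuming kubectl access; the PV name is the one from above):

kubectl get storageclass csi-nfs -o jsonpath='{.mountOptions}'
kubectl get pv 799874bf-0a13-495c-a34c-cf0f5b37813a -o jsonpath='{.spec.mountOptions}'
# both should include nolock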

I came across this as part of an investigation into why the controller suddenly started using a lot of CPU and memory after working seamlessly for two months, so it may or may not be related. This error appears over and over in the logs, so the theory is that the controller is stuck in a loop of trying and failing to delete a number of PVs.
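A rough way to gauge how often this is happening (assuming the default install, where the controller runs as the csi-nfs-controller Deployment in kube-system; the container names csi-provisioner and nfs are the ones from the logs above):

kubectl -n kube-system logs deploy/csi-nfs-controller -c csi-provisioner | grep -c 'VolumeFailedDelete'
kubectl -n kube-system logs deploy/csi-nfs-controller -c nfs | grep -c 'rpc.statd is not running'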

[Screenshot attached: 2023-10-27 at 09:22:45]

Environment:

  • CSI Driver version: v4.4.0
  • Kubernetes version (use kubectl version): v1.27.4-gke.900
  • OS (e.g. from /etc/os-release): cos-105-17412-156-4
  • Kernel (e.g. uname -a): 5.15.120+
  • Install tools:
  • Others:

Closing as this looks to be covered in #260