kubernetes-csi/csi-driver-nfs

PV stuck in status "Terminating" after deleting corresponding PVC


What happened:
PV is stuck in status "Terminating" after deleting the corresponding PVC (the reclaimPolicy of the StorageClass is set to Delete).

What you expected to happen:
The automatically created PV should also be deleted after the corresponding PVC is deleted.

How to reproduce it:
Create a storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: nfs.csi.k8s.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  server: nas.example.com
  share: /nds/testgrid/maps/storage

Run an app that creates PVCs, such as:

helm install gitea gitea-charts/gitea --values values_gitea.yaml

Then the following PVCs and PVs are created:

jiangzhenbing@prd37:~$ kubectl get pvc
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-gitea-postgresql-0           Bound    pvc-faf4c83f-2b6e-4bfe-8a9d-cc2ac2ddecab   10Gi       RWO            nfs-csi        2m4s
gitea-shared-storage              Bound    pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a   10Gi       RWO            nfs-csi        2m5s
redis-data-gitea-redis-master-0   Bound    pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a   8Gi        RWO            nfs-csi        2m4s
jiangzhenbing@prd37:~$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                     STORAGECLASS   REASON   AGE
pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a   10Gi       RWO            Delete           Bound    default/gitea-shared-storage              nfs-csi                 2m1s
pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a   8Gi        RWO            Delete           Bound    default/redis-data-gitea-redis-master-0   nfs-csi                 119s
pvc-faf4c83f-2b6e-4bfe-8a9d-cc2ac2ddecab   10Gi       RWO            Delete           Bound    default/data-gitea-postgresql-0           nfs-csi                 119s

Then uninstall the app and delete the PVCs manually:

jiangzhenbing@prd37:~$ kubectl delete pvc data-gitea-postgresql-0 gitea-shared-storage redis-data-gitea-redis-master-0
persistentvolumeclaim "data-gitea-postgresql-0" deleted
persistentvolumeclaim "gitea-shared-storage" deleted
persistentvolumeclaim "redis-data-gitea-redis-master-0" deleted

The PVs get stuck in Terminating status:

jiangzhenbing@prd37:~$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM                                     STORAGECLASS   REASON   AGE
pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a   10Gi       RWO            Delete           Terminating   default/gitea-shared-storage              nfs-csi                 4m46s
pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a   8Gi        RWO            Delete           Terminating   default/redis-data-gitea-redis-master-0   nfs-csi                 4m44s
pvc-faf4c83f-2b6e-4bfe-8a9d-cc2ac2ddecab   10Gi       RWO            Delete           Terminating   default/data-gitea-postgresql-0           nfs-csi                 4m44s

Check the logs of the csi-nfs-controller:

E0724 06:58:35.690387       1 controller.go:1025] error syncing volume "pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a": persistentvolumes "pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a" is forbidden: User "system:serviceaccount:kube-system:csi-nfs-controller-sa" cannot patch resource "persistentvolumes" in API group "" at the cluster scope
I0724 06:58:35.967933       1 controller.go:1599] "Failed to remove finalizer for persistentvolume" PV="pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a" err="persistentvolumes \"pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a\" is forbidden: User \"system:serviceaccount:kube-system:csi-nfs-controller-sa\" cannot patch resource \"persistentvolumes\" in API group \"\" at the cluster scope"
I0724 06:58:35.967978       1 controller.go:1007] "Retrying syncing volume" key="pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a" failures=10
E0724 06:58:35.968014       1 controller.go:1025] error syncing volume "pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a": persistentvolumes "pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a" is forbidden: User "system:serviceaccount:kube-system:csi-nfs-controller-sa" cannot patch resource "persistentvolumes" in API group "" at the cluster scope

The ClusterRole nfs-external-provisioner-role indeed has no patch permission on persistentvolumes:
https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/deploy/rbac-csi-nfs.yaml
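This can be confirmed from the service account's perspective with kubectl auth can-i:

kubectl auth can-i patch persistentvolumes \
  --as=system:serviceaccount:kube-system:csi-nfs-controller-sa
# prints "no" while the patch verb is missing from the ClusterRole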
Anything else we need to know?:

Environment:

  • CSI Driver version:
jiangzhenbing@prd37:~$ kubectl get po -n kube-system -o yaml | grep gcr | grep nfs
      image: gcr.io/k8s-staging-sig-storage/nfsplugin:canary
      image: gcr.io/k8s-staging-sig-storage/nfsplugin:canary
      imageID: gcr.io/k8s-staging-sig-storage/nfsplugin@sha256:47d6a505dd9358ffcb865a4bb9e562b10cdd3645fbcdca7bbe5cce50af034c6a
      image: gcr.io/k8s-staging-sig-storage/nfsplugin:canary
      image: gcr.io/k8s-staging-sig-storage/nfsplugin:canary
      imageID: gcr.io/k8s-staging-sig-storage/nfsplugin@sha256:47d6a505dd9358ffcb865a4bb9e562b10cdd3645fbcdca7bbe5cce50af034c6a
      image: gcr.io/k8s-staging-sig-storage/nfsplugin:canary
      image: gcr.io/k8s-staging-sig-storage/nfsplugin:canary
      imageID: gcr.io/k8s-staging-sig-storage/nfsplugin@sha256:47d6a505dd9358ffcb865a4bb9e562b10cdd3645fbcdca7bbe5cce50af034c6a
  • Kubernetes version (use kubectl version): v1.24.3
  • OS (e.g. from /etc/os-release): Ubuntu 22.04
  • Kernel (e.g. uname -a): Linux prd37 5.19.0-32-generic #33~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Jan 30 17:03:34 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubectl apply
  • Others:

@matschen
This is related to kubernetes-csi/external-provisioner#1235 (comment). Could you try the workaround mentioned in that GitHub issue? Thanks.

Disabling the HonorPVReclaimPolicy feature gate in csi-provisioner should fix the issue.
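For reference, a minimal sketch of what that looks like on the csi-provisioner sidecar in the csi-nfs-controller Deployment (the image tag and the other args here are illustrative, not copied from the deployed manifest):

containers:
  - name: csi-provisioner
    image: registry.k8s.io/sig-storage/csi-provisioner:v3.5.0  # illustrative tag
    args:
      - "--v=2"
      - "--csi-address=$(ADDRESS)"
      # disable the gate that adds the PV deletion-protection finalizer,
      # so removing it no longer requires the patch verb
      - "--feature-gates=HonorPVReclaimPolicy=false"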

@andyzhangx
The command kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}' sets the finalizers to null so the PV can be deleted successfully (a loop over all Terminating PVs is sketched after the log excerpt below). But this is not the same issue as kubernetes-csi/external-provisioner#1235.
The key point of this issue is that the ServiceAccount csi-nfs-controller-sa, which is bound to the ClusterRole nfs-external-provisioner-role, has no patch verb on persistentvolumes, so the external-provisioner cannot patch away the finalizer after the volume is deleted, and the PV stays stuck in the Terminating state.

E0724 06:58:35.690387       1 controller.go:1025] error syncing volume "pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a": persistentvolumes "pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a" is forbidden: User "system:serviceaccount:kube-system:csi-nfs-controller-sa" cannot patch resource "persistentvolumes" in API group "" at the cluster scope
I0724 06:58:35.967933       1 controller.go:1599] "Failed to remove finalizer for persistentvolume" PV="pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a" err="persistentvolumes \"pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a\" is forbidden: User \"system:serviceaccount:kube-system:csi-nfs-controller-sa\" cannot patch resource \"persistentvolumes\" in API group \"\" at the cluster scope"
I0724 06:58:35.967978       1 controller.go:1007] "Retrying syncing volume" key="pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a" failures=10
E0724 06:58:35.968014       1 controller.go:1025] error syncing volume "pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a": persistentvolumes "pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a" is forbidden: User "system:serviceaccount:kube-system:csi-nfs-controller-sa" cannot patch resource "persistentvolumes" in API group "" at the cluster scope
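As a one-off cleanup, a sketch of the finalizer-clearing workaround applied to every stuck PV (this assumes the 5th column of kubectl get pv output is STATUS and that every Terminating PV here is safe to release; review the list before running):

for pv in $(kubectl get pv --no-headers | awk '$5 == "Terminating" {print $1}'); do
  kubectl patch pv "$pv" -p '{"metadata":{"finalizers":null}}'
done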

I noticed that patch was added to the ClusterRole external-provisioner-runner in the external-provisioner repo by commit kubernetes-csi/external-provisioner@c597852.
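For anyone hitting this before the RBAC in this repo is updated, a sketch of the equivalent fix is to add patch to the persistentvolumes rule in nfs-external-provisioner-role (the other verbs shown are the usual provisioner set; check the deployed ClusterRole for the exact list):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nfs-external-provisioner-role
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    # "patch" is the missing verb; without it the external-provisioner
    # cannot remove its finalizer and the PV stays in Terminating
    verbs: ["get", "list", "watch", "create", "delete", "patch"]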