PV stuck in status "Terminating" after deleting corresponding PVC
What happened:
PV stuck in status "Terminating" after deleting the corresponding PVC (reclaimPolicy of the StorageClass is set to Delete).
What you expected to happen:
The automatically created PV should also be deleted after the corresponding PVC is deleted.
How to reproduce it:
Create a storage class:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: nfs.csi.k8s.io
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  server: nas.example.com
  share: /nds/testgrid/maps/storage
Run an app that will create PVCs, such as:
helm install gitea gitea-charts/gitea --values values_gitea.yaml
Then the following PVCs and PVs are created:
jiangzhenbing@prd37:~$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-gitea-postgresql-0 Bound pvc-faf4c83f-2b6e-4bfe-8a9d-cc2ac2ddecab 10Gi RWO nfs-csi 2m4s
gitea-shared-storage Bound pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a 10Gi RWO nfs-csi 2m5s
redis-data-gitea-redis-master-0 Bound pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a 8Gi RWO nfs-csi 2m4s
jiangzhenbing@prd37:~$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a 10Gi RWO Delete Bound default/gitea-shared-storage nfs-csi 2m1s
pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a 8Gi RWO Delete Bound default/redis-data-gitea-redis-master-0 nfs-csi 119s
pvc-faf4c83f-2b6e-4bfe-8a9d-cc2ac2ddecab 10Gi RWO Delete Bound default/data-gitea-postgresql-0 nfs-csi 119s
Then uninstall the app and delete the PVCs manually:
jiangzhenbing@prd37:~$ kubectl delete pvc data-gitea-postgresql-0 gitea-shared-storage redis-data-gitea-redis-master-0
persistentvolumeclaim "data-gitea-postgresql-0" deleted
persistentvolumeclaim "gitea-shared-storage" deleted
persistentvolumeclaim "redis-data-gitea-redis-master-0" deleted
The PVs get stuck in Terminating status:
jiangzhenbing@prd37:~$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a 10Gi RWO Delete Terminating default/gitea-shared-storage nfs-csi 4m46s
pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a 8Gi RWO Delete Terminating default/redis-data-gitea-redis-master-0 nfs-csi 4m44s
pvc-faf4c83f-2b6e-4bfe-8a9d-cc2ac2ddecab 10Gi RWO Delete Terminating default/data-gitea-postgresql-0 nfs-csi 4m44s
Check the log of csi-nfs-controller:
E0724 06:58:35.690387 1 controller.go:1025] error syncing volume "pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a": persistentvolumes "pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a" is forbidden: User "system:serviceaccount:kube-system:csi-nfs-controller-sa" cannot patch resource "persistentvolumes" in API group "" at the cluster scope
I0724 06:58:35.967933 1 controller.go:1599] "Failed to remove finalizer for persistentvolume" PV="pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a" err="persistentvolumes \"pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a\" is forbidden: User \"system:serviceaccount:kube-system:csi-nfs-controller-sa\" cannot patch resource \"persistentvolumes\" in API group \"\" at the cluster scope"
I0724 06:58:35.967978 1 controller.go:1007] "Retrying syncing volume" key="pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a" failures=10
E0724 06:58:35.968014 1 controller.go:1025] error syncing volume "pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a": persistentvolumes "pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a" is forbidden: User "system:serviceaccount:kube-system:csi-nfs-controller-sa" cannot patch resource "persistentvolumes" in API group "" at the cluster scope
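For a quick check, the missing permission can be confirmed by impersonating the controller's service account with kubectl auth can-i:

kubectl auth can-i patch persistentvolumes \
  --as=system:serviceaccount:kube-system:csi-nfs-controller-sa
# prints "no" while the ClusterRole lacks the patch verb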
ClusterRole nfs-external-provisioner-role indeed has no patch permission on persistentvolumes:
https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/deploy/rbac-csi-nfs.yaml
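A sketch of the needed RBAC change, adding patch to the persistentvolumes rule of that ClusterRole (the exact verb list may differ between versions, so check the deployed ClusterRole before applying):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nfs-external-provisioner-role
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete", "patch"]
  # ... remaining rules unchanged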
Anything else we need to know?:
Environment:
- CSI Driver version:
jiangzhenbing@prd37:~$ kubectl get po -n kube-system -o yaml | grep gcr | grep nfs
image: gcr.io/k8s-staging-sig-storage/nfsplugin:canary
image: gcr.io/k8s-staging-sig-storage/nfsplugin:canary
imageID: gcr.io/k8s-staging-sig-storage/nfsplugin@sha256:47d6a505dd9358ffcb865a4bb9e562b10cdd3645fbcdca7bbe5cce50af034c6a
image: gcr.io/k8s-staging-sig-storage/nfsplugin:canary
image: gcr.io/k8s-staging-sig-storage/nfsplugin:canary
imageID: gcr.io/k8s-staging-sig-storage/nfsplugin@sha256:47d6a505dd9358ffcb865a4bb9e562b10cdd3645fbcdca7bbe5cce50af034c6a
image: gcr.io/k8s-staging-sig-storage/nfsplugin:canary
image: gcr.io/k8s-staging-sig-storage/nfsplugin:canary
imageID: gcr.io/k8s-staging-sig-storage/nfsplugin@sha256:47d6a505dd9358ffcb865a4bb9e562b10cdd3645fbcdca7bbe5cce50af034c6a
- Kubernetes version (use kubectl version): v1.24.3
- OS (e.g. from /etc/os-release): Ubuntu 22.04
- Kernel (e.g. uname -a): Linux prd37 5.19.0-32-generic #33~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Jan 30 17:03:34 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: kubectl apply
- Others:
@matschen
it's related to kubernetes-csi/external-provisioner#1235 (comment), could you try the workaround mentioned in that GitHub issue? Thanks.
Disabling the HonorPVReclaimPolicy feature gate in csi-provisioner should fix the issue.
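A minimal sketch of that workaround, for the csi-provisioner sidecar in the csi-nfs-controller deployment (keep the existing args and only add the feature-gates flag; the other args shown here are illustrative):

      containers:
        - name: csi-provisioner
          args:
            - "--csi-address=$(ADDRESS)"
            - "--feature-gates=HonorPVReclaimPolicy=false"
            # ... other existing args unchanged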
@andyzhangx
The command kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}' sets the finalizers to null so the PV can be successfully deleted. But it's not the same issue as in kubernetes-csi/external-provisioner#1235.
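For reference, the finalizer that is blocking deletion can be inspected before patching, for example:

kubectl get pv <pv-name> -o jsonpath='{.metadata.finalizers}'
# typically shows the provisioner's finalizer, e.g. external-provisioner.volume.kubernetes.io/finalizer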
The key point of this issue is: the ServiceAccount csi-nfs-controller-sa, which is bound to ClusterRole nfs-external-provisioner-role, has no patch verb on persistentvolumes, so the external-provisioner cannot remove the finalizer after the volume is deleted, and the PV stays stuck in the Terminating state.
E0724 06:58:35.690387 1 controller.go:1025] error syncing volume "pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a": persistentvolumes "pvc-a8f4852f-70b2-4f48-ae2f-6c81862dda2a" is forbidden: User "system:serviceaccount:kube-system:csi-nfs-controller-sa" cannot patch resource "persistentvolumes" in API group "" at the cluster scope
I0724 06:58:35.967933 1 controller.go:1599] "Failed to remove finalizer for persistentvolume" PV="pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a" err="persistentvolumes \"pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a\" is forbidden: User \"system:serviceaccount:kube-system:csi-nfs-controller-sa\" cannot patch resource \"persistentvolumes\" in API group \"\" at the cluster scope"
I0724 06:58:35.967978 1 controller.go:1007] "Retrying syncing volume" key="pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a" failures=10
E0724 06:58:35.968014 1 controller.go:1025] error syncing volume "pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a": persistentvolumes "pvc-a4b2a0da-2664-4336-b5e7-6e2271e21f8a" is forbidden: User "system:serviceaccount:kube-system:csi-nfs-controller-sa" cannot patch resource "persistentvolumes" in API group "" at the cluster scope
I noticed that the patch verb was added to ClusterRole external-provisioner-runner in the external-provisioner repo by commit kubernetes-csi/external-provisioner@c597852.
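Until the csi-driver-nfs manifest picks up the same change, the deployed ClusterRole can also be patched in place; a sketch assuming the persistentvolumes rule is the first entry in rules (verify the index with kubectl get clusterrole nfs-external-provisioner-role -o yaml first):

kubectl patch clusterrole nfs-external-provisioner-role --type=json \
  -p='[{"op": "add", "path": "/rules/0/verbs/-", "value": "patch"}]'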