kubernetes-sigs/sig-storage-lib-external-provisioner

PV stuck in released/terminating state even if CSI DeleteVolume action has been executed successfully

k2huang opened this issue · 7 comments

In our K8s cluster, to keep a PV from disappearing before its volume is successfully deleted via the CSI plugin, we enable the AddFinalizer option for ProvisionController.

However, we see many PV objects stuck in the terminating state even though the CSI DeleteVolume call has succeeded. The error log is as follows:

delete "pvc-9041b405-1a2d-42df-9f56-876e4f0217fd": failed to remove finalizer for persistentvolume: Operation cannot be fulfilled on persistentvolumes "pvc-9041b405-1a2d-42df-9f56-876e4f0217fd": the object has been modified; please apply your changes to the latest version and try again

After reading the source code, I found some hints:
ProvisionController deletes the PV object after CSI DeleteVolume succeeds, and only then removes the finalizer (external-provisioner.volume.kubernetes.io/finalizer).
But if removing the finalizer fails, as in the log above, the PV stays in the released/terminating state forever, because its DeletionTimestamp is now non-nil and the check below returns false:

	if ctrl.kubeVersion.AtLeast(utilversion.MustParseSemantic("v1.9.0")) {
		if ctrl.addFinalizer && !ctrl.checkFinalizer(volume, finalizerPV) && volume.ObjectMeta.DeletionTimestamp != nil {
			return false
		} else if volume.ObjectMeta.DeletionTimestamp != nil {
			return false
		}
	}

From my point of view, it would be better to delete the PV object only after the finalizer has been removed successfully.

/assign

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@fejta-bot: Closing this issue.
