kubernetes-sigs/sig-storage-lib-external-provisioner

Provisioner should not ignore AlreadyExists error when saving PVs

jsafrane opened this issue · 8 comments

Related to kubernetes-csi/external-provisioner#340 - when two provisioners (or the in-tree and an external provisioner) provision a volume for the same PVC, the faster one succeeds and creates a PV. When the second provisioner finishes provisioning another volume, it tries to save a PV with the same name and a potentially different spec, and it succeeds:


I0912 10:58:32.508150       1 volume_store.go:147] Saving volume pvc-fc52e9df-f1ad-480c-8530-d8ed03a0d96e
I0912 10:58:32.512051       1 volume_store.go:150] Volume pvc-fc52e9df-f1ad-480c-8530-d8ed03a0d96e saved

It should not succeed: the second provisioner should see that the PV already exists and, if necessary, delete the volume in the storage backend. The AlreadyExists error should not be quietly ignored!
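A minimal sketch in Go of what the save path could do instead, assuming a client-go clientset; `savePV` and the surrounding names are illustrative, not the library's actual volume_store.go code:

```go
// Illustrative sketch only, not the library's actual volume_store.go code.
// Assumes a client-go clientset; savePV is a hypothetical helper name.
package sketch

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func savePV(ctx context.Context, client kubernetes.Interface, pv *v1.PersistentVolume) error {
	_, err := client.CoreV1().PersistentVolumes().Create(ctx, pv, metav1.CreateOptions{})
	if apierrors.IsAlreadyExists(err) {
		// Another provisioner already created a PV with this name; surface the
		// conflict instead of pretending the save succeeded, so the caller can
		// decide whether to delete the freshly provisioned backend volume.
		return fmt.Errorf("PV %s already exists: %w", pv.Name, err)
	}
	return err
}
```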

The first obvious solution is to fix kubernetes-csi/external-provisioner#340 - there should always be only one provisioner provisioning a given PV.

But even when that is fixed, there may be corner cases where two provisioners run at the same time. For example, if we had an alpha migration feature in the external CSI provisioner, it would be hard to keep the migration flags in sync with kube-controller-manager, and I can imagine both the in-tree and the external provisioner provisioning the same PVC during an upgrade / downgrade / cluster reconfiguration.

Therefore, we should fix this error properly.

@davidz627 @msau42 @bertinatto @ddebroy @wongma7

Brainstorming a few solutions:

  1. After AlreadyExists, compare the PV in the API server with the one we want to save. Ignore the error if both PVs have the same VolumeSource; delete the unsaved volume if the VolumeSource is different (a sketch of this check follows the list).

  2. Or, each provisioner (meaning the in-tree and the external CSI provisioner) saves PVs with a different name (e.g. with a pvc- or csi-pvc- prefix). Both PVs will be saved, and Kubernetes will recognize that one of them is not needed and initiate its reclaim policy.
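A hedged sketch of option 1's check, assuming a client-go clientset; `handleAlreadyExists` is a hypothetical helper, and the semantic DeepEqual is one possible reading of "same VolumeSource":

```go
// Sketch of option 1 (illustrative only): after an AlreadyExists error, fetch
// the existing PV and compare volume sources before deciding what to do.
package sketch

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	apiequality "k8s.io/apimachinery/pkg/api/equality"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// handleAlreadyExists reports whether the existing PV points at the same
// volume, i.e. whether the AlreadyExists error can be safely ignored.
func handleAlreadyExists(ctx context.Context, client kubernetes.Interface, pv *v1.PersistentVolume) (bool, error) {
	existing, err := client.CoreV1().PersistentVolumes().Get(ctx, pv.Name, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	if apiequality.Semantic.DeepEqual(existing.Spec.PersistentVolumeSource, pv.Spec.PersistentVolumeSource) {
		// Same volume behind both PVs: the other provisioner saved the same
		// thing first, so nothing is lost by ignoring the error.
		return true, nil
	}
	// Different volume: the caller should delete the just-provisioned volume
	// in the storage backend to avoid leaking it.
	return false, fmt.Errorf("PV %s exists with a different volume source", pv.Name)
}
```

Note that the comparison is exact, so fields that can legitimately differ between provisioners (such as a secret name) would make it report "different"; that is the concern raised in the next comment.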

I like 2 better because it's simple ...

For 1, are there cases where the comparison would say "not equal" even though the "volume ID" is the same? For example, I see Azure File refers to a secretName; what if that differs between in-tree and CSI because they use different credentials?

For 2, are there any potential backward-compatibility issues if we change the way PVs are named?

I don't think there are any backward compatibility issues, unless two provisioners are running at the same time and depend on the PV naming.

On the other hand, with the Retain reclaim policy we may pile up empty PVs that were never used.

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.