kubernetes-sigs/sig-storage-lib-external-provisioner

Provisioner does not allow rescheduling if a Node is deleted after a pod is scheduled

pwschuurman opened this issue · 19 comments

If a Node is deleted after a Pod has been scheduled to it (but before the Pod's claim is provisioned), the Pod can become stuck indefinitely in a Pending state.

Typically, when a failure occurs during provisioning, the provisioner relinquishes control back to the scheduler so the Pod can be rescheduled somewhere else. This is done by removing the volume.kubernetes.io/selected-node annotation from the PVC; the controller returns ProvisioningFinished in provisionClaimOperation. This happens, for example, when storage cannot be scheduled on the selected node: https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go#L1420

However, if a Node becomes unavailable after it has been selected by the scheduler, the provisioner does not remove this annotation, because it returns ProvisioningNoChange in provisionClaimOperation. Keeping the annotation is potentially useful when the Node is eventually consistent and will become available some time after being selected. But when the Node has been deleted, this is an unrecoverable condition that requires user intervention: adding the exact node back (infeasible for dynamically provisioned node names), deleting and re-creating the pod so the scheduler can reschedule it, or manually removing the selected-node annotation from the PVC.
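To make the two code paths concrete, here is a rough, self-contained paraphrase in Go of the behavior described above. It is not the library's actual provisionClaimOperation: handleSelectedNodeLookup is a made-up name, and the import path assumes the v8 module layout of this library.

```go
package sketch

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	corelisters "k8s.io/client-go/listers/core/v1"
	"sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller"
)

// handleSelectedNodeLookup is a simplified stand-in for the selected-node
// handling inside provisionClaimOperation, showing where the two states
// described above come from.
func handleSelectedNodeLookup(nodeLister corelisters.NodeLister, claim *v1.PersistentVolumeClaim) (controller.ProvisioningState, error) {
	nodeName, ok := claim.Annotations["volume.kubernetes.io/selected-node"]
	if !ok {
		// Immediate binding: there is no selected node to resolve, so the real
		// controller would simply proceed with provisioning.
		return controller.ProvisioningFinished, nil
	}
	if _, err := nodeLister.Get(nodeName); err != nil {
		// Today's behavior: the annotation is kept and ProvisioningNoChange is
		// returned, so a claim pointing at a deleted node is retried forever.
		return controller.ProvisioningNoChange, fmt.Errorf("failed to get target node: %v", err)
	}
	// By contrast, when storage cannot be provisioned on the selected node, the
	// controller removes the annotation and returns ProvisioningFinished,
	// handing the decision back to the scheduler.
	return controller.ProvisioningFinished, nil
}
```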

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-triage-robot: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/remove-lifecycle rotten

/reopen

@amacaskill: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Repro using VolumeSnapshot to delay provisioning: https://gist.github.com/pwschuurman/fd9c8c50889ce2382bcdca259c51d3e4

  1. Create a VolumeSnapshot that references a non-existent disk (or a disk that takes a long time to copy, so the VolumeSnapshot is slow to become ready)
  2. Create a PVC that references the VolumeSnapshot as a DataSource
  3. Create a pod that references said PVC. The scheduler will select a node for the pod and add the volume.kubernetes.io/selected-node annotation to the PVC.
  4. While the operation from (1) is still pending, delete the node that was selected for the PVC. This can happen under normal conditions due to node repair, upgrade, or autoscaling.
  5. Once the VolumeSnapshot becomes ready, the provisioner starts emitting "failed to get target node" errors. The PVC must be deleted (or the annotation removed manually) to recover.

Some ideas on how to handle this:

  1. Add a timeout that removes the annotation after some period of time: if the volume.kubernetes.io/selected-node annotation becomes stale, remove it from the PVC. This is troublesome because some delays can legitimately take a long time (e.g. waiting for a snapshot to be created) and may not fit into a well-defined timeout period.
  2. Update csi-provisioner to use an informer, rather than a lister. This would allow the provisioner to be aware of Node deletion events and remove the annotation from affected volumes. The provisioner would likely need to keep a node -> volume cache in order to find the affected volumes (a standalone sketch of this approach follows this list).
  3. Update the scheduler to remove the annotation on node deletion.
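For idea 2, here is a minimal standalone sketch of what a node-deletion watcher could look like. This is only an illustration of the approach, not the change that was ultimately made; a real implementation would live inside csi-provisioner and reuse its existing informers and a node -> volume cache instead of listing every PVC on each deletion.

```go
package main

import (
	"context"
	"encoding/json"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/klog/v2"
)

const selectedNodeAnn = "volume.kubernetes.io/selected-node"

// clearStaleSelectedNode removes the selected-node annotation from every
// pending PVC that still points at the deleted node, so the scheduler can
// pick a new node and re-run volume binding.
func clearStaleSelectedNode(ctx context.Context, cs kubernetes.Interface, nodeName string) {
	pvcs, err := cs.CoreV1().PersistentVolumeClaims(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		klog.Errorf("listing PVCs: %v", err)
		return
	}
	// Merge patch that deletes only the selected-node annotation.
	patch, _ := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]interface{}{selectedNodeAnn: nil},
		},
	})
	for i := range pvcs.Items {
		pvc := &pvcs.Items[i]
		if pvc.Status.Phase != v1.ClaimPending || pvc.Annotations[selectedNodeAnn] != nodeName {
			continue
		}
		if _, err := cs.CoreV1().PersistentVolumeClaims(pvc.Namespace).Patch(
			ctx, pvc.Name, types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
			klog.Errorf("clearing %s on %s/%s: %v", selectedNodeAnn, pvc.Namespace, pvc.Name, err)
		}
	}
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		klog.Fatal(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(cs, 10*time.Minute)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			// Deletions may also arrive as DeletedFinalStateUnknown tombstones;
			// only plain *v1.Node objects are handled in this sketch.
			if node, ok := obj.(*v1.Node); ok {
				clearStaleSelectedNode(context.TODO(), cs, node.Name)
			}
		},
	})

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	select {} // run until killed
}
```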

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/remove-lifecycle stale
/triage accepted

I think this is the same issue as kubernetes/kubernetes#100485

Another option we discussed: remove the annotation when the provisioner tries to access a Node that doesn't exist, by checking for a NotFound error (errors.IsNotFound).
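Roughly, that option would look like the sketch below. This is not the actual patch (see #139 for that); getSelectedNode and the merge patch used to drop the annotation are illustrative, assuming a client-go clientset and node lister are available.

```go
package sketch

import (
	"context"
	"encoding/json"
	"fmt"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	corelisters "k8s.io/client-go/listers/core/v1"
)

const selectedNodeAnn = "volume.kubernetes.io/selected-node"

// getSelectedNode resolves the node named by the selected-node annotation.
// If the node has been deleted (NotFound), it strips the annotation so the
// scheduler can re-run volume binding on a live node, instead of retrying
// the same lookup forever.
func getSelectedNode(ctx context.Context, cs kubernetes.Interface, nodes corelisters.NodeLister,
	claim *v1.PersistentVolumeClaim) (*v1.Node, error) {

	nodeName, ok := claim.Annotations[selectedNodeAnn]
	if !ok {
		return nil, nil // immediate binding: no selected node to resolve
	}
	node, err := nodes.Get(nodeName)
	if apierrors.IsNotFound(err) {
		// The node is gone for good: clear the annotation so the scheduler
		// can pick a new node for the claim.
		patch, _ := json.Marshal(map[string]interface{}{
			"metadata": map[string]interface{}{
				"annotations": map[string]interface{}{selectedNodeAnn: nil},
			},
		})
		if _, perr := cs.CoreV1().PersistentVolumeClaims(claim.Namespace).Patch(
			ctx, claim.Name, types.MergePatchType, patch, metav1.PatchOptions{}); perr != nil {
			return nil, fmt.Errorf("failed to remove %s after node deletion: %v", selectedNodeAnn, perr)
		}
		return nil, fmt.Errorf("selected node %q no longer exists, rescheduling", nodeName)
	}
	if err != nil {
		return nil, fmt.Errorf("failed to get target node: %v", err)
	}
	return node, nil
}
```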

Reproduced the error with the following steps:

  1. kubetest --build --up
  2. Deploy the PD CSI driver via gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/deploy-driver.sh
  3. Create a StorageClass, create a PVC with the volume.kubernetes.io/selected-node annotation set to a non-existent node, and create a pod
  4. The PVC stayed in Pending state
  5. Check the csi-provisioner logs via k logs -n gce-pd-csi-driver csi-gce-pd-controller -c csi-provisioner:
W0308 00:51:37.588114       1 controller.go:934] Retrying syncing claim "xxxxxx", failure 12
E0308 00:51:37.588141       1 controller.go:957] error syncing claim "xxxxxx": failed to get target node: node "non-exist-node" not found
I0308 00:51:37.588381       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"task-pv-claim", UID:"xxxxxx", APIVersion:"v1", ResourceVersion:"4824", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to get target node: node "non-exist-node" not found

Manually tested with the fix in #139:

  1. Copy sig-storage-lib-external-provisioner with the fix into the external-provisioner vendor directory
  2. Run make container to build a new external-provisioner (csi-provisioner) image
  3. Upload it to GCR and replace the image reference in the stable-master image.yaml
  4. Spin up a k8s cluster on GCE via kubetest --build --up
  5. Deploy the PD CSI driver via gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/deploy-driver.sh
  6. Create a StorageClass, create a PVC with the volume.kubernetes.io/selected-node annotation, and create a pod
  7. The PVC is provisioned successfully; events show "Successfully provisioned volume pvc-xxxxxx"

@sunnylovestiramisu: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.