kubernetes-sigs/sig-storage-lib-external-provisioner

Provisioner does not allow rescheduling if a Node is deleted after a pod is scheduled

pwschuurman opened this issue · 19 comments

If a Node is deleted after a Pod has been scheduled to it (but before the Pod's claim is provisioned), the Pod can become stuck indefinitely in a Pending state.

Typically, when a failure occurs during provisioning, the provisioner relinquishes control back to the scheduler so the Pod can be rescheduled somewhere else. This is done by removing the volume.kubernetes.io/selected-node annotation from the PVC; the controller returns ProvisioningFinished in provisionClaimOperation. This happens, for example, when storage cannot be scheduled on the selected node: https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/blob/master/controller/controller.go#L1420

However, if a Node becomes unavailable after it has been selected by the scheduler, the provisioner does not remove this annotation, because it returns ProvisioningNoChange in provisionClaimOperation. Keeping the annotation is potentially useful when the Node is eventually consistent and will become available some time after being selected. But when the Node has been deleted, this is an unrecoverable condition that requires user intervention: adding the exact node back (infeasible for dynamically provisioned node names), deleting and re-creating the pod so the scheduler can reschedule it, or manually removing the selected-node annotation from the PVC.
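To make the two code paths concrete, here is a rough, self-contained paraphrase in Go of the behavior described above. It is not the library's actual provisionClaimOperation: handleSelectedNodeLookup is a made-up name, and the import path assumes the v8 module layout of this library.

```go
package sketch

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	corelisters "k8s.io/client-go/listers/core/v1"
	"sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller"
)

// handleSelectedNodeLookup is a simplified stand-in for the selected-node
// handling inside provisionClaimOperation, showing where the two states
// described above come from.
func handleSelectedNodeLookup(nodeLister corelisters.NodeLister, claim *v1.PersistentVolumeClaim) (controller.ProvisioningState, error) {
	nodeName, ok := claim.Annotations["volume.kubernetes.io/selected-node"]
	if !ok {
		// Immediate binding: there is no selected node to resolve, so the real
		// controller would simply proceed with provisioning.
		return controller.ProvisioningFinished, nil
	}
	if _, err := nodeLister.Get(nodeName); err != nil {
		// Today's behavior: the annotation is kept and ProvisioningNoChange is
		// returned, so a claim pointing at a deleted node is retried forever.
		return controller.ProvisioningNoChange, fmt.Errorf("failed to get target node: %v", err)
	}
	// By contrast, when storage cannot be provisioned on the selected node, the
	// controller removes the annotation and returns ProvisioningFinished,
	// handing the decision back to the scheduler.
	return controller.ProvisioningFinished, nil
}
```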

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-triage-robot: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

/remove-lifecycle rotten

/reopen

@amacaskill: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Repro using VolumeSnapshot to delay provisioning: https://gist.github.com/pwschuurman/fd9c8c50889ce2382bcdca259c51d3e4

  1. Create a VolumeSnapshot that references a non-existent disk (or a disk that takes a long time to copy, so the VolumeSnapshot is slow to become ready)
  2. Create a PVC that references the VolumeSnapshot as a DataSource
  3. Create a pod that references said PVC. The scheduler will select a node for the pod and add the volume.kubernetes.io/selected-node annotation to the PVC.
  4. While the operation from (1) is still pending, delete the node that was selected for the PVC. This can happen under normal conditions due to node repair, upgrade, or autoscaling.
  5. Once the VolumeSnapshot becomes ready, the provisioner starts emitting "failed to get target node" errors. The PVC must be deleted (or the annotation removed manually) to recover.

Some ideas on how to handle this:

  1. Add a timeout that removes the annotation after some period of time: if the volume.kubernetes.io/selected-node annotation becomes stale, remove it from the PVC. This is troublesome because some delays can legitimately take a long time (e.g. waiting for a snapshot to be created) and may not fit into a well-defined timeout period.
  2. Update csi-provisioner to use an informer, rather than a lister. This would allow the provisioner to be aware of Node deletion events and remove the annotation from affected volumes. The provisioner would likely need to keep a node -> volume cache in order to find the affected volumes (a standalone sketch of this approach follows this list).
  3. Update the scheduler to remove the annotation on node deletion.
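For idea 2, here is a minimal standalone sketch of what a node-deletion watcher could look like. This is only an illustration of the approach, not the change that was ultimately made; a real implementation would live inside csi-provisioner and reuse its existing informers and a node -> volume cache instead of listing every PVC on each deletion.

```go
package main

import (
	"context"
	"encoding/json"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/klog/v2"
)

const selectedNodeAnn = "volume.kubernetes.io/selected-node"

// clearStaleSelectedNode removes the selected-node annotation from every
// pending PVC that still points at the deleted node, so the scheduler can
// pick a new node and re-run volume binding.
func clearStaleSelectedNode(ctx context.Context, cs kubernetes.Interface, nodeName string) {
	pvcs, err := cs.CoreV1().PersistentVolumeClaims(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
	if err != nil {
		klog.Errorf("listing PVCs: %v", err)
		return
	}
	// Merge patch that deletes only the selected-node annotation.
	patch, _ := json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]interface{}{selectedNodeAnn: nil},
		},
	})
	for i := range pvcs.Items {
		pvc := &pvcs.Items[i]
		if pvc.Status.Phase != v1.ClaimPending || pvc.Annotations[selectedNodeAnn] != nodeName {
			continue
		}
		if _, err := cs.CoreV1().PersistentVolumeClaims(pvc.Namespace).Patch(
			ctx, pvc.Name, types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
			klog.Errorf("clearing %s on %s/%s: %v", selectedNodeAnn, pvc.Namespace, pvc.Name, err)
		}
	}
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		klog.Fatal(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(cs, 10*time.Minute)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			// Deletions may also arrive as DeletedFinalStateUnknown tombstones;
			// only plain *v1.Node objects are handled in this sketch.
			if node, ok := obj.(*v1.Node); ok {
				clearStaleSelectedNode(context.TODO(), cs, node.Name)
			}
		},
	})

	stopCh := make(chan struct{})
	defer close(stopCh)
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	select {} // run until killed
}
```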

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/remove-lifecycle stale
/triage accepted

I think this is the same issue as kubernetes/kubernetes#100485

Another option we discussed: remove the annotation when the provisioner tries to access a Node that doesn't exist, by checking for a NotFound error (errors.IsNotFound).
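Roughly, that option would look like the sketch below. This is not the actual patch (see #139 for that); getSelectedNode and the merge patch used to drop the annotation are illustrative, assuming a client-go clientset and node lister are available.

```go
package sketch

import (
	"context"
	"encoding/json"
	"fmt"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	corelisters "k8s.io/client-go/listers/core/v1"
)

const selectedNodeAnn = "volume.kubernetes.io/selected-node"

// getSelectedNode resolves the node named by the selected-node annotation.
// If the node has been deleted (NotFound), it strips the annotation so the
// scheduler can re-run volume binding on a live node, instead of retrying
// the same lookup forever.
func getSelectedNode(ctx context.Context, cs kubernetes.Interface, nodes corelisters.NodeLister,
	claim *v1.PersistentVolumeClaim) (*v1.Node, error) {

	nodeName, ok := claim.Annotations[selectedNodeAnn]
	if !ok {
		return nil, nil // immediate binding: no selected node to resolve
	}
	node, err := nodes.Get(nodeName)
	if apierrors.IsNotFound(err) {
		// The node is gone for good: clear the annotation so the scheduler
		// can pick a new node for the claim.
		patch, _ := json.Marshal(map[string]interface{}{
			"metadata": map[string]interface{}{
				"annotations": map[string]interface{}{selectedNodeAnn: nil},
			},
		})
		if _, perr := cs.CoreV1().PersistentVolumeClaims(claim.Namespace).Patch(
			ctx, claim.Name, types.MergePatchType, patch, metav1.PatchOptions{}); perr != nil {
			return nil, fmt.Errorf("failed to remove %s after node deletion: %v", selectedNodeAnn, perr)
		}
		return nil, fmt.Errorf("selected node %q no longer exists, rescheduling", nodeName)
	}
	if err != nil {
		return nil, fmt.Errorf("failed to get target node: %v", err)
	}
	return node, nil
}
```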

Reproduced the error with the following steps:

  1. kubetest --build --up
  2. Deploy the PD CSI driver via gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/deploy-driver.sh
  3. Create a StorageClass, create a PVC with the volume.kubernetes.io/selected-node annotation set to a non-existent node, and create a pod
  4. The PVC stayed in Pending state
  5. Check the csi-provisioner logs via k logs -n gce-pd-csi-driver csi-gce-pd-controller -c csi-provisioner:
W0308 00:51:37.588114       1 controller.go:934] Retrying syncing claim "xxxxxx", failure 12
E0308 00:51:37.588141       1 controller.go:957] error syncing claim "xxxxxx": failed to get target node: node "non-exist-node" not found
I0308 00:51:37.588381       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"task-pv-claim", UID:"xxxxxx", APIVersion:"v1", ResourceVersion:"4824", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to get target node: node "non-exist-node" not found

Manually tested with the fix in #139:

  1. Copy sig-storage-lib-external-provisioner with the fix into the external-provisioner vendor directory
  2. Run make container to build a new external-provisioner (csi-provisioner) image
  3. Upload it to GCR and replace the image reference in the stable-master image.yaml
  4. Spin up a k8s cluster on GCE via kubetest --build --up
  5. Deploy the PD CSI driver via gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/deploy-driver.sh
  6. Create a StorageClass, create a PVC with the volume.kubernetes.io/selected-node annotation, and create a pod
  7. The PVC is provisioned successfully; events show "Successfully provisioned volume pvc-xxxxxx"

@sunnylovestiramisu: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.