kubernetes-sigs/aws-ebs-csi-driver

karpenter disruption taint isn't correct for karpenter v1

willthames opened this issue · 3 comments

/kind bug

What happened?

The EBS CSI driver does not clean up after itself when a node is terminating due to Karpenter disruption, after upgrading Karpenter to v1.0.1

What you expected to happen?

The EBS CSI driver removes all volume attachments when it is terminated as part of node termination

How to reproduce it (as minimally and precisely as possible)?

Have an EBS volume attached, and don't configure any tolerations on the EBS CSI driver node pod (so the driver pod gets terminated before the node does)

Anything else we need to know?:

The taint during karpenter disruption is:

  taints:
  - effect: NoSchedule
    key: karpenter.sh/disrupted

But the IsDisruptedTaint check looks for disrupting rather than disrupted because it's using the v1beta1 API rather than the v1 API.

https://github.com/kubernetes-sigs/karpenter/blob/b69e975128ac9a511542f9f1d245a6d4c3f91112/pkg/apis/v1beta1/taints.go#L28

vs

https://github.com/kubernetes-sigs/karpenter/blob/b69e975128ac9a511542f9f1d245a6d4c3f91112/pkg/apis/v1/taints.go#L28
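
For comparison, here is roughly how the two taints appear on a node (v1beta1 key and value per the linked taints.go):

  # v1beta1 (what IsDisruptedTaint currently matches)
  taints:
  - effect: NoSchedule
    key: karpenter.sh/disruption
    value: disrupting

  # v1 (what Karpenter now applies during disruption)
  taints:
  - effect: NoSchedule
    key: karpenter.sh/disrupted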

Environment

  • Kubernetes version (use kubectl version):
Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.30.3-eks-a18cd3a
  • Driver version: v1.35.0

disrupting rather than disrupted because it's using the v1beta1 API rather than the v1 API.

Wow this is a very subtle breaking change. Thank you so much @willthames for getting back to me and finding an even deeper root cause. We'll take care of this before our next driver release, and try to find some mechanism to make sure this does not happen a third time.

CSI drivers are not responsible for deleting VolumeAttachment objects. This is handled by Kubernetes itself, specifically the Attach/Detach controller in the kube-controller-manager (KCM).
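
For reference, a VolumeAttachment for an EBS volume looks roughly like this (names illustrative):

  apiVersion: storage.k8s.io/v1
  kind: VolumeAttachment
  metadata:
    name: csi-0123456789abcdef                          # generated name, illustrative
  spec:
    attacher: ebs.csi.aws.com
    nodeName: ip-10-0-0-1.us-west-2.compute.internal    # illustrative node
    source:
      persistentVolumeName: pvc-11111111-2222-3333-4444-555555555555
  status:
    attached: true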

In Karpenter v1.0.0 and beyond, Karpenter ensures that EBS volumes are properly unmounted/detached (VolumeAttachment objects deleted) before the node is terminated. That functionality relies on either:

a) The CSI node tolerateAllTaints parameter being enabled (which is the default setting currently)
b) Manually configuring the toleration on the node pod to prevent Karpenter from evicting it prematurely.
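
For (b), the toleration on the node pod would look something like this (assuming the Karpenter v1 taint key; where you set it depends on your install method, e.g. a tolerations value in the Helm chart):

  tolerations:
  - key: karpenter.sh/disrupted
    operator: Exists
    effect: NoSchedule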

For users who have disabled tolerateAllTaints and haven't added the necessary toleration to the node pod spec, the pre-stop lifecycle hook in the driver can still be helpful (once we start checking for the presence of that new taint). However, this should be viewed as a fallback, best-effort mechanism rather than the primary method of ensuring clean volume detachment.

Manually configuring the toleration on the node pod to prevent Karpenter from evicting it prematurely.

I think a nice improvement here would be maintaining the list of common tolerations in our Helm chart; that way, the pre-stop hook wouldn't need to be relied upon as a fallback.
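
As a sketch only (exact values keys and list contents to be decided; the v1beta1 key included for clusters still on pre-v1 Karpenter), that could look like:

  node:
    tolerations:
    - key: karpenter.sh/disrupted
      operator: Exists
      effect: NoSchedule
    - key: karpenter.sh/disruption
      operator: Exists
      effect: NoSchedule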

I don't think I would have caught this regression in driver cleanup due to a Karpenter upgrade myself either (it was a subtle line in the migration guide), which means we can alleviate some customer pain by making the driver 'just work' upon add-on creation.

This would also help us migrate off of tolerateAllTaints in the future if we decide to change that default as some maintainers have suggested.

That functionality relies on...

To be fair, we could do a better job here by documenting these assumptions, because volumes not unmounting during node drains might surprise non-subject-matter-experts in Kubernetes Storage. I'll add an FAQ item in the fix PR, but we should also consider adding a note in our install guide.


Either way, thank you @willthames for reporting this pain-point.