karpenter disruption taint isn't correct for karpenter v1
willthames opened this issue · 3 comments
/kind bug
What happened?
After upgrading Karpenter to v1.0.1, the EBS CSI driver does not clean up after itself when a node is terminating due to Karpenter disruption.
What you expected to happen?
The EBS CSI driver removes all volume attachments when it is terminated as part of node termination.
How to reproduce it (as minimally and precisely as possible)?
Have an EBS volume attached, and don't configure any tolerations on the EBS CSI driver (so it gets terminated before the node terminates)
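For reference, this is roughly the Helm values shape for that setup, assuming the chart's node.tolerateAllTaints and node.tolerations parameters (treat the exact names as illustrative; they may differ between chart versions):

```yaml
# Sketch of an aws-ebs-csi-driver Helm values snippet that reproduces the setup:
# the node pod no longer tolerates Karpenter's disruption taint, so it is
# evicted (terminated) before the node itself terminates.
node:
  tolerateAllTaints: false
  tolerations: []
```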
Anything else we need to know?:
The taint during Karpenter disruption is:

```yaml
taints:
  - effect: NoSchedule
    key: karpenter.sh/disrupted
```
But the IsDisruptedTaint check looks for disrupting rather than disrupted, because it's using the v1beta1 API rather than the v1 API.
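For comparison, a sketch of the two taint shapes; the v1 form is taken from the node above, while the v1beta1 form is my reading of the pre-v1 Karpenter behaviour (key karpenter.sh/disruption with value disrupting), so treat it as an assumption:

```yaml
# Karpenter v1beta1 disruption taint (what the driver's check currently matches),
# assuming key karpenter.sh/disruption with value "disrupting":
taints:
  - key: karpenter.sh/disruption
    value: disrupting
    effect: NoSchedule
---
# Karpenter v1 disruption taint (what the node actually carries after the upgrade):
taints:
  - key: karpenter.sh/disrupted
    effect: NoSchedule
```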
Environment
- Kubernetes version (use kubectl version):
  Client Version: v1.31.1
  Kustomize Version: v5.4.2
  Server Version: v1.30.3-eks-a18cd3a
- Driver version: v1.35.0
> disrupting rather than disrupted because it's using the v1beta1 API rather than the v1 API.
Wow this is a very subtle breaking change. Thank you so much @willthames for getting back to me and finding an even deeper root cause. We'll take care of this before our next driver release, and try to find some mechanism to make sure this does not happen a third time.
CSI drivers are not responsible for deleting VolumeAttachment objects. This is handled by Kubernetes itself, specifically the Attach/Detach controller in the kube-controller-manager (KCM).
In Karpenter v1.0.0 and beyond, Karpenter ensures that EBS volumes are properly unmounted/detached (VolumeAttachment objects deleted) before the node is terminated. That functionality relies on either:
a) The CSI node tolerateAllTaints parameter being enabled (which is currently the default setting), or
b) Manually configuring the toleration on the node pod to prevent Karpenter from evicting it prematurely (a rough sketch follows below).
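As an illustration of option b), assuming the Karpenter v1 taint shown in the issue, the toleration on the node pod would look something like this (illustrative only):

```yaml
# Illustrative toleration for the EBS CSI node pod so Karpenter does not evict
# it before volumes are detached. The key assumes the Karpenter v1 taint above.
tolerations:
  - key: karpenter.sh/disrupted
    operator: Exists
    effect: NoSchedule
```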
For users who have disabled tolerateAllTaints and haven't added the necessary toleration to the node pod spec, the pre-stop lifecycle hook in the driver can still be helpful (once we start checking for the presence of that new taint). However, this should be viewed as a fallback, best-effort mechanism rather than the primary method of ensuring clean volume detachment.
> Manually configuring the toleration on the node pod to prevent Karpenter from evicting it prematurely.
I think a nice improvement here would be maintaining the list of common tolerations in our helm chart (roughly sketched below); that way the pre-stop hook wouldn't need to be relied upon as a fallback.
I don't think I would have caught this regression in driver cleanup due to a Karpenter upgrade myself either (it was a subtle line in the migration guide), which means we can alleviate some customer pain by making the driver 'just work' upon add-on creation.
This would also help us migrate off of tolerateAllTaints in the future if we decide to change that default, as some maintainers have suggested.
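A rough sketch of what such a default might look like in the chart, assuming a node.tolerations value and limiting the example to the two Karpenter disruption taints discussed in this issue (the real list and parameter names would be settled during implementation):

```yaml
# Hypothetical default tolerations for the node pod in the Helm chart
# (illustrative only, not the final list).
node:
  tolerations:
    # Karpenter v1 disruption taint
    - key: karpenter.sh/disrupted
      operator: Exists
      effect: NoSchedule
    # Legacy Karpenter v1beta1 disruption taint, for clusters not yet upgraded
    - key: karpenter.sh/disruption
      operator: Exists
      effect: NoSchedule
```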
> That functionality relies on...
To be fair, we could do a better job here by documenting these assumptions, because volumes not unmounting during node drains might surprise non-subject-matter-experts in Kubernetes Storage. I'll add an FAQ item in the fix PR, but we should also consider adding a note in our install guide.
Either way, thank you @willthames for reporting this pain-point.