Karpenter 0.37 Upgrade: Generic Ephemeral Volumes Not Deleting After Pod Removal Without Enabling Webhook
Description
Observed Behavior:
We use generic ephemeral volumes in one of our use cases, and the lifecycle of these volumes follows the lifecycle of the pod. Up to Karpenter 0.36.x, those volumes (PVCs/PVs) were deleted automatically as soon as the pod was deleted.
We upgraded to Karpenter 0.37.2, which includes a webhook that is disabled by default. That broke some functionality because of the coexisting v1 and v1beta1 APIs, and we could not run `kubectl get nodepool|nodeclaims|ec2nodeclasses` without the API suffix. However, it did not appear to break any server-side functionality, until we noticed hundreds of EBS volumes in the `available` state: the Pods using those volumes were already gone, but the underlying PVCs and volumes were still lying around. That was not the case before the upgrade.
Further investigation showed that the only recent change in the cluster was the Karpenter upgrade, and our Grafana dashboard showed a spike in PVCs after upgrading Karpenter.
Due to other CRD and webhook issues in the Karpenter chart, related to #6847 and #6867, there was no direct way to use Flux to change the default namespace hardcoded in the CRDs in the main chart.
Workaround: We had to manually update the CRDs in the cluster and then enable the webhook, which is enabled by default in the more recent 0.37.3 chart version. After that, the issue stopped occurring in our cluster and the PVC count remained steady.
Expected Behavior:
Upgrading to chart 0.37.2 without enabling the webhook should still work, and PVCs created through generic ephemeral volumes should be cleaned up automatically, as expected and as they were with 0.36.x.
Question: How is enabling the webhook, or Karpenter in general, involved in deleting these PVCs or PVs? My understanding is that Karpenter works alongside the scheduler and is not directly involved in creating or deleting PVCs. Is there anything Karpenter started doing through the webhook that blocks PVC deletion?
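For context on the question above: per the Kubernetes ephemeral-volumes documentation, a Pod with a generic ephemeral volume owns the PVC, and when the Pod is deleted it is the Kubernetes garbage collector (not the scheduler or Karpenter) that deletes the PVC. The sketch below is a simplified, illustrative model of that owner-based rule, assuming objects shaped like `kubectl get -o json` output; `is_garbage`, `example_pvc`, and the UID value are hypothetical and this is not the real garbage-collector code.

```python
from typing import Any, Dict, Set


def is_garbage(obj: Dict[str, Any], live_uids: Set[str]) -> bool:
    """Simplified model of owner-based garbage collection (illustrative only).

    An object with at least one ownerReference becomes garbage once none of
    its owners (matched by UID) still exist. Objects without owners are never
    collected by this rule.
    """
    refs = obj.get("metadata", {}).get("ownerReferences", [])
    if not refs:
        return False
    return all(ref["uid"] not in live_uids for ref in refs)


# Hypothetical PVC as created for a generic ephemeral volume; the UID is made up.
example_pvc = {
    "metadata": {
        "name": "my-app-scratch-volume",
        "ownerReferences": [{
            "apiVersion": "v1", "kind": "Pod", "name": "my-app",
            "uid": "hypothetical-pod-uid",
            "controller": True, "blockOwnerDeletion": True,
        }],
    },
}

print(is_garbage(example_pvc, {"hypothetical-pod-uid"}))  # False: pod still exists
print(is_garbage(example_pvc, set()))                     # True: pod is gone
```

In the behavior reported here, the second case (owner Pod gone) is exactly the one where the PVC should have been collected but was not.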
Reproduction Steps (Please include YAML):
- Upgrade to Karpenter chart 0.37.2 and don't enable the webhook.
- Create a sample pod using the manifest from https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes.
- A new pod `my-app` and a PVC `my-app-scratch-volume` will be created.
- Run `kubectl get pvc my-app-scratch-volume -oyaml` to see the ownerReferences; they should look something like this:
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Pod
      name: my-app
- Delete just the pod: `kubectl delete pod my-app`
- Observe that the PVC created in step 2 is still present and was not cleaned up after the pod was deleted: `kubectl get pvc`
- Check the AWS EC2 console for the EBS volume created for the PVC above. The volume will be in the `available` state and free to be deleted. The volume ID can be fetched with the following steps:
  1. `kubectl describe pvc my-app-scratch-volume | grep -i volume:`
  2. `kubectl describe pv <pv name from above step> | grep -i VolumeHandle:`
- The underlying EBS volume is deleted only if we delete the PVC manually.
- If we enable the webhook and update the CRDs to override the default namespace for Karpenter, everything goes back to normal.
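To triage the hundreds of leftover volumes without repeating the describe/grep steps per PVC, the lookup above (PVC to PV to `VolumeHandle`) can be batched. The following is a minimal sketch, assuming the EBS CSI driver (so the volume ID sits in the PV's `spec.csi.volumeHandle`) and the JSON shape of `kubectl get ... -o json`; the helper names `kubectl_json` and `orphaned_ephemeral_pvcs` are hypothetical, not part of any Karpenter or Kubernetes tooling.

```python
import json
import subprocess
from typing import Any, Dict, List, Tuple


def kubectl_json(*args: str) -> Dict[str, Any]:
    """Run `kubectl get <args> -o json` and return the parsed List object."""
    out = subprocess.run(
        ["kubectl", "get", *args, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def orphaned_ephemeral_pvcs(
    pods: Dict[str, Any], pvcs: Dict[str, Any], pvs: Dict[str, Any]
) -> List[Tuple[str, str, str]]:
    """Return (namespace, PVC name, volume handle) for PVCs whose owning Pod is gone."""
    live_pod_uids = {p["metadata"]["uid"] for p in pods["items"]}
    # Map PV name -> CSI volume handle (the EBS volume ID with the EBS CSI driver).
    handle_by_pv = {
        pv["metadata"]["name"]: pv.get("spec", {}).get("csi", {}).get("volumeHandle", "")
        for pv in pvs["items"]
    }
    result = []
    for pvc in pvcs["items"]:
        meta = pvc["metadata"]
        pod_refs = [r for r in meta.get("ownerReferences", []) if r.get("kind") == "Pod"]
        if not pod_refs or any(r["uid"] in live_pod_uids for r in pod_refs):
            continue  # not Pod-owned, or an owning Pod still exists
        pv_name = pvc.get("spec", {}).get("volumeName", "")
        result.append((meta["namespace"], meta["name"], handle_by_pv.get(pv_name, "")))
    return result
```

Against a live cluster it could be called as `orphaned_ephemeral_pvcs(kubectl_json("pods", "-A"), kubectl_json("pvc", "-A"), kubectl_json("pv"))`. PVs not provisioned through CSI yield an empty handle in this sketch; it only reports candidates and deletes nothing.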
Versions:
- Chart Version: 0.37.2
- Kubernetes Version (`kubectl version`): 1.29.0
Is there anything else I can provide, or any more information needed, to get some insight here? Thanks.