foriequal0/pod-graceful-drain

Unable to upgrade cluster because of a drain error related to admission webhook denied request: no kind "Eviction" is registered for version


Hi,

I've been using your package successfully for a few months now. Today I tried to upgrade my EKS cluster from 1.21 to 1.22 and I think this issue is related to pod graceful drain.

The worker nodes couldn't be drained because I got this type of error for a bunch of pods. I've included the exact error for every pod that couldn't be evicted, which is what prevents the nodes from being drained / upgraded.

It's a test cluster so this is basically everything running on the cluster:

error when evicting pods/"argocd-redis-d486999b7-sgptn" -n "argocd": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"
error when evicting pods/"argocd-server-cb57f685d-22bng" -n "argocd": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"
error when evicting pods/"argocd-notifications-controller-5f8c5d6fc5-ldqlp" -n "argocd": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"
error when evicting pods/"coredns-85d5b4454c-dskk9" -n "kube-system": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"
error when evicting pods/"argocd-dex-server-64cb85bf46-pfbvx" -n "argocd": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"
error when evicting pods/"argocd-applicationset-controller-66689cbf4b-5k85t" -n "argocd": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"
error when evicting pods/"aws-load-balancer-controller-597f47c4df-mskv2" -n "kube-system": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"
error when evicting pods/"coredns-85d5b4454c-w9m7n" -n "kube-system": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"
error when evicting pods/"argocd-application-controller-0" -n "argocd": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"
error when evicting pods/"sealed-secrets-controller-5fb95c87fd-b25g8" -n "kube-system": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"
error when evicting pods/"argocd-repo-server-8576d68689-rsgww" -n "argocd": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"
error when evicting pods/"pod-graceful-drain-949674d56-stp7g" -n "kube-system": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"
error when evicting pods/"aws-load-balancer-controller-597f47c4df-ph64g" -n "kube-system": admission webhook "[mpodseviction.pod-graceful-drain.io](http://mpodseviction.pod-graceful-drain.io/)" denied the request: no kind "Eviction" is registered for version "policy/v1" in scheme "pkg/runtime/scheme.go:100"

Any tips on where to go from here?

At the time of writing, Eviction was policy/v1beta1, but it has been changed to policy/v1 since 1.22.

The pod/eviction subresource now accepts policy/v1 Eviction requests in addition to policy/v1beta1 Eviction requests (kubernetes/kubernetes#100724, @liggitt) [SIG API Machinery, Apps, Architecture, Auth, CLI, Storage and Testing]
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md#api-change-9
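
For reference, an eviction request against the pods/eviction subresource on a 1.22 control plane now carries a policy/v1 Eviction object, which is what the webhook fails to decode. A rough sketch (the pod name is taken from the log above, and the exact kubectl invocation is only illustrative):

```sh
# Roughly what a drain sends to the eviction subresource on 1.22.
kubectl create --raw /api/v1/namespaces/argocd/pods/argocd-redis-d486999b7-sgptn/eviction -f - <<'EOF'
{
  "apiVersion": "policy/v1",
  "kind": "Eviction",
  "metadata": {
    "name": "argocd-redis-d486999b7-sgptn",
    "namespace": "argocd"
  }
}
EOF
```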

It might be related to the code around https://github.com/foriequal0/pod-graceful-drain/blob/main/internal/pkg/webhooks/eviction_mutator.go#L66
However, I have only a little experience with k8s API migration, so it might take some time to prepare a new version.

Would this PR (#32) fix this?

Any suggestions on how to install the version in the PR?

For example, normally I'd install it with Helm, but since there's no release containing this fix, I'm not sure how to install it without Helm.
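
In case it's useful, one thing I could try is installing the chart directly from a checkout of the PR branch. This is only a sketch: the chart path is a guess, and it would still pull the released container image unless the patched image is also built and pushed, which is why a tagged beta release would be much easier.

```sh
# Sketch only: install the chart from the PR branch rather than a published
# release. The charts/ path is an assumption; adjust it to wherever the chart
# lives in the repo. This alone still uses the released container image.
git clone https://github.com/foriequal0/pod-graceful-drain
cd pod-graceful-drain
git fetch origin pull/32/head:pr-32 && git checkout pr-32
helm upgrade --install pod-graceful-drain ./charts/pod-graceful-drain \
  --namespace kube-system
```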

To test this locally you could:

  • Spin up a 1.21 cluster with managed worker nodes
  • Install the AWS Load Balancer Controller (2.4.1+)
  • Add a simple nginx deployment with an ingress
  • Add pod graceful drain

Then initiate a cluster upgrade to 1.22. The control plane will upgrade fine, it's the worker nodes that can't be upgraded.
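
For the "simple nginx deployment with an ingress" step, something like this is what I have in mind (names and annotations are only illustrative, targeting the AWS Load Balancer Controller):

```sh
# Minimal nginx Deployment/Service plus an ALB Ingress (illustrative only).
kubectl create deployment nginx --image=nginx --replicas=2
kubectl expose deployment nginx --port=80
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx
                port:
                  number: 80
EOF
```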

I'm happy to do all of this testing, but I'm kind of blocked on how to upgrade my existing Helm-installed version of pod graceful drain to the version in the PR.

Also I spun up a brand new test cluster with 1.21 and tested the 1.22 upgrade workflow without having pod graceful drain installed in the cluster. Everything got tainted, evicted and drained pretty quickly without errors.

There were 30 seconds of 504-related downtime on a test nginx deployment I had set up, but I'm 90% sure that was due to not having pod graceful drain installed! I just wanted to confirm the error is fully isolated to this project (which it seems to be).

If you're feeling semi-confident in this patch, and it's not possible to Helm install something from a branch, perhaps you could cut a beta release, like 0.0.8beta?

Okay. I've released v0.0.8-beta.1 here. The chart version is v0.0.10-beta.1.
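
Upgrading an existing Helm release to the beta chart should look roughly like this; the repo alias, release name, and namespace below are assumptions based on a typical install, so adjust them to match yours:

```sh
# Sketch: move an existing Helm release to the beta chart. Drop the leading
# "v" from the version if Helm complains about it.
helm repo update
helm upgrade pod-graceful-drain pod-graceful-drain/pod-graceful-drain \
  --namespace kube-system \
  --version v0.0.10-beta.1
```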

Thanks for making this release so quickly!

The short version is that everything mostly worked. The eviction aspect worked, so nodes were able to be drained. I was also able to perform a full 1.21 to 1.22 upgrade (control plane and worker nodes) with zero downtime for a running web service. However, when I manually recreated the nodes a second time I experienced 504s for 90 seconds, and I was able to repeat this 504 issue twice. I'm not 100% sure whether this is related to Pod Graceful Drain or Terraform.

Here's a breakdown of what I tested and how I tested it.

Environment

  • 1.22 EKS cluster with managed worker nodes
  • Terraform's EKS module v18.19.0 to handle creating / updating the cluster
  • Pod Graceful Drain installed through Helm with v0.0.10-beta.1
  • Running Argo CD 2.3.3 connected to AWS Load Balancer Controller 2.4.1
  • Argo CD has a web UI service hooked up to an Ingress

Argo CD in this case is our example app to test zero downtime deploys.

For testing that the app remains up I'm using https://github.com/nickjj/lcurl which makes a request to a host every 250ms and reports back the status code and a few other stats.

In all of the tests below I'm running lcurl https://argocd.example.com 0.25 where example.com is replaced with my real host name.

Existing 1.22 cluster

Restarting the Argo CD deployment

This is a basic sanity check to ensure things work normally independently of evicting pods and draining nodes.

Restart command: kubectl -n argocd rollout restart deployment.apps/argocd-server

I also ran kubectl get pods -n argocd --watch to keep an eye on the pods.

Without Pod Graceful Drain:

  • 502s for 10 seconds (expected)

With Pod Graceful Drain

  • 200s across the board (expected)

Renaming the node group

This was done with Terraform's EKS module. I renamed the node group by appending -2 to its name. This supposedly creates a new set of nodes while doing all of the lower level tainting, draining and evicting, etc. and deletes the old nodes afterwards.

I ran kubectl get pods -n argocd --watch to keep an eye on the pods and kubectl get nodes -o wide --watch to watch the nodes.

With Pod Graceful Drain

  • 504s for 90 seconds (sort of unexpected)

This 90 second downtime is related to how long Argo CD takes to spin up. This period of downtime would increase depending on how long all apps take to come up.

I'm not sure if this is related to Pod Graceful Drain. I've never done a node upgrade before.

At least the pods can be evicted now, which I think means your patch is a success. As for the downtime when nodes get re-created, do you think that's related or unrelated to Pod Graceful Drain? Is there anything in Pod Graceful Drain that could interfere with the draining process?

I was under the impression that once a node gets tainted, new pods will not be scheduled on it, and once new nodes capable of running the pods join the cluster, the old nodes get drained. That involves running duplicate copies of the pods on the new nodes, and once that process finishes the old nodes are terminated. In theory, during this process there would always be at least 1 pod running to ensure zero downtime?
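
Next time I recreate the nodes I can also watch the Service endpoints and the ALB target registration to see whether the old pod is deregistered before a replacement is ready. The service and namespace names below are assumed from a standard Argo CD install:

```sh
# Watch whether argocd-server briefly has zero ready endpoints while the
# nodes are being replaced.
kubectl get endpoints argocd-server -n argocd --watch

# The AWS Load Balancer Controller's target bindings in the same namespace.
kubectl get targetgroupbindings -n argocd
```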

Upgrading a new cluster from 1.21 to 1.22

I deleted the old 1.22 cluster and made a new 1.21 cluster.

Restarting the Argo CD deployment with 1.21

I confirmed 1.21 is capable of running v0.0.10-beta.1 on its own without issues independent of upgrading the cluster. I was able to achieve zero downtime deploys of Argo CD using the first rollout test from above.

During the control plane and worker node cluster upgrade

This worked flawlessly. There were no 502s or 504s reported. There were 2,200 consecutive 200s while the pods were moved from the old nodes to the new nodes over roughly 13 minutes (plus 12 minutes to upgrade the control plane).

Restarting the Argo CD deployment with 1.22

I confirmed 1.22 is capable of running v0.0.10-beta.1 on its own without issues independent of upgrading the cluster. I was able to achieve zero downtime deploys of Argo CD using the first rollout test from above.

Renaming the node group

Just to see if the first time was a fluke, I did the same rename process as before and experienced the same 504 downtime for 90 seconds. It's interesting that a cluster upgrade had zero downtime but renaming the node group afterwards has downtime.

If you need any more details please let me know.

Thank you for the very detailed report!
I'm happy to hear that it worked for both 1.21 and 1.22.
I'll dig into the issue with renaming the node group.

I found minimal steps that reproduce this 100% of the time.

  1. set up a cluster with 2 nodes.
  2. deploy a Deployment with 1 replica that has a long startup time (just an nginx image with an initContainer that sleeps for 30 seconds is fine; see the sketch after this list)
  3. kubectl drain the node that the pod is located on.
  4. You can see that the pod is evicted (drained) first, and the replacement pod is only created later.
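
As a concrete sketch of steps 2 and 3 (the slow-start name is just illustrative):

```sh
# Step 2: a single-replica Deployment that takes ~30 seconds to become ready.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slow-start
spec:
  replicas: 1
  selector:
    matchLabels:
      app: slow-start
  template:
    metadata:
      labels:
        app: slow-start
    spec:
      initContainers:
        - name: sleep
          image: busybox
          command: ["sleep", "30"]
      containers:
        - name: nginx
          image: nginx
EOF

# Step 3: drain the node the pod landed on, then watch the pods.
NODE="$(kubectl get pod -l app=slow-start -o jsonpath='{.items[0].spec.nodeName}')"
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data
kubectl get pods -l app=slow-start --watch
```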

The normal rollout process spins up the additional pods first, then terminates the previous pods; it is gracefully controlled by the Deployment controller and the ReplicaSet controller.
However, the eviction process seems to be different: pods are evicted from the node (drained) first, and the ReplicaSet controller only reconciles the replica count later, unnoticed by the eviction process. I hadn't recognized this until now.

Also, while trying to reproduce this, I found that the eviction behavior triggered a concurrency issue where evicted pods skip the admission webhook.

  1. pod-graceful-drain is evicted first, and it is the only replica in the cluster. The other pods are not ready at this time.
  2. Then the other pods are evicted. However, with this option, the admission webhook is skipped if it is not available.
    I deliberately chose this default to prevent deadlocks when there is an outage of pod-graceful-drain.
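
In Kubernetes terms this is the webhook's failurePolicy; you can check what a cluster is running with something like:

```sh
# "Ignore" means the API server skips the webhook when it is unreachable.
kubectl get mutatingwebhookconfigurations,validatingwebhookconfigurations \
  -o custom-columns='NAME:.metadata.name,WEBHOOKS:.webhooks[*].name,FAILURE_POLICY:.webhooks[*].failurePolicy'
```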

To mitigate this, I think we should have enough replicas and make sure they are distributed across multiple nodes. That is also good for availability in general, since the eviction process is similar to a general node failure. The same might apply to pod-graceful-drain itself.
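
As a concrete example of that mitigation for a workload such as argocd-server (a sketch only; the label selector depends on how it is installed):

```sh
# Sketch: run 2 replicas and spread them across nodes so a single drain never
# takes out every replica at once. Labels/names depend on your install.
kubectl -n argocd patch deployment argocd-server --type merge -p '
spec:
  replicas: 2
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: argocd-server
'
```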

In later versions, I might be able to temporarily increase the replica count of a deployment when one of its pods is requested to be evicted.

I'll release binary v0.0.8 and chart 0.0.10 to address k8s 1.22 soon.
Can we close this issue and continue the discussion about these evicted pods in #33?

Yep, sounds good.