
AdmissionWebhook fails on AWS EKS with custom CNI

Describe the bug

Error from server (InternalError): error when creating "emqx.yaml": Internal error occurred: failed calling webhook "mutating.apps.emqx.io": failed to call webhook: Post "https://emqx-operator-webhook-service.emqx-operator-system.svc:443/mutate-apps-emqx-io-v2beta1-emqx?timeout=10s": Address is not allowed

On AWS EKS, admission webhooks are called from the managed control plane. With a CNI other than the default VPC CNI, Pod IPs are not reachable from the control plane, so the webhook call fails.
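To illustrate the reachability problem, here is a minimal sketch that checks whether a Pod IP falls inside the VPC address range. It is a simplified model, not a real diagnostic: being inside the VPC range is treated as a rough proxy for "the control plane can route to it". The function name `in_vpc_range` and all CIDRs and IPs below are hypothetical values, not taken from the affected cluster.

```python
import ipaddress


def in_vpc_range(pod_ip, vpc_cidrs):
    """Return True if pod_ip lies inside any of the given VPC CIDR blocks.

    Simplified assumption: an address outside the VPC range (for example one
    handed out by an overlay CNI) is not routable from the EKS control plane.
    """
    ip = ipaddress.ip_address(pod_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in vpc_cidrs)


# Hypothetical VPC range, for illustration only.
VPC_CIDRS = ["192.168.0.0/16"]

# Hypothetical Pod IP assigned from the VPC range (VPC CNI style).
vpc_cni_pod = in_vpc_range("192.168.12.34", VPC_CIDRS)

# Hypothetical Pod IP assigned from a separate overlay range (custom CNI style).
overlay_pod = in_vpc_range("10.0.1.57", VPC_CIDRS)

print("VPC CNI pod inside VPC range:", vpc_cni_pod)
print("Overlay pod inside VPC range:", overlay_pod)
```

In this model the webhook Service resolves to an overlay Pod IP that the control plane has no route to, which matches the failure described above.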

To Reproduce

  1. Create an AWS EKS cluster
  2. Use Cilium as the CNI
  3. Install the EMQX Operator
  4. Create an EMQX instance
  5. The error message above appears.

Expected behavior

The EMQX cluster should be created.

Anything else we need to know?

Environment details:

  • Kubernetes version: 1.25.6 EKS
  • Cloud-provider/provisioner: EKS + Terraform
  • emqx-operator version: 2.2.0
  • Install method: Helm

Potential Fix:

It would be good to include an option that enables hostNetwork for the controller-manager, so that the EKS control plane can reach the pod.
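A rough sketch of what such an option would change in the controller-manager's Deployment, written as a plain dict transformation. This is not the operator's or the Helm chart's actual implementation. The helper name `enable_host_network`, the Deployment name, and the container name are assumptions for illustration. `hostNetwork` and `dnsPolicy` are standard Pod spec fields, and `ClusterFirstWithHostNet` is the DNS policy normally paired with host networking so the pod can still resolve cluster Services.

```python
import copy


def enable_host_network(deployment):
    """Return a copy of a Deployment manifest with host networking enabled."""
    patched = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    pod_spec = patched["spec"]["template"]["spec"]
    # Run the pod in the node's network namespace so it is reachable on the node IP.
    pod_spec["hostNetwork"] = True
    # Keep cluster DNS resolution working while on the host network.
    pod_spec["dnsPolicy"] = "ClusterFirstWithHostNet"
    return patched


# Hypothetical, heavily trimmed manifest; names are assumed for illustration.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        "name": "emqx-operator-controller-manager",  # assumed name
        "namespace": "emqx-operator-system",
    },
    "spec": {
        "template": {
            "spec": {
                "containers": [{"name": "manager"}],  # assumed container name
            }
        }
    },
}

patched = enable_host_network(deployment)
print(patched["spec"]["template"]["spec"]["hostNetwork"])
print(patched["spec"]["template"]["spec"]["dnsPolicy"])
```

One trade-off to keep in mind: with host networking the webhook server binds its port directly on the node, so the port must not collide with anything else listening there.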

Rory-Z commented

Sorry, I don't know much about Cilium. Are you saying that when Cilium is used as the CNI, the Kubernetes mutating webhook call cannot reach the EMQX Operator controller Pod? If so, doesn't that mean this CNI does not meet the requirements of Kubernetes?
Again, I don't know much about Cilium, so please let me know if I'm wrong.

It's not a problem with Cilium. The problem is that the operator (falsely) assumes the Kubernetes control plane is part of the Kubernetes overlay network, which is not the case on EKS with any custom CNI.

That being said, I'll open a PR that fixes the issue here by adding a switch for hostNetwork.

Rory-Z commented

Great, looking forward to your PR.