crossplane-contrib/provider-kubernetes

Conversion Webhook breaks deployment on clusters with separate control plane

Argannor opened this issue · 4 comments

What happened?

After upgrading from 0.10.0 to 0.12.0, the provider is unable to start up successfully, with logs indicating that the CRDs cannot be watched/listed:

W0226 16:25:45.849143       1 reflector.go:539] k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229: failed to list *v1alpha2.Object: conversion webhook for kubernetes.crossplane.io/v1alpha1, Kind=Object failed: Post "https://provider-kubernetes.crossplane-system.svc:9443/convert?timeout=30s": Address is not allowed
E0226 16:25:45.849254       1 reflector.go:147] k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229: Failed to watch *v1alpha2.Object: failed to list *v1alpha2.Object: conversion webhook for kubernetes.crossplane.io/v1alpha1, Kind=Object failed: Post "https://provider-kubernetes.crossplane-system.svc:9443/convert?timeout=30s": Address is not allowed
crossplane-kubernetes-provider: error: Cannot start controller manager: failed to wait for providerconfig/providerconfig.kubernetes.crossplane.io caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.ProviderConfig
Stream closed EOF for crossplane-system/provider-kubernetes-6e1fd76ec9c7-65d577488d-jzgzs (package-runtime)

Our Kubernetes cluster is AWS EKS with Calico as the CNI, so the cluster control plane and the pods running in the cluster are in different networks (the CNI cannot be applied to the AWS-managed control plane). As a consequence, every webhook needs to run with hostNetwork: true. To my knowledge that is not possible here, since the provider pod is managed by Crossplane.

As the conversion from v1alpha1 to v1alpha2 is fairly straightforward and could be done manually instead, maybe a command-line option could be introduced to disable the webhook. This would also affect the CRDs, though, and thus depends on the code generation used by Crossplane providers, but I might be wrong here.

How can we reproduce it?

Deploy the provider in version 0.11+ and block network access from the control plane to the webhook.

What environment did it happen in?

Crossplane version: v1.15.0
Cloud Provider: AWS
Distribution: EKS v1.29
Container Network Interface: Calico

You should be able to deploy it with hostNetwork set to true using a DeploymentRuntimeConfig.
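
A minimal sketch of what that could look like, not a tested configuration. The DeploymentRuntimeConfig name is hypothetical, the package path and tag are assumptions (use whatever is already installed), and the dnsPolicy line is an assumption that is commonly paired with hostNetwork so in-cluster DNS names keep resolving. The container name package-runtime is taken from the log output above.

```yaml
# Sketch only: run the provider pod in the node's network namespace.
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: provider-kubernetes-hostnetwork  # hypothetical name
spec:
  deploymentTemplate:
    spec:
      selector: {}
      template:
        spec:
          hostNetwork: true
          # Assumption: keeps cluster DNS resolution working with hostNetwork.
          dnsPolicy: ClusterFirstWithHostNet
          containers:
            # Crossplane names the provider container "package-runtime".
            - name: package-runtime
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-kubernetes
spec:
  # Assumption: package path and tag; keep the one you already use.
  package: xpkg.upbound.io/crossplane-contrib/provider-kubernetes:v0.12.0
  runtimeConfigRef:
    name: provider-kubernetes-hostnetwork
```

With hostNetwork enabled, the pod binds its ports directly on the node, so any port the provider listens on (9443 for the webhook, per the log above) has to be free on every node it may be scheduled to.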

Thank you for pointing that out. That works partially: it only works if the ports on the host are available, which is unlikely since they include 8080 for the metrics port (and in our case they are indeed not available).

So this approach would require making the ports configurable, which I think would involve changes to Crossplane as well, right?