cloudfoundry-incubator/quarks-operator

Webhook configurations aren't cleaned up on operator deletion and accumulate per namespace

svollath opened this issue

Describe the bug
When deleting cf-operator, including its Kubernetes namespace, the webhook configurations remain in the cluster (they are cluster-scoped, so deleting the namespace does not remove them). When cf-operator is redeployed to a different namespace than before, new webhook configurations are created alongside the existing ones, and managed deployments such as kubecf can hit the stale ones.
As a result, the kubecf deployment fails with an error such as:

Error: Internal error occurred: failed calling webhook "validate-boshdeployment.quarks.cloudfoundry.org": Post https://cf-operator-webhook.cf-operator.svc:443/validate-boshdeployment?timeout=30s: service "cf-operator-webhook" not found

The two webhook configuration types of interest are:

> kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io
> kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io

To Reproduce
First, we install the operator in namespace cf-operator and then install kubecf (which succeeds).
Then we delete everything and deploy cf-operator in namespace cfo; now the kubecf install fails:

> kubectl create namespace cf-operator

> helm3 install cf-operator local/cf-operator-6.1.17+0.gec409fd7 --namespace cf-operator --set "global.singleNamespace.name=scf"

> helm3 install scf local/kubecf-2.5.8 --namespace scf --values kubecf-config-values_metallb.yaml

> helm3 delete scf -n scf; kubectl delete namespace scf; helm3 delete cf-operator -n cf-operator; kubectl delete namespace cf-operator

> wc -w resources
67 resources

> for i in $(cat resources); do if kubectl get "$i" -o yaml | grep -q hook-cf-operator; then echo "$i"; fi; done
mutatingwebhookconfigurations.admissionregistration.k8s.io
validatingwebhookconfigurations.admissionregistration.k8s.io

> kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io
NAME                           WEBHOOKS   AGE
cf-operator-hook-cf-operator   4          133m

> kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io
NAME                           WEBHOOKS   AGE
cf-operator-hook-cf-operator   2          133m

> kubectl create namespace cfo

> helm3 install cf-operator local/cf-operator-6.1.17+0.gec409fd7 --namespace cfo --set "global.singleNamespace.name=scf"

> helm3 install scf local/kubecf-2.5.8 --namespace scf --values kubecf-config-values_metallb.yaml
Error: Internal error occurred: failed calling webhook "validate-boshdeployment.quarks.cloudfoundry.org": Post https://cf-operator-webhook.cf-operator.svc:443/validate-boshdeployment?timeout=30s: service "cf-operator-webhook" not found

> kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io
NAME                           WEBHOOKS   AGE
cf-operator-hook-cf-operator   4          142m
cf-operator-hook-cfo           4          2m16s

> kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io
NAME                           WEBHOOKS   AGE
cf-operator-hook-cf-operator   2          142m
cf-operator-hook-cfo           2          2m42s

Expected behavior
The webhook configurations should always match the current namespace and service names. This could be achieved, for example, by deleting them on operator deletion, even though they live outside the operator's namespace (in default, or not namespaced at all), or by making them belong to the cf-operator namespace so that they are deleted along with it.
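Whether the configurations still match can be checked from outside the operator: each configuration's webhook service should live in a namespace that still exists. Below is a minimal sketch, assuming all webhooks within one configuration point at the same service (so inspecting `webhooks[0]` is enough); the function name `stale_webhook_configs` is hypothetical and not part of cf-operator.

```shell
#!/bin/sh
# Hypothetical helper: print webhook configurations whose service lives in a
# namespace that no longer exists.
#   stdin: one "<configuration-name> <service-namespace>" pair per line
#   args:  the names of all namespaces that currently exist
stale_webhook_configs() {
  while read -r name ns; do
    # Skip blank lines and webhooks configured by URL instead of by service.
    if [ -z "$name" ] || [ -z "$ns" ]; then
      continue
    fi
    found=no
    for existing in "$@"; do
      if [ "$existing" = "$ns" ]; then
        found=yes
        break
      fi
    done
    # Report the configuration only if its service namespace is gone.
    if [ "$found" = no ]; then
      echo "$name"
    fi
  done
}

# Usage against a live cluster (assumes webhooks[0] is representative):
#   kubectl get validatingwebhookconfigurations \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.webhooks[0].clientConfig.service.namespace}{"\n"}{end}' \
#     | stale_webhook_configs $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}')
```

In the reproduction above, this would flag `cf-operator-hook-cf-operator`, whose service namespace `cf-operator` was deleted, while `cf-operator-hook-cfo` passes.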

Environment

  • cf-operator-6.1.17+0.gec409fd7
  • kubecf-2.5.8

Workaround
The following workaround has been tested up to a successful cf login to kubecf.

When you hit the error above, or whenever cf-operator is deleted, the two webhook configurations of the former namespace(s) have to be deleted in addition to the regular deployment removal:

> kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io cf-operator-hook-cf-operator
mutatingwebhookconfiguration.admissionregistration.k8s.io "cf-operator-hook-cf-operator" deleted

> kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io cf-operator-hook-cf-operator
validatingwebhookconfiguration.admissionregistration.k8s.io "cf-operator-hook-cf-operator" deleted
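The two deletions can be wrapped in a small function so they can be repeated for any former operator namespace. This is a minimal sketch, assuming the `cf-operator-hook-<namespace>` naming seen in the listings above holds for every release; the function name `delete_stale_operator_hooks` is hypothetical. `--ignore-not-found` lets the command succeed even if one of the two configurations is already gone.

```shell
#!/bin/sh
# Hypothetical helper: delete the mutating and validating webhook
# configurations left behind by a cf-operator that ran in namespace "$1".
# Assumes the "cf-operator-hook-<namespace>" naming seen in the listings.
delete_stale_operator_hooks() {
  old_ns=$1
  if [ -z "$old_ns" ]; then
    echo "usage: delete_stale_operator_hooks <old-operator-namespace>" >&2
    return 1
  fi
  for kind in mutatingwebhookconfigurations validatingwebhookconfigurations; do
    # --ignore-not-found keeps the command from failing if a configuration was already removed.
    kubectl delete "$kind.admissionregistration.k8s.io" "cf-operator-hook-$old_ns" --ignore-not-found || return 1
  done
}

# Example: clean up after the operator that was installed in namespace cf-operator.
#   delete_stale_operator_hooks cf-operator
```

Only pass namespaces of operators that are no longer running; deleting the configuration of the current operator disables its webhooks until it recreates them.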

After that, the kubecf deployment succeeds.

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/175294709

The labels on this github issue will be updated when the story is started.

manno commented

@svollath
Hooks are not namespaced and can't belong to a namespace. I think we could use a post-delete hook: https://helm.sh/docs/topics/charts_hooks/

However, installing the operator again should update that webhook. Can you retry with helm3 install --wait ... cf-operator (cloudfoundry-incubator/kubecf#1194)?

manno commented

It turns out that when the second operator is installed in a different namespace, the original hook configuration is neither updated nor deleted.

We'll fix this by adding a Helm hook and doing better cleanup.