loft-sh/kiosk

The kiosk webhook definition should be static

moshloop opened this issue · 6 comments

An incorrectly scoped webhook can impact the stability of Kubernetes if it covers too many objects and is subsequently unable to serve webhook requests.

@moshloop thanks for reporting this issue! The webhook configuration deployed by the chart has no functionality at first and is filled in by kiosk at runtime (the caCert is inserted, etc.). The scope of the webhook is limited to the kiosk group config.kiosk.sh and to namespaces that are owned by an account (for correct account quota functionality). I don't think we can narrow down the scope of the webhook any further. Could you elaborate a bit more on why this should be changed?

So there are 2 parts to this issue:

  1. It should be static and defined up front rather than dynamically at runtime. This provides 2 benefits:
    a) Kiosk will not require permissions to create or modify webhooks
    b) The scope of the webhook and any subsequent change is easily reviewable.

  2. The scope of the webhook itself should be limited, e.g. CREATE might be needed on all resources, but UPDATE is only needed for resources that specify limits/requests. This saves unnecessary webhook calls.
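
To make the two points above concrete, here is a minimal sketch of what a static, reviewable webhook definition with a narrowed UPDATE scope could look like. It builds the manifest as a plain Python dict and prints it as JSON. The field names follow the `admissionregistration.k8s.io/v1` API, but the webhook name, the service name and namespace, and the choice of `pods` as the only UPDATE-checked resource are assumptions for illustration, not values taken from the kiosk chart. The CA bundle is left out on purpose.

```python
import json


def build_static_webhook(service_name="kiosk", service_namespace="kiosk"):
    """Return a ValidatingWebhookConfiguration manifest as a plain dict.

    The service name/namespace defaults and the webhook name are
    hypothetical; adjust them to the actual deployment.
    """
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingWebhookConfiguration",
        "metadata": {"name": "kiosk-static-example"},
        "webhooks": [
            {
                "name": "quota.example.kiosk.sh",
                "admissionReviewVersions": ["v1"],
                "sideEffects": "None",
                "failurePolicy": "Fail",
                "clientConfig": {
                    "service": {
                        "name": service_name,
                        "namespace": service_namespace,
                        "path": "/quota",
                    },
                },
                "rules": [
                    # CREATE is checked for every namespaced resource.
                    {
                        "operations": ["CREATE"],
                        "apiGroups": ["*"],
                        "apiVersions": ["*"],
                        "resources": ["*"],
                        "scope": "Namespaced",
                    },
                    # UPDATE is checked only for resources assumed to
                    # carry requests/limits (here: pods only).
                    {
                        "operations": ["UPDATE"],
                        "apiGroups": [""],
                        "apiVersions": ["v1"],
                        "resources": ["pods"],
                        "scope": "Namespaced",
                    },
                ],
            }
        ],
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be reviewed and applied as-is.
    print(json.dumps(build_static_webhook(), indent=2))
```

Because the rules live in a manifest rather than in code, any change to the scope shows up in a normal review diff.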

@moshloop thanks for clarifying, regarding your points:

  1. The main issue is the caCert, which needs to be inserted dynamically for the webhook unless you use another solution like cert-manager. So someone has to modify the webhook configuration after its creation, and personally I prefer giving kiosk the rights to update the webhook over requiring an additional cluster component. Regarding b), we could think about moving the actual configuration from the code to the YAML, if people want to change the ValidatingWebhookConfiguration. So far we have assumed that it is better to ensure the configuration is what kiosk needs than to make it more flexible, but I'm open to rethinking this.

  2. That's true, we can probably get rid of some UPDATE validations. Would you be willing to create a pull request for this issue?
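
As an illustration of point 1, the runtime modification boils down to writing the CA certificate into the `clientConfig.caBundle` field of each webhook entry. The sketch below shows that step on a manifest held as a Python dict. The helper name `inject_ca_bundle` and the placeholder PEM bytes are hypothetical, and this is not kiosk's actual implementation.

```python
import base64
import copy


def inject_ca_bundle(webhook_config, ca_pem):
    """Return a copy of the manifest with caBundle set on every webhook.

    ca_pem is the PEM-encoded CA certificate as bytes. In a JSON or YAML
    manifest the caBundle field holds those bytes base64-encoded.
    """
    patched = copy.deepcopy(webhook_config)
    encoded = base64.b64encode(ca_pem).decode("ascii")
    for webhook in patched.get("webhooks", []):
        webhook.setdefault("clientConfig", {})["caBundle"] = encoded
    return patched


if __name__ == "__main__":
    # Placeholder bytes standing in for a real PEM certificate.
    fake_pem = b"-----BEGIN CERTIFICATE-----\nplaceholder\n-----END CERTIFICATE-----\n"
    manifest = {"webhooks": [{"name": "quota.example.kiosk.sh", "clientConfig": {}}]}
    patched = inject_ca_bundle(manifest, fake_pem)
    print(patched["webhooks"][0]["clientConfig"]["caBundle"])
```

Whoever performs this step needs update rights on the webhook configuration, which is the trade-off discussed above: either kiosk does it itself, or a separate component such as cert-manager does.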

deuch commented

We had an issue in production: the /quota endpoint failed for some reason and blocked all application deployments.

The failure policy is set to Fail, and it causes a lot of trouble. We do not want to use account quotas and we want to remove this path from the ValidatingWebhookConfiguration, but we can't because it's dynamic.
And if a node reboots for patching (we are using AKS) and some application pods start before the kiosk endpoint, the deployment stays in a bad state.

We also have a lot of issues with calls to the webhook failing because of an inconsistent root CA, which leads to SSL errors and failed calls (and failed deployments). We prefer to use our own certificate and root CA, as we do for all of our custom admission controllers.

Can you add an option to use our own ValidatingWebhookConfiguration and root CA/certificate?

@deuch thanks for the information! I'll add an environment variable to disable the dynamic provisioning of the webhook (it is called UPDATE_WEBHOOK), then you are free to deploy your own ValidatingWebhookConfiguration. I'll also do this for the APIService, which currently follows the same pattern. Regarding the root CA/certificate, this is already possible: you just have to mount the files /tmp/k8s-webhook-server/serving-certs/tls.crt, /tmp/k8s-webhook-server/serving-certs/tls.key and /tmp/k8s-webhook-server/serving-certs/ca.crt with your custom certs in the kiosk pod.
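
When mounting custom certificates, a quick check that all three files are present at the expected paths can save some debugging. The following is a small sketch of such a check. The directory and file names come from the comment above, while the helper `missing_cert_files` is hypothetical and not part of kiosk. It only checks that the files exist and are non-empty. It does not validate the certificates themselves.

```python
import os

# Directory and file names as listed in the comment above.
CERT_DIR = "/tmp/k8s-webhook-server/serving-certs"
REQUIRED_FILES = ("tls.crt", "tls.key", "ca.crt")


def missing_cert_files(cert_dir=CERT_DIR):
    """Return the required file names that are absent or empty in cert_dir."""
    missing = []
    for name in REQUIRED_FILES:
        path = os.path.join(cert_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_cert_files()
    if missing:
        print("missing or empty:", ", ".join(missing))
    else:
        print("all serving certificate files are present")
```

Run inside the kiosk pod (or against a local copy of the mounted secret), it lists whichever of the three files still needs to be provided.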

@deuch this can now be configured with kiosk v0.1.21.