kubernetes-sigs/kustomize

Kustomize resource ordering regression

yanniszark opened this issue · 21 comments

Describe the bug

In Kubeflow, we are using Kustomize v3.2.0 and want to upgrade to v4.0.5: kubeflow/manifests#1797
However, our deployment failed for v4.0.5, while it succeeded for v3.2.0.

This is what happened:

  1. Later kustomize versions order Admission Webhooks last; before, they were ordered first.
  2. Pods are created before the Istio injection webhook is in place, so they don't get a sidecar.
  3. Our apps fail because they haven't been mutated appropriately.

Regression Background

For reference, here is the original issue and PRs that made these changes:
#821
#1104
#2459

Issue #821 presents the following scenario, which led to PR #1104:

  1. User builds cert-manager kustomization, containing a webhook, a deployment and a CR (simplified).
  2. Apply webhook and deployment.
  3. Apply CR.

Step (3) fails because the Deployment has not become ready yet.
The solution SHOULD be to retry the apply.
PR #1104's solution was to order the webhook last, so that it doesn't mutate/validate the CR.
This is the wrong approach, as it circumvents logic that the application has explicitly declared should be applied to all relevant resources.
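As a rough illustration of what the retry approach looks like in practice (a sketch only; the ./cert-manager path and the retry count are assumptions, not from the original report):

# Re-apply the rendered manifest until the webhook Deployment is ready
# and the CR is accepted; paths and limits are illustrative.
for i in $(seq 1 30); do
  kustomize build ./cert-manager | kubectl apply -f - && break
  sleep 10
done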

Files that can reproduce the issue

Please see: https://github.com/kubeflow/manifests/blob/v1.3-branch/README.md
which includes the example kustomization we use for Kubeflow components.

  • Build the example kustomization with kustomize v3.2.0, as per the README.
  • Build the example kustomization with kustomize v4.0.5. You will see the WebhookConfigurations ordered last, which causes the issue described above (see the sketch below).
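One way to observe the difference (a sketch, assuming both kustomize binaries are installed under the illustrative names kustomize-3.2.0 and kustomize-4.0.5, and the example kustomization is in example/):

# Print only the resource kinds, in output order, for each version.
kustomize-3.2.0 build example/ | grep '^kind:'
kustomize-4.0.5 build example/ | grep '^kind:'
# With v4.0.5 the *WebhookConfiguration kinds show up at the end of the list.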

Proposed Solution

Restore the order of Mutating / Validating Webhooks as it was before PR #1104

Kustomize version

v4.0.5

cc'ing authors of the referenced issues and PRs: @donbowman @mgoltzsche @asadali
cc @monopole @Shell32-Natsu @pwittrock

I am currently working on a PR for addressing the problem described above. If possible, @monopole @Shell32-Natsu @pwittrock I would love if you could take a look at the described issue and tell me if you agree/disagree with the points made.

Thanks! Discussing this on the earliest issue you mentioned, #821:

People can order their inputs to get the desired output order, which may be the only answer.

There's no one ordering that makes sense for everyone, since no ordering can take custom resources into account.

The original motivation for this has become somewhat moot, since the community has moved away from declarative setups and toward imperative ones.
E.g. Istio no longer supports YAML installs.
The Prometheus operator (and others) now create their CRDs dynamically.

This leads to a race condition: you cannot apply a YAML doc that installs Prometheus followed by one that sets up a ServiceMonitor on some other thing. No ordering can fix this (since the order is already correct), and the system will not reconcile to the desired state (since the ServiceMonitor is rejected as an unknown type).

What is the solution? I dunno, does one call
until kustomize; do sleep 1; done
?

Generally I agree that Kubernetes resource usage should be declarative, though there are exceptions to the rule, e.g. the Namespace and potentially any custom webhook/CRD. For this reason I am in favour of preserving the resource order within a kustomization as the author defined it; authors who must define complex webhook/CRD constellations need to take the resource order within their manifests into account anyway, or the manifests won't be installable at the first attempt.
If I remember correctly, the previous problem with the cert-manager example was that, while in the original manifest a custom resource appeared before the ValidatingWebhookConfiguration, in the kustomize output it was ordered after it, which of course failed because the webhook Service/Deployment was not available yet.
Preserving the resource order allows users to apply manifests once instead of doing retries just to fix the initialization order. The feature can be enabled in kustomize using --reorder none.
I'd argue that the resource order should be fixed within any kustomization that cannot be applied at the first attempt, because its authors know the most about the initialization order and can design it to work smoothly and deterministically.

So as @mgoltzsche mentions, kustomize emits resources in FIFO order if one specifies:

kustomize build --reorder none {target}
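For example, with a kustomization like the following (a minimal sketch; file names are placeholders), the build output keeps exactly the listed order:

cat <<'EOF' > kustomization.yaml
resources:
- webhook.yaml          # emitted first
- deployment.yaml
- custom-resource.yaml  # emitted last
EOF
kustomize build --reorder none .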

Some history of the --reorder flag is mentioned here: #821 (comment).

#1104 happened before the --reorder flag was introduced (#1171). Anyone relying on --reorder legacy (legacy is the default) is going to be broken by PRs like this one and #1104.

@mgoltzsche, @yanniszark do you agree? Someone will come along later and change it again.

I'd like to make none the default value of --reorder, but it would break a bunch of people, and they'd have to reorder their configs. We could deprecate it over the course of a year or so. WDYT?

The declarative nature of Kubernetes isn't going away, but let's not derail this issue. @donbowman there is indeed a concept of polling for status being developed: https://github.com/kubernetes-sigs/cli-utils/blob/master/cmd/status/cmdstatus.go

Not trying to derail, but in the last 2 years I've had to abandon parts of my strategy of having a single declarative setup. I'm the one who opened the original issue; it was indeed motivated by e.g. cert-manager webhooks.

Since then, Istio has moved to a standalone binary for install/update, so I cannot have it in my GitOps + kustomize setup.
For cert-manager + its webhooks, you have to do a 3-pass install to get it in, and check the status between those passes. Although the status work concept is interesting, it's unclear how I would use this in kustomize. Could I have an object that would pause the pipeline until the operator had started up and created its CRDs, before starting to install the webhook?

So I'm skeptical at this stage that order alone can solve this (since it's inherently order + wait for the operator to initialise + wait for the webhooks to finish setting up). Maintaining author order would be a good step though.

I'd like to make none the default value of --reorder, but it would break a bunch of people, and they'd have to reorder their configs. We could deprecate it over the course of a year or so. WDYT?

@monopole I agree: deprecating the --reorder legacy default behaviour and later replacing it with --reorder none as the default (and eventually removing the --reorder option entirely) makes sense to me as well.

TL;DR
@donbowman there are already some tools out there that allow doing what you describe - at least partially. For instance kpt uses the cli-utils library for manifest deployments (kpt live apply). Personally I have used the library in another project as well and can confirm that it works very well. The library "knows" certain resource kinds (like Deployment etc.) and can wait for them to become ready when rolling out a manifest (via the library this behaviour is also easily customizable for custom resources or special conditions).
However I doubt that it can handle cases where the manifest cannot be applied yet (due to an unavailable webhook) - the status poller works on applied resources.
The library also provides the concept of an inventory, which is effectively a kind of deployable module: actually a ConfigMap that stores references to deployed resources within the cluster, similar to the Secrets Helm 3 creates within the cluster but much easier to use, since operations are idempotent (unlike helm install|upgrade). Using kpt, a static inventory template resource can be created once per manifest directory using kpt live init ., making the corresponding manifest directory a deployable unit (that can also be uninstalled, or have resources pruned during upgrades).
This way complex manifests (with hard ordering/initialization requirements) can be split up into multiple modules that can be deployed separately/sequentially - letting kpt/cli-utils wait for each to be ready before deploying the next one.
Though, if I am not mistaken, there is still no way to declare which modules should be deployed in which order within a single Kptfile and deploy them with a single CLI command - that would be nice to have actually. I think currently you'd still need multiple imperative kpt live apply . calls, but I hope this changes soon since kpt is evolving rapidly.
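A sketch of that module-by-module workflow, using only the kpt live commands already mentioned above (directory names are placeholders):

# One-time: create an inventory template per manifest directory.
kpt live init operator/
kpt live init custom-resources/

# Deploy the operator/webhook module and wait for it to be ready,
# then deploy the module with the resources the webhook handles.
kpt live apply operator/ && kpt live status operator/
kpt live apply custom-resources/ && kpt live status custom-resources/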
Now how to use that with kustomize? kpt/cli-utils only handles static manifests. Fortunately kpt also comes with the concept of functions, which are effectively Linux containers that perform transformations on some YAML input. You can have a kustomize kpt function (those functions are in fact a kustomize concept, or rather they're maintained within this repo). An experimental example of such a function is kustomizr.

When it comes to CRDs and operators: I don't think an operator should apply its CRDs (as opposed to custom resources). CRDs should be applied separately or be contained within the manifest that also contains the operator Deployment. In case the operator really must deploy its own CRDs for some reason, it would at least need to expose the CRD registration status within its own readiness probe. Otherwise the caller cannot know when the CRDs are usable, unless it polls for the presence and Established condition of the CRDs (kpt/cli-utils cannot help as long as it doesn't know about the CRDs in the first place when the operator applies them itself) or does subsequent custom resource deployments using retries. Though I think this is neither a problem of kustomize nor of kpt, but a problem of the corresponding operator.
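For completeness, polling for the Established condition can be done with plain kubectl, roughly like this (the ServiceMonitor CRD name matches the Prometheus example above; the custom-resources/ path is a placeholder):

# Block until the CRD exists and reports Established, then apply the CRs.
until kubectl wait --for=condition=Established \
    crd/servicemonitors.monitoring.coreos.com --timeout=10s; do
  sleep 5
done
kubectl apply -f custom-resources/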

Thanks for the input @monopole! About:

People can order their inputs to get the desired output order, which may be the only answer.
There's no one ordering that makes sense for everyone, since no ordering can take custom resources into account.

This may indeed be the case, but it's a big change, since it effectively changes the current promise kustomize makes to manifest developers. If I understand correctly, the current (default) promise is: don't care about ordering, kustomize will do it for you.

@mgoltzsche, @yanniszark do you agree? Someone will come along later and change it again.
I'd like to make none the default value of --reorder, but it would break a bunch of people, and they'd have to reorder their configs. We could deprecate it over the course of a year or so. WDYT?

May I suggest the following:

  • Keep this issue for fixing the regression in legacy ordering. I believe the original reason for ordering the webhooks last was not actually a valid reason, as described in the first comment. Webhooks should be first, to ensure they can validate/mutate subsequent resources. I have created PR #3803 for this reason.
  • Continue the more long-term discussion on the best way forward for kustomize, in terms of reordering resources, in #821

If I understand correctly, the current (default) promise is: don't care about ordering, kustomize will do it for you.

Where is that promise made? It must be retracted. It's literally impossible right now to make one ordering that can both deploy a working stack to a cluster and universally work for everyone's custom resources, now and in the future.

Generally one needs a grander mechanism which can apply some things, wait for them to become ready, apply more things, etc. Various things are moving in this direction, and would encapsulate kustomize or some other editor.

Where is that promise made? It must be retracted.

The default behavior of kustomize is to reorder. Thus, the implied promise towards the manifest developer is that kustomize takes care of ordering, so you don't have to worry about it. This is how I meant it.

Well, we need better docs.

@yanniszark have you tried reordering your inputs to solve your particular use case?

or is this somehow not an option?

e.g. resources not under your control.

@yanniszark have you tried reordering your inputs to solve your particular use case?

@monopole that's a good question. The answer is that in Kubeflow we have many, many kustomizations which have been constructed and tested with kustomize's reordering feature. This is without accounting for the various downstream uses of Kubeflow. Since downstream uses are essentially supported by different vendors, I'd say that this is way too big of a change to be an option, at least for the upcoming release of Kubeflow.

Thus, my thought is to get reordering working as we now use it successfully in 3.2.0, and then possibly make the bigger change of no longer reordering resources, if kustomize also aligns with this direction long-term. In general, we'd like to be aligned with upstream :)

@yanniszark as you mentioned in the issue description, applying a webhook and a resource the webhook would handle within the same manifest to a cluster for the first time will fail, since the webhook's Deployment will not be available yet.
I don't think retrying the apply is a good solution because it creates other problems and makes everything a bit more complicated; kustomize is a manifest rendering tool - as opposed to a deployer - and its output should work out-of-the-box with common deployment tools like kubectl apply in the simplest case.
Therefore I also recommend reordering your resources so that the webhook comes last, and making the initially applied resources independent of it (e.g. provide them with complete data, considering that the webhook won't handle them initially).
In case you really need to submit resources initially that your webhook should handle, I strongly recommend splitting up your manifest and deploying the parts separately: first deploy the CRD/operator/webhook and wait for it to become available (e.g. using kpt live apply . && kpt live status .) before you apply the (custom) resources the previously deployed webhook should handle.

@mgoltzsche Would you object to accepting #3803?

I think we all agree this isn't the right solution, but it would seem to unblock kubeflow in the short term.

I just commented in the PR that I do object to accepting it, because, while it would unblock Kubeflow for now, it would break the cert-manager installation again (and potentially others), although the latter works nicely now and - as opposed to Kubeflow, apparently - can be installed with a single kubectl apply call.

ack, thanks.

Here's a list of ordering changes over the last ~2 years

Mostly insertions (i.e. specifying order for a particular resource where there wasn't an order before).

  • 18 Sep 2020 #3009 put endpoints before service
  • 05 Aug 2020 #2796 Order PersistentVolume before Deployment
  • 12 May 2020 #2459 Relative Order of Mutating/Validating WebhookConfigs
  • 20 Aug 2019 #1437 add PriorityClass to the order list
  • 18 July 2019 #1369 add ResourceQuota to the order list
  • 23 May 2019 #1104 Order ValidatingWebhookConfig last
  • 15 May 2019 #1074 Update order of resources to include psps
  • 20 May 2019 #1037 Apply LimitRange resources before workloads

MutatingWebhookConfiguration was moved from orderFirst to orderLast about a year ago in #2459
ValidatingWebhookConfiguration was moved to orderLast about two years ago in #1104

#3803 would move them back to orderFirst.

It's been pretty stable for a year, so making a change now may have consequences for many users. As discussed above, it's impossible for kustomize to own the answer to 'correct' ordering. So let's close #3803, and leave legacy alone.

Some other options are:

  1. kustomize honors --reorder none to suppress this sort and obey FIFO order. This works right now.
  2. We make the existing legacy sorter accept configuration from the kustomization file.
    The user could specify orderFirst and orderLast in that file (would need someone to write the PR for this); a hypothetical sketch follows this list.
  3. The user can write a sorting transformer plugin and add it to their kustomization file as the last transformer to run (this is possible now, via various mechanisms - exec plugin, containerized plugin - but is the most work and slowest to execute).
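For option 2, the kustomization-file configuration might look something like the sketch below. To be clear, these field names are purely hypothetical illustrations; nothing like this exists yet, and the actual shape would be settled in the follow-up issue:

cat <<'EOF' >> kustomization.yaml
# Hypothetical fields (illustration only): ask the legacy sorter
# to put webhook configurations first.
orderFirst:
- MutatingWebhookConfiguration
- ValidatingWebhookConfiguration
EOF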

@yanniszark if interested, please file a bug for option 2. I think it could be knocked off relatively quickly for the next release, if you think it would help.

@monopole @mgoltzsche thank you for your answers. @monopole thank you for proposing a way forward.

We will gladly implement and contribute option (2) to unblock Kubeflow, but first I'd like to get your thoughts on the following comments that have to do with the reasoning stated above:

  • IMHO skipping validation/mutation is not a good practice and kustomize should not promote it.
  • The current kustomize order actually orders admission controllers BEFORE the resources they validate/mutate. These admission controllers are LimitRange, PodSecurityPolicy, ResourceQuota. As you can see, Validating/MutatingWebhook is the exception to this existing practice.
  • @mgoltzsche you mentioned that for the tools you use, the correct way to deal with runtime timing dependencies (e.g., CRD ready before CR, webhook ready before resource) is to split resources into different kustomizations. If you follow this practice, then the change proposed here does not break anything.

In addition, I understand that you are reluctant to make changes to ordering because, as you say, kustomize does not know about all resources (it doesn't know about CRs or extension API servers). I want to ask two questions here:

  1. Assuming there are only known resources (the builtin ones), isn't the correct order to put webhooks first?
  2. Can we find any example scenario where this ordering would not work? We haven't come up with such a case yet (in Kubeflow), which is why I am asking.

All in all, and independently of whether we implement option 2, I wanted to argue that ordering admission webhooks last is not a best practice, and is not what kustomize does right now for similar resources. In addition, for tools that don't use retries (Argo and Flux, for example, do use retries), @mgoltzsche mentioned that there is an existing practice (splitting manifests), and following it means this change would not break them.

That said @monopole, I know that you don't want to potentially disrupt existing users. But I thought I would make these arguments first, so that you are also aware of them. We could decide to make a final change which is in line with what kustomize does for every other admission controller, and avoid (2). Let me know what you think. Again, thanks for your comments, and to repeat: we are open to implementing (2).

Did an issue for option 2 get filed?

This is the issue for option 2 implementation and design: #3913