Deleting the namespace holding a CR usually orphans cluster-scoped resources "owned" by this CR
porridge opened this issue · 5 comments
The issue
Let's say we have a helm-based operator for a `Foo` kind, whose backing helm chart renders some namespace-scoped resources, as well as some cluster-scoped resources, say a ValidatingWebhookConfiguration called `${foo-namespace-and-name}-foo-validation`.
Now take the following sequence of events:
Setup
- a user creates a `foohome` namespace,
- in that namespace the user creates a `myfoo` instance of `Foo`,
- the operator adds an `uninstall-helm-release` finalizer to `myfoo`,
- the operator runs helm to create resources in `foohome` as well as the `foohome-myfoo-foo-validation` ValidatingWebhookConfiguration object; this also creates the helm "release" secret.
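For concreteness, the reconcile flow of such an operator looks roughly like the following controller-runtime sketch. This is not the actual helm-operator code: `FooReconciler`, the `foov1alpha1` API package and the `installOrUpgradeRelease`/`uninstallRelease` helpers are hypothetical stand-ins for the pieces the operator provides.

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	foov1alpha1 "example.com/foo-operator/api/v1alpha1" // hypothetical API package
)

const uninstallFinalizer = "uninstall-helm-release"

// FooReconciler is a hand-written stand-in for the helm-based operator.
type FooReconciler struct {
	client.Client
}

func (r *FooReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	foo := &foov1alpha1.Foo{}
	if err := r.Get(ctx, req.NamespacedName, foo); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	if foo.GetDeletionTimestamp().IsZero() {
		// Add the finalizer *before* creating anything, so deleting the CR
		// always goes through the uninstall path below.
		if !controllerutil.ContainsFinalizer(foo, uninstallFinalizer) {
			controllerutil.AddFinalizer(foo, uninstallFinalizer)
			if err := r.Update(ctx, foo); err != nil {
				return ctrl.Result{}, err
			}
		}
		// Renders the chart into foohome plus the cluster-scoped
		// ValidatingWebhookConfiguration, and writes the "release" secret.
		return ctrl.Result{}, r.installOrUpgradeRelease(ctx, foo)
	}

	// Deletion path: helm uninstall, then let kubernetes finish the deletion.
	if err := r.uninstallRelease(ctx, foo); err != nil {
		return ctrl.Result{}, err
	}
	if controllerutil.ContainsFinalizer(foo, uninstallFinalizer) {
		controllerutil.RemoveFinalizer(foo, uninstallFinalizer)
		if err := r.Update(ctx, foo); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}

// Hypothetical helpers standing in for the helm install/upgrade and uninstall
// actions (the former also creates the release secret, the latter reads it).
func (r *FooReconciler) installOrUpgradeRelease(ctx context.Context, foo *foov1alpha1.Foo) error {
	return nil
}

func (r *FooReconciler) uninstallRelease(ctx context.Context, foo *foov1alpha1.Foo) error {
	return nil
}
```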
Now we have two deletion scenarios
Deletion of CR alone
- user does `kubectl delete foo myfoo -n foohome`
- operator runs helm to delete the resources it has previously created. This works fine since the "release" secret is still there and has a list of what needs to be deleted
- operator removes the finalizer from `myfoo`
- kubernetes finishes the deletion and `myfoo` vanishes
So everything works as intended.
Deletion of the namespace
- user does `kubectl delete ns foohome`
- kubernetes performs a cascading deletion of all objects in the `foohome` namespace - usually kubernetes gets to delete the helm "release" secret first
- not long after, the operator tries to process the `myfoo` deletion. But by then the release secret is gone. So all that the operator seems to do is log `INFO controllers.Helm Release not found, removing finalizer` and remove the finalizer.
- kubernetes completes the deletion of the namespace-scoped resources and the `foohome` namespace as well
In this scenario, the `foohome-myfoo-foo-validation` ValidatingWebhookConfiguration stays around indefinitely.
Moreover, if it were not for the disambiguating prefix on the cluster-scoped resource name, then when the user retries the first two setup steps (creating a namespace and a `Foo` instance in it) under a different namespace name, the operator would fail to reconcile with an error such as:
`rendered manifests contain a resource that already exists. Unable to continue with install: ValidatingWebhookConfiguration "foo-validation" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: key "meta.helm.sh/release-namespace" must equal "newns": current value is "foohome"`
I used "usually" in the title because this is a race, but a race that I found to be usually lost by the helm operator.
Ideas for fixing this
Put a finalizer on the helm release secret as well, and only remove it when about to remove the finalizer on the CR.
Alternative: when the CR is being deleted, then instead of depending on the release secret, do a dry-run render of the helm chart and delete everything it spits out. Note that this is not reliable, because the CR spec might have changed between the moment when the controller last acted on it and now (leading to a change in the list of resources produced).
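To make the first idea more concrete, here is a rough sketch of what protecting the release secret could look like, reusing the hypothetical `FooReconciler` from the sketch above. The finalizer name, the `releaseName` helper, and the assumption that the release secret can be located via helm's `owner=helm` / `name=<release>` labels are all illustrative, not a worked-out design.

```go
// (in addition to the imports above, this needs corev1 "k8s.io/api/core/v1")

// Hypothetical finalizer protecting helm's release secret from cascading
// namespace deletion until helm uninstall has actually run.
const releaseSecretFinalizer = "example.com/protect-helm-release"

// protectReleaseSecrets is called while the CR is alive (e.g. right after a
// successful install/upgrade) to pin the release secret(s) with a finalizer.
func (r *FooReconciler) protectReleaseSecrets(ctx context.Context, foo *foov1alpha1.Foo) error {
	secrets := &corev1.SecretList{}
	// Helm labels its release-state secrets with owner=helm and name=<release>.
	if err := r.List(ctx, secrets,
		client.InNamespace(foo.Namespace),
		client.MatchingLabels{"owner": "helm", "name": releaseName(foo)},
	); err != nil {
		return err
	}
	for i := range secrets.Items {
		s := &secrets.Items[i]
		if !controllerutil.ContainsFinalizer(s, releaseSecretFinalizer) {
			controllerutil.AddFinalizer(s, releaseSecretFinalizer)
			if err := r.Update(ctx, s); err != nil {
				return err
			}
		}
	}
	return nil
}

// releaseReleaseSecrets is called only after helm uninstall succeeded, right
// before removing the finalizer from the CR itself.
func (r *FooReconciler) releaseReleaseSecrets(ctx context.Context, foo *foov1alpha1.Foo) error {
	secrets := &corev1.SecretList{}
	if err := r.List(ctx, secrets,
		client.InNamespace(foo.Namespace),
		client.MatchingLabels{"owner": "helm", "name": releaseName(foo)},
	); err != nil {
		return err
	}
	for i := range secrets.Items {
		s := &secrets.Items[i]
		if controllerutil.ContainsFinalizer(s, releaseSecretFinalizer) {
			controllerutil.RemoveFinalizer(s, releaseSecretFinalizer)
			if err := r.Update(ctx, s); err != nil {
				return err
			}
		}
	}
	return nil
}

// releaseName stands in for however the operator derives the helm release name.
func releaseName(foo *foov1alpha1.Foo) string {
	return foo.GetName()
}
```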
> Put a finalizer on the helm release secret as well, and only remove it when about to remove the finalizer on the CR.
With this option, what would happen if a user just went in and manually tried to delete the helm release secret? It seems like the following steps would occur:
- Helm release secret gets a deletion timestamp, but is not deleted because it has a finalizer
- CR gets reconciled since we're watching the release secret (what happens here?)
The deletion of the helm release secret would hang until the finalizer was deleted, and we would be unable to process any further helm upgrades (you can't update an object that has a deletion timestamp)
It seems like the only option would be to have the reconciler remove the finalizer, let the deletion proceed, and then re-install the release to get the secret re-created.
But that seems like it would make handling the "delete the namespace" case problematic, since you can't re-create anything in a terminating namespace.
I'm not a huge fan of the alternative solution for the reason you mention (there's no guarantee the secret matches the CR)
In general (and unfortunately) there are lots of issues with deleting namespaces without cleaning up inside them first. Off the top of my head:
- If the namespace also happens to contain the operator itself, the operator deployment might be deleted before the CRs in that namespace. That would cause the namespace deletion to hang because the CRs were not able to be finalized.
- If a CRD has a webhook served by a deployment in the namespace and the namespace contains CRs, the webhook might get deleted before the CRs, and then since the webhook is unavailable, the namespace deletion hangs because the API server is unable to reach the webhook to process the remaining CRs.
My point being, perhaps a simpler immediate solution is to document that namespaces containing CRs managed by helm-based operators should not be deleted.
Nonetheless, I'm definitely open to ideas to come up with a way to solve this to help avoid these races altogether.
Currently I think the only safe option would be to:
- (optionally) put up an error in the CR `status` warning about the unexpected secret deletion, and
- refuse to reconcile anything other than a deletion of the CR.
This would solve the "namespace deletion" case. As for the "user decided to delete just the release secret" case, this should at least guide them to what they did wrong - by following the breadcrumbs of secret -> ownerRef -> CR -> status.
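Sketched as code (again assuming a controller-runtime style reconciler; the `releaseSecretExists` helper, the `Status.Conditions` field and the condition type are made up for illustration), the guard would be roughly:

```go
// (additionally needs meta "k8s.io/apimachinery/pkg/api/meta" and
//  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1")

// Guard sketch: if the release secret is gone but the CR is not being deleted,
// record an error condition on the CR and refuse to do anything else.
func (r *FooReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	foo := &foov1alpha1.Foo{}
	if err := r.Get(ctx, req.NamespacedName, foo); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	secretExists, err := r.releaseSecretExists(ctx, foo) // hypothetical helper
	if err != nil {
		return ctrl.Result{}, err
	}

	if !secretExists && foo.GetDeletionTimestamp().IsZero() {
		// Only surface the problem; do not install, upgrade or touch anything.
		meta.SetStatusCondition(&foo.Status.Conditions, metav1.Condition{
			Type:    "ReleaseSecretMissing", // hypothetical condition type
			Status:  metav1.ConditionTrue,
			Reason:  "ReleaseSecretDeleted",
			Message: "helm release secret was deleted out of band; refusing to reconcile",
		})
		return ctrl.Result{}, r.Status().Update(ctx, foo)
	}

	// ... the normal install/upgrade/uninstall handling continues here ...
	return ctrl.Result{}, nil
}
```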
In addition to conflicting with the namespace deletion, I don't see how letting the secret deletion proceed and re-installing the release could be made reliable. AFAICT the release secret is the only state storage mechanism we have. Once we let it disappear, we lose track of what resources should be deleted - the set of resources derived from the current CR `spec` might be different than the one that just got deleted, since `spec` might have been changed in the meantime.
Eventually, I think this is a perfect use case for the liens a.k.a. in-use-protection feature that is proposed.
> Eventually, I think this is a perfect use case for the liens a.k.a. in-use-protection feature that is proposed.
Oh TIL about that KEP. Big 👍 from me. That feature would be a big win for operators.
One of the root causes of this problem is that namespace-scoped objects (the CR controlling a helm release) can't be owners of cluster-scoped objects or objects in other namespaces. If the CR was cluster-scoped:
- It couldn't be deleted as part of a namespace deletion
- It could own any other object in the cluster, making the `helm uninstall` step much less important, because kubernetes garbage collection would take care of anything that was missed.
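As a minimal illustration of that second point (a hypothetical helper using plain apimachinery types, not the actual operator code): with a cluster-scoped owner, the operator could stamp an owner reference onto every rendered object, including cluster-scoped ones like the ValidatingWebhookConfiguration, and the garbage collector would clean up whatever `helm uninstall` missed. This is exactly the kind of reference that is disallowed today when the owner is namespace-scoped.

```go
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// setOwner adds an owner reference pointing at a (hypothetical) cluster-scoped
// Foo CR. Because the owner is cluster-scoped, this is legal on any dependent,
// namespaced or cluster-scoped, and kubernetes GC deletes the dependent when
// the CR is deleted.
func setOwner(cr metav1.Object, dependent metav1.Object, gvk schema.GroupVersionKind) {
	controller := true
	blockOwnerDeletion := true
	dependent.SetOwnerReferences(append(dependent.GetOwnerReferences(), metav1.OwnerReference{
		APIVersion:         gvk.GroupVersion().String(),
		Kind:               gvk.Kind,
		Name:               cr.GetName(),
		UID:                cr.GetUID(),
		Controller:         &controller,
		BlockOwnerDeletion: &blockOwnerDeletion,
	}))
}
```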
So I'm wondering if we should investigate support for cluster-scoped CRs in the helm operator. It would be much easier to say: "Any helm chart that deploys cluster-scoped objects should be managed by a cluster-scoped CR".
It seems like this would be fairly straightforward to implement, and the only big hurdle to overcome would be where to store the release secret (and that could likely be easily solved by a CLI flag, environment variable, or maybe even an annotation in the custom object).
In fact, I was experimenting with a completely unrelated project recently and decided to try using this library to make a cluster-scoped object own and reconcile a helm release, and it generally seemed to work. This is all I had to do. That's clearly begging for a better interface, but it's clear to me that it's pretty doable.
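For reference, wiring the library's generic reconciler to an arbitrary (possibly cluster-scoped) kind looks roughly like the sketch below. This is a paraphrase of the usual `pkg/reconciler` usage pattern rather than the exact code referred to above, so the chart path and GVK are placeholders and the option names should be double-checked against the library.

```go
// Sketch: driving a helm chart from an arbitrary custom kind with
// helm-operator-plugins' reconciler package. Chart path and GVK are
// placeholders; verify option names against pkg/reconciler before use.
package main

import (
	"os"

	"helm.sh/helm/v3/pkg/chart/loader"
	"k8s.io/apimachinery/pkg/runtime/schema"
	ctrl "sigs.k8s.io/controller-runtime"

	"github.com/operator-framework/helm-operator-plugins/pkg/reconciler"
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		os.Exit(1)
	}

	chrt, err := loader.Load("./helm-charts/foo") // placeholder chart path
	if err != nil {
		os.Exit(1)
	}

	r, err := reconciler.New(
		reconciler.WithChart(*chrt),
		reconciler.WithGroupVersionKind(schema.GroupVersionKind{
			Group:   "example.com", // placeholder group for a cluster-scoped kind
			Version: "v1alpha1",
			Kind:    "Foo",
		}),
	)
	if err != nil {
		os.Exit(1)
	}

	if err := r.SetupWithManager(mgr); err != nil {
		os.Exit(1)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}
```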
While adding support for cluster-scoped CRs might be a useful feature, I don't think we should recommend that operator developers make their CR cluster-scoped just because their helm chart happens to contain a couple of cluster-scoped resources. There are valid reasons to keep things in namespaces, access control and resource quotas being primary ones. So I don't think it counts as a solution to this issue.
Incidentally, when designing the API of the project we use helm-operator-plugins for, one of the hardest parts was deciding whether the CRs should be cluster- or namespace-scoped. One of our CRs represents a typical web app, which is perfectly suited to being namespace-scoped.
The hard part was what to do with a handful of cluster-scoped resources that the associated helm chart happened to create:
- PersistentVolume and StorageClass which were convenience features in the helm chart
- SecurityContextConstraints - to allow using hostDir storage
- PodSecurityPolicy - ditto
- ClusterRole - needed only because of the presence of the PodSecurityPolicy
In the end we decided to go namespace-scoped and rely on a finalizer to help us clean up resources that k8s GC would not handle.
At that time we did not even know that the helm operator did not support cluster-scoped CRs, but it would have been another hint in that direction.
After some time, we changed to using one of the stock SCC, and now that PSPs are on their way to removal, most of the reasons to go cluster-scoped are gone. If we had decided to go cluster-scoped we would be in an awkward place, since the scope of a CR can never be changed, even across API versions.
In general, exactly because of the asymmetry you mentioned, it seems to me like going cluster-scoped is a slippery slope. Once you go down that route, you find that more and more things need to be cluster scoped.
Since kubernetes/enhancements#2840 is still in KEP phase, I'd like to restart the discussion assuming we might not have it eventually after all.
Coming back to your question:
> With this option, what would happen if a user just went in and manually tried to delete the helm release secret?
I have a follow-up question: what happens currently, let's say in the worst possible corner case of the operator controller being down for a moment and the CR getting changed or deleted in the meantime? I'm assuming we do not lose track of the operand resources, thanks to the owner references. So what is the impact, actually? If none, then why do we need this secret anyway? I must be confused 🤔