Adding a finalizer on a kafkatopics.kafka.banzaicloud.io doesn't prevent the deletion
cmeichel-scality opened this issue · 2 comments
Describe the bug
Adding a finalizer
on a kafkatopics.kafka.banzaicloud.io
doesn't prevent the deletion
Steps to reproduce the issue:
- Prerequisite: have a k8s env ready, and a kafka cluster
my-cluster
- create file
topic.yaml
apiVersion: kafka.banzaicloud.io/v1alpha1
kind: KafkaTopic
metadata:
name: my-topic
finalizers:
- foo
spec:
clusterRef:
name: my-cluster
name: my-topic
partitions: 1
replicationFactor: 1
- Apply is in k8s:
kubectl apply -f topic.yaml
- Check the topic is created in kafka (I use https://github.com/birdayz/kaf)
- Delete the kafkatopics.kafka.banzaicloud.io:
kubeclt delete
:kubeclt delete kafkatopics.kafka.banzaicloud.io my-topic
- Check the topic in kafka => deleted
- check in k8s:
kubectl describe kafkatopics.kafka.banzaicloud.io my-topic
=> still there
Expected behavior
If user add a custom finalizer
, the topic in kafka should not be deleted. Basically, the topic cannot be deleted since there is still finalizers
than finalizer.kafkatopics.kafka.banzaicloud.io
Screenshots
NA
Additional context
Basically, the following line (https://github.com/banzaicloud/koperator/blob/master/controllers/kafkatopic_controller.go#L226):
if util.StringSliceContains(topic.GetFinalizers(), topicFinalizer) {
should be
if util.StringSliceContains(topic.GetFinalizers(), topicFinalizer) && len(topic.GetFinalizers()) == 1 {
Dear @cmeichel-scality!
We will inform you soon about our opinion with possible solutions.
Hey @cmeichel-scality,
Apologies for the delay in the response.
Theoretical points on building hierarchy/codependency in finalizers:
The finalizers set on the CR are a mechanism to make the K8s deletion of the CR wait until the finalizers' controllers clean the - possibly unmanaged/formally-unowned - related resources up and perform more complex cleanups than cascadic deletion of resources based on the ownerReference
.
When the corresponding controller done its job then the related finalizer mark is taken off of the CR.
Also during deletion of a CR it is impossible to add/put back finalizers onto the CR, finalizers can only be removed from the CR's corresponding list in this phase.
This is the mechanism we use to make sure to clean up the topic-in-Kafka when their associated KafkaTopic CRs are deleted, using the corresponding Koperator KafkaTopic
controller. So when the Koperator KafkaTopic
controller runs on the CR deletion, it always removes the topic-in-Kafka and when finished takes its finalizer off of the KafkaTopic
CR so the removal of the CR can progress.
To summarize, the finalizers are there to prevent the deletion of the CR before the complex or unmanaged derived resources would be deleted.
But the finalizer mechanism is not supposed to enforce the prevention of the deletion of the corresponding unmanaged/formally-unowned external resources (topic-in-Kafka in this case) until only the CR's lifecycle controller's finalizer is left on the CR.
There is no such behaviour associated with finalizers or hierarchy between finalizers in that sense.
It does not handle the shared ownership of the unmanaged/formally-unowned external resource, that must be implemented out-of-band of the finalizers of a CR.
The operator pattern practices we are used to throughout the industry manage the finalizers independently from each other - even Red Hat's operator best practices states
One or more finalizers can be set on resources. Each controller should manage its own finalizer and ignore others if present.
Practical aspects of handling your problematic case
Fortunately Koperator has the ability to express and control the ownership of the topic-in-Kafka related to the KafkaTopic CR.
The managedBy
annotation on a KafkaTopic CR is controlling whether Koperator manages the topic-in-Kafka, meaning synching K8s CR changes to Kafka, including topic creation, update, deletion.
When the value of this annotation is set to koperator
the CR changes are synched to Kafka, when the label has any other existing value the CR lives independently from the topic-in-Kafka.
With that in mind the following procedure could give you the resulting behavior you are seeking to my understanding:
- Change the
managedBy: koperator
annotation tomanagedBy: yourapplication
to remove the association. The actual value is irrelevant at the moment as long as it is not empty and not equalskoperator
. - Mark the CR for deletion.
- Koperator takes its finalizer off after it realizes there is nothing to do because the topic-in-Kafka is not managed by it.
- Other controllers having their finalizers on the KafkaTopic CR will be triggered as well, including yours if it is present.
- You can run your finalization logic in your controller, including your actions depending on the existence of the KafkaTopic.
- Because you are the managers of the topic-in-Kafka now according to step 1 and the
managedBy
annotation, you should either remove the topic-in-Kafka during the finalization logic of your controller or pass that ownership to one of your services which will later clean up the topic-in-Kafka whenever/wherever it is feasible to you. Just make sure you don't leave the topic-in-Kafka around indefinitely as "garbage". - When the last controller finishes its finalization logic the last finalizer is taken off of the
KafkaTopic
CR and the CR is deleted from K8s, regardless of the state of the topic-in-Kafka, because Koperator is no longer managing the topic-in-Kafka.
This is what we have today.
Improvement ideas:
We also contemplated what you were suggesting about having some kind of hierarchy or defer mechanism in the finalizers logic so it can be signaled to Koperator that its KafkaTopic
controller should run last.
We don't have control over the order of controllers on finalizers executed other than how the Koperator's KafkaTopic
controller handles its "priority".
We could technically implement an artifical blocking wait in Koperator's KafkaTopic
controller to ensure 3rd party finalizers are all taken off of the CR before Koperator's run, but that would be unnecessary in most cases - unlike yours - and would go against the industry best practices outlined in the theoretical part thus making it somewhat unintuitive for people used to other operators.
Because of the cons of that approach we tried to find a middle-ground that wouldn't disrupt normal usage while giving mechanisms to achieve the functionality you are seeking and came up with the idea of implementing either a special managedBy
field such as deferred
or another annotation such as koperator/topicCleanUp: deferred
that would communicate the intent of having the Koperator KafkaTopic
controller clean up the topic-in-Kafka last among finalizers. This would have the benefit of having an intuitive default behavior according to best practices but also giving option to the user to have the topic-in-Kafka cleaned up last. The full procedure would look like
- Set the corresponding annotation on the
KafkaTopic
CR todeferred
. - Mark the CR for deletion.
- The finalizer controllers run, but Koperator's
KafkaTopic
controller checks the finalizer list, as long as it's not the sole finalizer it will wait/requeue and keep its finalizer in the list. - When Koperator's
KafkaTopic
controller sees it is the last finalizer remaining it runs its logic and removes the topic-in-Kafka. Then finishes it's finalization logic and removes its finalizer from the CR which was the last finalizer. - The
KafkaTopic
CR is deleted.
This requires a minor feature implementation to work and the introduction of a new annotation (value at least) that we need to design to be easily extensible if we want to move further with the sophistication of ownership handling which needs a bit of a think-through on our end.
Also however simple the solution might be, it would still be added complexity to the operator so we would only take upon this path if there is both a strong need for the functionality for you (the current procedure not being satisfactory) and this use-case (wanting to defer topic-in-Kafka deletion while not taking full ownership of it) and its better handling being more prominent to the wider community.
We would like to hear feedback from you on these points we shared while we will try to gather opinions from the community in the near future on this matter.
Readers of this issue, feel free to react/comment your opinions.
Edit: typo