banzaicloud/koperator

Adding a finalizer on a kafkatopics.kafka.banzaicloud.io doesn't prevent the deletion

cmeichel-scality opened this issue · 2 comments

Describe the bug
Adding a finalizer on a kafkatopics.kafka.banzaicloud.io doesn't prevent the deletion

Steps to reproduce the issue:

  1. Prerequisite: have a k8s env ready, and a kafka cluster my-cluster
  2. create file topic.yaml
apiVersion: kafka.banzaicloud.io/v1alpha1
kind: KafkaTopic
metadata:
    name: my-topic
    finalizers:
        - foo
spec:
    clusterRef:
        name: my-cluster
    name: my-topic
    partitions: 1
    replicationFactor: 1
  1. Apply is in k8s: kubectl apply -f topic.yaml
  2. Check the topic is created in kafka (I use https://github.com/birdayz/kaf)
  3. Delete the kafkatopics.kafka.banzaicloud.io: kubeclt delete : kubeclt delete kafkatopics.kafka.banzaicloud.io my-topic
  4. Check the topic in kafka => deleted
  5. check in k8s: kubectl describe kafkatopics.kafka.banzaicloud.io my-topic => still there

Expected behavior
If user add a custom finalizer, the topic in kafka should not be deleted. Basically, the topic cannot be deleted since there is still finalizers than finalizer.kafkatopics.kafka.banzaicloud.io

Screenshots
NA

Additional context
Basically, the following line (https://github.com/banzaicloud/koperator/blob/master/controllers/kafkatopic_controller.go#L226):

if util.StringSliceContains(topic.GetFinalizers(), topicFinalizer) {

should be

if util.StringSliceContains(topic.GetFinalizers(), topicFinalizer) && len(topic.GetFinalizers()) == 1 {

Dear @cmeichel-scality!

We will inform you soon about our opinion with possible solutions.

Hey @cmeichel-scality,

Apologies for the delay in the response.


Theoretical points on building hierarchy/codependency in finalizers:

The finalizers set on the CR are a mechanism to make the K8s deletion of the CR wait until the finalizers' controllers clean the - possibly unmanaged/formally-unowned - related resources up and perform more complex cleanups than cascadic deletion of resources based on the ownerReference.
When the corresponding controller done its job then the related finalizer mark is taken off of the CR.
Also during deletion of a CR it is impossible to add/put back finalizers onto the CR, finalizers can only be removed from the CR's corresponding list in this phase.

This is the mechanism we use to make sure to clean up the topic-in-Kafka when their associated KafkaTopic CRs are deleted, using the corresponding Koperator KafkaTopic controller. So when the Koperator KafkaTopic controller runs on the CR deletion, it always removes the topic-in-Kafka and when finished takes its finalizer off of the KafkaTopic CR so the removal of the CR can progress.

To summarize, the finalizers are there to prevent the deletion of the CR before the complex or unmanaged derived resources would be deleted.
But the finalizer mechanism is not supposed to enforce the prevention of the deletion of the corresponding unmanaged/formally-unowned external resources (topic-in-Kafka in this case) until only the CR's lifecycle controller's finalizer is left on the CR.
There is no such behaviour associated with finalizers or hierarchy between finalizers in that sense.
It does not handle the shared ownership of the unmanaged/formally-unowned external resource, that must be implemented out-of-band of the finalizers of a CR.

The operator pattern practices we are used to throughout the industry manage the finalizers independently from each other - even Red Hat's operator best practices states

One or more finalizers can be set on resources. Each controller should manage its own finalizer and ignore others if present.


Practical aspects of handling your problematic case

Fortunately Koperator has the ability to express and control the ownership of the topic-in-Kafka related to the KafkaTopic CR.
The managedBy annotation on a KafkaTopic CR is controlling whether Koperator manages the topic-in-Kafka, meaning synching K8s CR changes to Kafka, including topic creation, update, deletion.
When the value of this annotation is set to koperator the CR changes are synched to Kafka, when the label has any other existing value the CR lives independently from the topic-in-Kafka.

With that in mind the following procedure could give you the resulting behavior you are seeking to my understanding:

  1. Change the managedBy: koperator annotation to managedBy: yourapplication to remove the association. The actual value is irrelevant at the moment as long as it is not empty and not equals koperator.
  2. Mark the CR for deletion.
  3. Koperator takes its finalizer off after it realizes there is nothing to do because the topic-in-Kafka is not managed by it.
  4. Other controllers having their finalizers on the KafkaTopic CR will be triggered as well, including yours if it is present.
  5. You can run your finalization logic in your controller, including your actions depending on the existence of the KafkaTopic.
  6. Because you are the managers of the topic-in-Kafka now according to step 1 and the managedBy annotation, you should either remove the topic-in-Kafka during the finalization logic of your controller or pass that ownership to one of your services which will later clean up the topic-in-Kafka whenever/wherever it is feasible to you. Just make sure you don't leave the topic-in-Kafka around indefinitely as "garbage".
  7. When the last controller finishes its finalization logic the last finalizer is taken off of the KafkaTopic CR and the CR is deleted from K8s, regardless of the state of the topic-in-Kafka, because Koperator is no longer managing the topic-in-Kafka.

This is what we have today.


Improvement ideas:

We also contemplated what you were suggesting about having some kind of hierarchy or defer mechanism in the finalizers logic so it can be signaled to Koperator that its KafkaTopic controller should run last.

We don't have control over the order of controllers on finalizers executed other than how the Koperator's KafkaTopic controller handles its "priority".

We could technically implement an artifical blocking wait in Koperator's KafkaTopic controller to ensure 3rd party finalizers are all taken off of the CR before Koperator's run, but that would be unnecessary in most cases - unlike yours - and would go against the industry best practices outlined in the theoretical part thus making it somewhat unintuitive for people used to other operators.

Because of the cons of that approach we tried to find a middle-ground that wouldn't disrupt normal usage while giving mechanisms to achieve the functionality you are seeking and came up with the idea of implementing either a special managedBy field such as deferred or another annotation such as koperator/topicCleanUp: deferred that would communicate the intent of having the Koperator KafkaTopic controller clean up the topic-in-Kafka last among finalizers. This would have the benefit of having an intuitive default behavior according to best practices but also giving option to the user to have the topic-in-Kafka cleaned up last. The full procedure would look like

  1. Set the corresponding annotation on the KafkaTopic CR to deferred.
  2. Mark the CR for deletion.
  3. The finalizer controllers run, but Koperator's KafkaTopic controller checks the finalizer list, as long as it's not the sole finalizer it will wait/requeue and keep its finalizer in the list.
  4. When Koperator's KafkaTopic controller sees it is the last finalizer remaining it runs its logic and removes the topic-in-Kafka. Then finishes it's finalization logic and removes its finalizer from the CR which was the last finalizer.
  5. The KafkaTopic CR is deleted.

This requires a minor feature implementation to work and the introduction of a new annotation (value at least) that we need to design to be easily extensible if we want to move further with the sophistication of ownership handling which needs a bit of a think-through on our end.


Also however simple the solution might be, it would still be added complexity to the operator so we would only take upon this path if there is both a strong need for the functionality for you (the current procedure not being satisfactory) and this use-case (wanting to defer topic-in-Kafka deletion while not taking full ownership of it) and its better handling being more prominent to the wider community.

We would like to hear feedback from you on these points we shared while we will try to gather opinions from the community in the near future on this matter.

Readers of this issue, feel free to react/comment your opinions.

Edit: typo