banzaicloud/koperator

Kafka PDB (PodDisruptionBudget) is invalid during scaling operations

alungu opened this issue · 1 comment

Describe the bug
The current Kafka PodDisruptionBudget implementation treats the number of brokers in the KafkaCluster spec as the total number of brokers. This does not hold during scaling operations, so the resulting PDB definition is invalid.

Steps to reproduce the issue:

  1. Create a KafkaCluster with 6 Kafka brokers and a DisruptionBudget of 1
     • Kafka's PDB MIN AVAILABLE is set to 5 (as expected); only 1 broker can be deleted
  2. Scale in the KafkaCluster to 3 Kafka brokers
     • Kafka's PDB MIN AVAILABLE is updated to 2 as soon as the CR is applied; up to 4 brokers can be deleted
     • The scaling operation might take hours, depending on the cluster load (and the time required to drain the nodes). During this time the PDB is invalid, allowing more Kafka pods to be deleted than the DisruptionBudget permits.
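The arithmetic behind the steps above can be sketched in Go. This is a minimal illustration, not koperator's code: `minAvailableFromSpec` and `deletablePods` are hypothetical helpers. The first mirrors the behaviour described in the issue (spec broker count minus the budget); the second uses the Kubernetes rule that a PDB with `minAvailable` allows `healthy pods - minAvailable` disruptions, floored at zero. While 6 broker pods are still running but the spec already lists 3, the PDB lets 4 pods go.

```go
package main

import "fmt"

// minAvailableFromSpec mirrors the behaviour described in the issue:
// minAvailable = (brokers listed in the KafkaCluster spec) - disruptionBudget.
// Hypothetical helper for illustration only.
func minAvailableFromSpec(specBrokers, disruptionBudget int) int {
	return specBrokers - disruptionBudget
}

// deletablePods returns how many pods the PDB lets evictions remove:
// healthy pods minus minAvailable, never below zero.
func deletablePods(runningPods, minAvailable int) int {
	if d := runningPods - minAvailable; d > 0 {
		return d
	}
	return 0
}

func main() {
	const budget = 1

	// Step 1: 6 brokers in the spec and 6 broker pods running.
	minAvail := minAvailableFromSpec(6, budget)
	fmt.Println(minAvail, deletablePods(6, minAvail)) // 5 1

	// Step 2: the spec now lists 3 brokers, but all 6 pods keep running
	// until the scaling operation finishes.
	minAvail = minAvailableFromSpec(3, budget)
	fmt.Println(minAvail, deletablePods(6, minAvail)) // 2 4
}
```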

Expected behavior
If the CR sets a DisruptionBudget of 1, Kafka's PDB MIN AVAILABLE should be N-1, where N is the number of brokers actually available in the cluster, not the number listed in the spec.

Additional context
Proposal: instead of using the number of brokers from the Spec (or from the Status), use the number of Kafka pods available in the cluster: num(ListPods(KafkaCluster))
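The proposal can be sketched as follows, under stated assumptions. `Pod` is a minimal stand-in for `corev1.Pod`, and the selector labels `app: kafka` and `kafka_cr: <cluster name>` are assumed to identify a cluster's broker pods; `minAvailableFromPods` is a hypothetical helper, and only integer budgets are handled (percentage budgets are left out). Because the count follows the pods that actually exist, MIN AVAILABLE stays at 5 during the scale-in until the surplus broker pods are really gone.

```go
package main

import "fmt"

// Pod is a minimal stand-in for corev1.Pod; only labels matter here.
type Pod struct {
	Name   string
	Labels map[string]string
}

// matchesLabels reports whether every selector key/value pair is set on the pod.
func matchesLabels(pod Pod, selector map[string]string) bool {
	for k, v := range selector {
		if pod.Labels[k] != v {
			return false
		}
	}
	return true
}

// minAvailableFromPods implements num(ListPods(KafkaCluster)) - budget.
// The label keys are assumptions made for this sketch.
func minAvailableFromPods(pods []Pod, clusterName string, budget int) int {
	selector := map[string]string{"app": "kafka", "kafka_cr": clusterName}
	n := 0
	for _, p := range pods {
		if matchesLabels(p, selector) {
			n++
		}
	}
	if m := n - budget; m > 0 {
		return m
	}
	return 0
}

func main() {
	// Six broker pods still running while the spec already lists only three.
	var pods []Pod
	for i := 0; i < 6; i++ {
		pods = append(pods, Pod{
			Name:   fmt.Sprintf("kafka-%d", i),
			Labels: map[string]string{"app": "kafka", "kafka_cr": "kafka"},
		})
	}
	// An unrelated pod that must not be counted.
	pods = append(pods, Pod{Name: "zookeeper-0", Labels: map[string]string{"app": "zookeeper"}})

	fmt.Println(minAvailableFromPods(pods, "kafka", 1)) // 5
}
```

In an operator built on controller-runtime, the slice filter would typically be replaced by a `client.List` call with `client.InNamespace` and `client.MatchingLabels` options; the counting logic stays the same.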

@alungu I'm closing this, as #770, which addresses this issue, has now been merged.