Kafka PDB (PodDisruptionBudget) is invalid during scaling operations
alungu opened this issue · 1 comment
alungu commented
Describe the bug
The current Kafka PodDisruptionBudget implementation treats the number of brokers in the Kafka spec as the total number of brokers. However, this does not hold during a scaling operation, which results in an invalid PDB definition.
Steps to reproduce the issue:
- Create a KafkaCluster with 6 Kafka brokers and a DisruptionBudget of 1
- Kafka's PDB MIN AVAILABLE is set to 5 (as expected); only 1 broker can be deleted
- Scale in the KafkaCluster to 3 Kafka brokers
- Kafka's PDB MIN AVAILABLE is updated to 2 as soon as the CR is applied; up to 4 brokers can be deleted
- The scaling operation might take hours, depending on the cluster load (and the time required to drain the nodes). During this time the PDB is invalid: it allows more Kafka Pods to be deleted than the DisruptionBudget permits.
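
The numbers in the steps above can be checked with a small sketch. It assumes that MIN AVAILABLE is computed as the spec broker count minus the DisruptionBudget, which matches the reported values of 5 and 2; the function names are illustrative and are not taken from the operator's code.

```python
def min_available(spec_brokers: int, disruption_budget: int) -> int:
    """MIN AVAILABLE derived from the spec broker count (assumed formula)."""
    return max(spec_brokers - disruption_budget, 0)


def deletable_pods(running_pods: int, min_avail: int) -> int:
    """Number of running pods the PDB would still allow to be deleted."""
    return max(running_pods - min_avail, 0)


# Before scaling: the spec lists 6 brokers and 6 pods are running.
before = min_available(spec_brokers=6, disruption_budget=1)
print(before, deletable_pods(6, before))  # 5 1

# Right after the CR is updated to 3 brokers: 6 pods are still running.
during = min_available(spec_brokers=3, disruption_budget=1)
print(during, deletable_pods(6, during))  # 2 4
```

The second case is the problematic window: the budget is 1, yet 4 pods can be deleted until the scaling operation finishes.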
Expected behavior
If the CR sets a DisruptionBudget of 1, Kafka's PDB MIN AVAILABLE should be N-1, where N is the number of available brokers.
Additional context
Proposal: instead of using the number of brokers from the Spec (or from the Status), use the number of Kafka Pods available in the cluster: num(ListPods(KafkaCluster))
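
A minimal sketch of the proposal, under stated assumptions: pods are modelled as plain objects, `list_pods` is a hypothetical stand-in for ListPods(KafkaCluster), and the `cluster` label key is invented for illustration. It is not the operator's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    labels: dict = field(default_factory=dict)


def list_pods(all_pods, cluster_name):
    """Hypothetical ListPods: keep pods labelled as belonging to the cluster."""
    return [p for p in all_pods if p.labels.get("cluster") == cluster_name]


def min_available_from_pods(all_pods, cluster_name, disruption_budget):
    """MIN AVAILABLE = N - budget, where N is the number of pods actually present."""
    n = len(list_pods(all_pods, cluster_name))
    return max(n - disruption_budget, 0)


# Six broker pods are still running even though the spec was scaled to 3.
pods = [Pod(f"kafka-{i}", {"cluster": "kafka"}) for i in range(6)]
pods.append(Pod("unrelated", {"cluster": "other"}))

print(min_available_from_pods(pods, "kafka", disruption_budget=1))  # 5
```

With this approach MIN AVAILABLE tracks the pods that exist during the scaling operation, so it only drops as brokers are actually removed.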