StephenSorriaux/ansible-kafka-admin

/admin/reassign_partitions is already there after 5 retries

bitchkat opened this issue · 2 comments

We are managing several Kafka topics in a loop, and sometimes we get failures with the message "/admin/reassign_partitions is already there after 5 retries".

I dug through the code and have a couple of questions:

  1. This znode is being created in update_admin_assignment with
    self.zk_client.create(self.ZK_REASSIGN_NODE, json_assignment)

  2. This is creating a non-ephemeral znode

  3. I can't see any reference to it being deleted.

However, the znode does get deleted at some point. Where does that happen?

I modified the module so that the znode is created with ephemeral=True and, for good measure, I delete the znode before closing the ZooKeeper connection, to ensure it's deleted in a timely manner and isn't left hanging around when the next iteration of the loop runs.

Hi,

The /admin/reassign_partitions znode is deleted by the Kafka controller once the partition reassignment successfully finishes (see https://github.com/apache/kafka/blob/121308cc7a2639a70fc8a60c99d2eaee52931951/core/src/main/scala/kafka/controller/KafkaController.scala#L894). If you create an ephemeral znode, it will be deleted as soon as the ZooKeeper connection is closed, so the partition reassignment may not fully complete, which sounds dangerous.
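To make the lifetime difference concrete, here is a toy in-memory model of persistent versus ephemeral znodes. It is only a sketch: `ToyZooKeeper` and its session ids are hypothetical names made up for illustration, not the module's code and not a real ZooKeeper client.

```python
class ToyZooKeeper:
    """In-memory toy model of znode lifetimes (hypothetical, not a real client)."""

    def __init__(self):
        # path -> (data, owning session id for ephemeral nodes, else None)
        self.nodes = {}

    def create(self, session_id, path, data, ephemeral=False):
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = (data, session_id if ephemeral else None)

    def close_session(self, session_id):
        # Ephemeral znodes disappear when the session that created them ends.
        self.nodes = {
            path: entry
            for path, entry in self.nodes.items()
            if entry[1] != session_id
        }

    def exists(self, path):
        return path in self.nodes


zk = ToyZooKeeper()
path = "/admin/reassign_partitions"

# Ephemeral: the request vanishes when the module disconnects,
# whether or not the controller has finished the reassignment.
zk.create("module-run-1", path, b"{}", ephemeral=True)
zk.close_session("module-run-1")
ephemeral_survived = zk.exists(path)

# Persistent: the request stays until something deletes it explicitly.
zk.create("module-run-2", path, b"{}")
zk.close_session("module-run-2")
persistent_survived = zk.exists(path)
```

In this model `ephemeral_survived` is false and `persistent_survived` is true, which is why the persistent znode is left for the controller to remove.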

Do your topics have a lot of partitions? How many topics are being updated at a time? Currently the maximum waiting time is hard-coded to 5 tries * 5 seconds = 25 seconds. It may be a better idea to expose the delay and the number of tries so that you can adjust them to your needs.
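A minimal sketch of what a configurable wait could look like, assuming a client object with a kazoo-style `exists(path)` method that returns `None` when the znode is absent. The function name `wait_for_reassignment_slot` and the parameters `retries`, `delay_seconds` and `sleep` are hypothetical and are not the module's actual options.

```python
import time

ZK_REASSIGN_NODE = "/admin/reassign_partitions"


def wait_for_reassignment_slot(zk_client, retries=5, delay_seconds=5.0,
                               sleep=time.sleep):
    """Wait until the reassignment znode is gone.

    Returns True as soon as the znode is absent, or False if it is still
    present after `retries` checks (worst case: retries * delay_seconds).
    """
    for _ in range(retries):
        # Assumption: exists() returns None when the znode does not exist.
        if zk_client.exists(ZK_REASSIGN_NODE) is None:
            return True
        sleep(delay_seconds)
    return False
```

With the defaults this waits at most 5 * 5 = 25 seconds, matching the current hard-coded behaviour; a caller looping over many topics could pass a larger `retries` or `delay_seconds`.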

The new 0.7.0 version adds 2 new parameters that should fix your issue.