/admin/reassign_partitions is already there after 5 retries
bitchkat opened this issue · 2 comments
We are managing several Kafka topics in a loop, and sometimes we get failures with the message "/admin/reassign_partitions is already there after 5 retries".
I dug through the code and have a couple of questions:
- This znode is being created in update_admin_assignment with
  self.zk_client.create(self.ZK_REASSIGN_NODE, json_assignment)
- This is creating a non-ephemeral znode.
- I can't see any reference to it being deleted.
However, at some point the znode does get deleted. Where does that happen?
I did modify the module so that the znode is created with ephemeral=True and, for good measure, I delete the znode before closing the ZooKeeper connection to ensure it's deleted in a timely manner and not hanging around when the next iteration of the loop runs.
Hi,
The /admin/reassign_partitions znode is deleted by the Kafka controller once the partition reassignment successfully finishes (see https://github.com/apache/kafka/blob/121308cc7a2639a70fc8a60c99d2eaee52931951/core/src/main/scala/kafka/controller/KafkaController.scala#L894). If you create an ephemeral znode, it will be deleted as soon as the ZooKeeper connection is closed, so the partition reassignment may not fully complete, which sounds dangerous.
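To illustrate the lifetime difference, here is a toy in-memory model. It is only a sketch: FakeZooKeeper and its methods are hypothetical stand-ins, not the module's code or a real ZooKeeper client. It only shows that an ephemeral znode disappears when its session ends, while a persistent one stays until something (here, the Kafka controller) deletes it.

```python
REASSIGN_NODE = "/admin/reassign_partitions"


class FakeZooKeeper:
    """Toy model of znode lifetimes (hypothetical, not a real client)."""

    def __init__(self):
        # path -> (data, owning session id for ephemeral nodes, else None)
        self.nodes = {}

    def create(self, path, data, session, ephemeral=False):
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = (data, session if ephemeral else None)

    def close_session(self, session):
        # Ephemeral znodes are removed when the session that owns them ends;
        # persistent znodes (owner None) are left untouched.
        self.nodes = {
            path: value
            for path, value in self.nodes.items()
            if value[1] != session
        }


# Ephemeral: the reassignment request vanishes with the connection.
zk = FakeZooKeeper()
zk.create(REASSIGN_NODE, b"{}", session="s1", ephemeral=True)
zk.close_session("s1")
ephemeral_survives = REASSIGN_NODE in zk.nodes

# Persistent: the request stays until it is explicitly deleted.
zk = FakeZooKeeper()
zk.create(REASSIGN_NODE, b"{}", session="s1")
zk.close_session("s1")
persistent_survives = REASSIGN_NODE in zk.nodes
```

In this model the ephemeral request is gone as soon as the session closes, whether or not the reassignment has finished, which is the risk described above.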
Do your topics have a lot of partitions? How many topics are being updated at a time? Currently the maximum waiting time is hard-coded to 5 tries * 5 seconds = 25 seconds. It might be better to expose the wait time and the number of tries so that you can adjust them to your needs.
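As a rough sketch of what configurable waiting could look like (this is not the module's actual code; wait_until_absent, NodeStillPresent, and the parameter names retries and wait_seconds are made up for illustration), the check can be written as a polling loop whose worst-case wait is retries * wait_seconds:

```python
import time


class NodeStillPresent(Exception):
    """Raised when the znode is still there after all retries (hypothetical)."""


def wait_until_absent(exists, path, retries=5, wait_seconds=5.0,
                      sleep=time.sleep):
    """Poll exists(path) until it is falsy or the retries run out.

    Returns the number of waits performed before the node disappeared.
    The worst case sleeps retries * wait_seconds in total
    (5 * 5 = 25 seconds with the defaults).
    """
    for attempt in range(retries):
        if not exists(path):
            return attempt
        sleep(wait_seconds)
    raise NodeStillPresent(f"{path} is already there after {retries} retries")
```

With kazoo, the exists argument could be the client's exists method, which returns None when the znode is absent. Raising retries or wait_seconds gives the controller more time to finish a large reassignment before the next topic is processed.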
The new 0.7.0 version, which adds 2 new parameters, should fix your issue.