Consumer does not recover from leaderless partition
Closed this issue · 2 comments
The following behavior was observed when running Kafunk version 0.1.8.
We have a consumer group containing a single process that consumes a single topic (several other consumer groups exhibited the same issue, but this is the simplest example). We experienced a partial Kafka cluster outage during which the leader broker of partition 3 (node_id=106) died. During the outage, Kafunk detected that partition 3 was leaderless and did not attempt to consume it. When the broker became healthy again, however, our consumer did not notice and did not resume consuming partition 3.
The attached log shows a period during which the consumer was assigned partition 3, after which it restarts a few times; the last ~50 lines contain the warning about the leaderless partition. Afterward the consumer goes into a normal fetch/consume/commit-offsets loop but completely ignores partition 3. Several hours later we manually restarted the process to force Kafunk to notice that partition 3 had a healthy leader again.
We expected a metadata update from the cluster to inform the Kafunk consumer that partition 3 had a leader again, but this did not happen.
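To make the expected behavior concrete, here is a minimal sketch in Python. It is not Kafunk code and does not reflect Kafunk's internals; `LeaderTracker`, `apply_metadata`, and the metadata shape (partition id mapped to a leader node id, or `None` when leaderless) are all hypothetical names chosen for illustration. The idea is that every metadata snapshot is compared against the set of partitions previously skipped as leaderless, and any partition that has regained a leader goes back into the fetch set.

```python
from typing import Dict, Optional, Set

# Hypothetical metadata shape: partition id -> leader node id, or None if leaderless.
PartitionLeaders = Dict[int, Optional[int]]


class LeaderTracker:
    """Tracks which assigned partitions currently have a leader (illustrative only)."""

    def __init__(self, assigned: Set[int]) -> None:
        self.assigned = set(assigned)
        self.active: Set[int] = set()      # partitions currently eligible for fetching
        self.leaderless: Set[int] = set()  # partitions skipped for lack of a leader

    def apply_metadata(self, leaders: PartitionLeaders) -> Set[int]:
        """Apply a metadata snapshot; return partitions that just regained a leader."""
        recovered: Set[int] = set()
        for p in self.assigned:
            if leaders.get(p) is None:
                # No leader: stop fetching this partition, but remember it.
                self.active.discard(p)
                self.leaderless.add(p)
            else:
                if p in self.leaderless:
                    # Leader is back: this is the transition we expected to be noticed.
                    recovered.add(p)
                    self.leaderless.discard(p)
                self.active.add(p)
        return recovered


# Node ids 101-103 are made up; 106 is the broker from this report.
tracker = LeaderTracker({0, 1, 2, 3})
tracker.apply_metadata({0: 101, 1: 102, 2: 103, 3: None})   # during the outage
recovered = tracker.apply_metadata({0: 101, 1: 102, 2: 103, 3: 106})  # after recovery
```

In this sketch, the second call returns partition 3 as recovered and puts it back in `active`, which is the transition we did not observe until the process was restarted.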
Consumer log: kafunk_consumer.log
Thanks for reporting, I'll take a look.