Consumer does not recover from leaderless partition

Question

Closed this issue 7 years ago · 2 comments

The following behavior was observed when running Kafunk version 0.1.8.

We have a consumer group containing a single process that consumes a single topic (several other consumer groups exhibited the same issue, but this is the simplest example). We experienced a partial Kafka cluster outage during which the leader broker of partition 3 (node_id=106) died. During the outage, Kafunk detected that partition 3 was leaderless and did not attempt to consume it, but when the broker became healthy again, our consumer did not notice.

The attached log shows a period during which the consumer was assigned partition 3. It then restarts a few times, and in the last ~50 lines we see the warning about the leaderless partition. Afterward it goes into a normal fetch/consume/commit-offsets loop but completely ignores partition 3. Several hours later we manually restarted the process to force Kafunk to notice that partition 3 had a healthy leader again.

We would have expected a metadata update from the cluster to inform the Kafunk consumer that it now had a leader for partition 3, but this did not happen.
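
To make the expectation concrete, here is a rough sketch of the behavior we had in mind. It is plain Python rather than Kafunk code, and `fetchable_partitions`, `fetch_metadata`, and `LeaderMap` are hypothetical names invented for illustration. The idea is only that the set of consumable partitions is recomputed from fresh metadata on each pass, so a partition that regains a leader is picked up again without a restart.

```python
from typing import Callable, Dict, Optional, Set

# Hypothetical type: partition id -> leader node id, or None when leaderless.
LeaderMap = Dict[int, Optional[int]]


def fetchable_partitions(
    assigned: Set[int],
    fetch_metadata: Callable[[], LeaderMap],
) -> Set[int]:
    """Return the assigned partitions that currently have a leader.

    Metadata is re-fetched on every call, so a partition that was leaderless
    on one pass is included again once the cluster reports a leader for it.
    """
    leaders = fetch_metadata()
    return {p for p in assigned if leaders.get(p) is not None}


# Simulated cluster responses: partition 3 is leaderless, then recovers.
responses = iter([
    {0: 101, 3: None},
    {0: 101, 3: 106},
])


def fetch() -> LeaderMap:
    return next(responses)


during_outage = fetchable_partitions({0, 3}, fetch)
after_recovery = fetchable_partitions({0, 3}, fetch)
```

In this toy model the second pass includes partition 3 again, which is what we expected to see in the consumer after the broker recovered.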

Consumer log: kafunk_consumer.log

Answer 1 · 2018-01-17T12:12:00.000Z

Thanks for reporting, I'll take a look.

Answer 2 · 2018-03-14T14:25:17.000Z

This should be addressed in #213. It looks like the affected partitions weren't being explicitly removed from the metadata view.
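
For anyone reading along, a simplified model of that failure mode might look like the following. This is an illustrative Python sketch, not the actual Kafunk code or the change in #213. `MetadataView`, `leader`, and `invalidate` are made-up names, and the assumption is that the view only asks the cluster about partitions it has no entry for.

```python
from typing import Callable, Dict, Optional


class MetadataView:
    """Hypothetical cached partition -> leader view (not Kafunk's real type)."""

    def __init__(self, fetch_leader: Callable[[int], Optional[int]]):
        self._fetch_leader = fetch_leader
        self._leaders: Dict[int, Optional[int]] = {}

    def leader(self, partition: int) -> Optional[int]:
        # Only ask the cluster when the partition is missing from the view.
        if partition not in self._leaders:
            self._leaders[partition] = self._fetch_leader(partition)
        return self._leaders[partition]

    def invalidate(self, partition: int) -> None:
        # Explicitly drop the entry so the next lookup re-fetches it.
        self._leaders.pop(partition, None)


cluster: Dict[int, Optional[int]] = {3: None}  # partition 3 has no leader
view = MetadataView(lambda p: cluster.get(p))

stale_before = view.leader(3)  # leaderless result is now cached
cluster[3] = 106               # a leader becomes available again
stale_after = view.leader(3)   # cached entry was never removed

view.invalidate(3)             # explicit removal of the affected partition
fresh = view.leader(3)         # lookup goes back to the cluster
```

Without the `invalidate` call, the view in this sketch keeps reporting partition 3 as leaderless indefinitely, which matches the symptom of the consumer ignoring the partition until the process was restarted.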