jet/kafunk

Consumer fails to commit offset periodically after partition rebalance event


My team has observed what we believe to be a bug in the Kafunk v0.1 consumer. We have a consumer group of 4 hosts in Azure's uswest region, all consuming a single 8-partition topic from a broker in Azure's useast2 region, so round-trip times are nontrivial.

The attached logs span 2017-06-11 22:00:00 UTC to 2017-06-19 23:00:00 UTC. During this period not a single message was written to the topic. Only partition 4 has ever had a message published to it (log size = 1); the other 7 partitions have a log size of 0.

For monitoring purposes, we have a separate process that periodically logs the committed offset from the broker (obtained via Kafunk.Consumer.fetchOffset). This monitor service observed that there was no committed offset (fetchOffset returned -1) for partition 2 during the following time periods:

  • 2017-06-13 04:30 UTC to 2017-06-13 20:00 UTC
  • 2017-06-14 20:20 UTC to 2017-06-15 21:00 UTC
  • 2017-06-16 21:10 UTC to 2017-06-19 18:50 UTC
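
For illustration, windows like the ones above can be derived from periodic samples of the committed offset along the following lines. This is a minimal Python sketch, not the monitor's actual code and not a Kafunk API: the function name `missing_commit_windows` and the `(timestamp, offset)` sample format are assumptions made for the example. The only detail taken from the report is that a missing committed offset shows up as -1.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Value the monitor observed from fetchOffset when no offset was committed.
NO_COMMITTED_OFFSET = -1

# Hypothetical sample format: (time the sample was taken, committed offset).
Sample = Tuple[datetime, int]


def missing_commit_windows(samples: Iterable[Sample]) -> List[Tuple[datetime, datetime]]:
    """Return (first, last) sample times for each consecutive run of -1 samples."""
    windows: List[Tuple[datetime, datetime]] = []
    start = None
    last = None
    for ts, offset in sorted(samples):
        if offset == NO_COMMITTED_OFFSET:
            if start is None:
                start = ts  # a new run of missing commits begins here
            last = ts
        elif start is not None:
            windows.append((start, last))  # run ended with a valid offset
            start = None
    if start is not None:
        windows.append((start, last))  # run still open at the end of the data
    return windows
```

Each reported window is bounded by the first and last sample that returned -1, so its precision is limited by the sampling interval.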

Under normal circumstances the Kafunk consumer periodically commits its offsets even if the offset has not advanced since the previous commit. That has not been the case with this particular consumer group. Looking through the logs, we see several partition rebalance events during which partition 2 gets reassigned to a new host. After the rebalance, the consumer fails to begin the periodic commit loop. We suspect this may be due to the low-volume nature of the topic.
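
To make the expected behavior concrete, here is a toy model of a periodic commit loop that keeps committing after a reassignment even when no messages arrive. It is a Python sketch under stated assumptions, not Kafunk's implementation: the class `PeriodicCommitter`, its methods, and the use of plain numbers as timestamps are all hypothetical, and the starting position of 0 for an empty partition is an assumption.

```python
from typing import Dict, List, Tuple


class PeriodicCommitter:
    """Toy model of a periodic commit loop (hypothetical, not Kafunk code)."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self.positions: Dict[int, int] = {}  # partition -> current position
        self.next_commit_at = interval_s
        self.commits: List[Tuple[float, Dict[int, int]]] = []  # (time, offsets)

    def on_assigned(self, partitions: Dict[int, int]) -> None:
        # Rebalance: take over the new assignment and its starting positions.
        self.positions = dict(partitions)

    def tick(self, now: float) -> None:
        # Commit once per elapsed interval, whether or not positions advanced.
        while now >= self.next_commit_at:
            if self.positions:
                self.commits.append((self.next_commit_at, dict(self.positions)))
            self.next_commit_at += self.interval_s


committer = PeriodicCommitter(interval_s=10)
committer.on_assigned({2: 0})  # partition 2 reassigned; no messages ever arrive
committer.tick(now=35)
```

In this model the idle partition is committed at every interval after the reassignment. The logs show the opposite for partition 2: no commits follow the rebalance.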