fd4s/fs2-kafka

Multiple consumers read from the same partition after rebalance

squadgazzz opened this issue · 2 comments

With fs2-kafka 3.0.1 and also on 2.x.x, we faced with the issue where multiple consumers that share the same group id start reading from the same partition after rebalance.
The flow looks as follows:

  • Consumer A is reading from partition 1. Current offset is 1453;
  • A new consumer B connects and is assigned to partition 1;
  • Logs for the consumer A:
Updating assignment with
Assigned partitions: [mytopic.v1-2]
Current owned partitions: [mytopic.v1-2, mytopic.v1-1]
Added partitions (assigned - owned): [] Revoked partitions (owned - assigned): [mytopic.v1-1]
Revoke previously assigned partitions mytopic.v1-1
Need to revoke partitions [mytopic.v1-1] and re-join the group
Notifying assignor about the new Assignment(partitions=[mytopic.v1-2])
Adding newly assigned partitions:
(Re-)joining group
Failing OffsetCommit request since the consumer is not part of an active grou
Successfully joined group with generation Generation{generationId=289, memberId=***protocol='cooperative-sticky'}
Successfully synced group in generation Generation{generationId=289, memberId=***protocol='cooperative-sticky'}
Updating assignment with
Assigned partitions: [mytopic.v1-2]   
Current owned partitions: [mytopic.v1-2]
Added partitions (assigned - owned): [] Revoked partitions (owned - assigned): []
Notifying assignor about the new Assignment(partitions=[mytopic.v1-2])
Adding newly assigned partitions
  • Consumer B starts reading from offset 1455, but consumer A continues reading from the same partition for some reason. It fails to process the event multiple times and moves to the next one, and so on.
  • Sometime after, the offset for consumer A is around 1600 and for consumer B is 2000. Another rebalance happens and consumer A gets the ownership again. Since the offset was 1600 it successfully commits that offset and starts reading the event up to 2000 that were already processed by consumer B.

We have already checked the broker and found no issue. Especially the logs confirm that the broker issues the right events. There might be something with the ConsumerRebalanceListener implementation in the library.

Might be related to the #127