linkedin/brooklin

Brooklin Kafka mirroring task produces duplicated records on re-balance

sanjay24 opened this issue · 3 comments

Subject of the issue

If the group coordinator becomes unreachable for a Kafka mirroring task (on the consumer side), a re-balance is triggered and duplicated records are produced.

Your environment

  • Operating System
    CentOS 7.6
  • Brooklin version
    master/1.0.2
  • Java version
    1.8
  • Kafka version
    2.1.0
  • ZooKeeper version
    3.4.13

Steps to reproduce

  1. Enable and start a Kafka mirroring task
  2. Make the source broker unreachable
  3. Observe that a re-balance is triggered and check the mirrored data for duplicates

Expected behaviour

No duplicates

Actual behaviour

Duplicated data

If exactly-once semantics are not supported, are there any suggested configurations that could reduce potential duplicates?

This is by design, @sanjay24. Brooklin supports at-least-once semantics. We haven't assessed what it would take to have Brooklin operate under exactly-once semantics when mirroring Kafka clusters.
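
To illustrate why at-least-once delivery produces duplicates after a re-balance, here is a minimal Python simulation. It is only a sketch and not Brooklin code: the names mirror_with_rebalance and deduplicate and the parameters commit_every and rebalance_at are hypothetical, and the model assumes that progress is saved only every commit_every records and that a re-balance makes the task resume from the last saved position. Under those assumptions, saving progress more often shrinks the duplicate window but does not close it, and a downstream consumer can drop repeats by tracking source offsets.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (source offset, payload); a hypothetical record shape


def mirror_with_rebalance(records: List[Record], commit_every: int,
                          rebalance_at: int) -> List[Record]:
    """Simulate at-least-once mirroring with one re-balance.

    Each record is forwarded to the destination first; the read position is
    saved ("committed") only every `commit_every` records. A single re-balance
    strikes right after the `rebalance_at`-th forward, before that record's
    progress can be saved, so reading resumes from the last committed position.
    """
    destination: List[Record] = []
    committed = 0    # offset to resume from after a re-balance
    position = 0     # next offset to read
    forwarded = 0    # total records written to the destination so far
    rebalanced = False

    while position < len(records):
        destination.append(records[position])
        forwarded += 1
        if not rebalanced and forwarded == rebalance_at:
            # Uncommitted progress is lost: rewind to the last commit.
            rebalanced = True
            position = committed
            continue
        position += 1
        if position % commit_every == 0:
            committed = position
    return destination


def deduplicate(destination: List[Record]) -> List[Record]:
    """Drop repeated records by remembering which source offsets were seen."""
    seen = set()
    unique: List[Record] = []
    for offset, payload in destination:
        if offset not in seen:
            seen.add(offset)
            unique.append((offset, payload))
    return unique


if __name__ == "__main__":
    source = [(i, f"msg-{i}") for i in range(10)]
    mirrored = mirror_with_rebalance(source, commit_every=4, rebalance_at=7)
    print(len(mirrored) - len(source), "duplicates with commit_every=4")
    mirrored = mirror_with_rebalance(source, commit_every=1, rebalance_at=7)
    print(len(mirrored) - len(source), "duplicates with commit_every=1")
```

In this toy model, committing after every 4 records yields 3 duplicates and committing after every record still yields 1, because the record in flight when the re-balance hits is always replayed. Whether and how a real deployment can tune its commit or checkpoint frequency is not covered here.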

Please feel free to reach out if you have any questions.