Re-consuming same messages when partitions in Kafka cluster move
MaciekRakowski opened this issue · 1 comments
When evaluating the kafka-node high level consumer, I noticed a scenario in which it consumes duplicate messages. This issue occurs during a rebalance. Here are the steps to recreate it:
- Start a high level consumer with auto-commit set to FALSE (and do NOT commit). The code below (with options properly set) will suffice:
consumer.on('message', function (message) {
console.log(message.value);
} - Send a few messages to the broker using any producer. Preferably, send messages in order to keep track of how many were sent. Keep track of the messages and number of messages sent.
- Stop the Kafka broker, and then restart it.
- Notice that the messages in step 2 were re-consumed. The Java high level consumer that comes with kafka does not do this.
This issue would be important to fix because some developers may want to commit every Nth message to improve performance. Please note that this issue also happens when partitions on the cluster changes in other ways, such as when partitions move during a rebalance, or when the cluster is expanded and partitions move. It would be ideal to make this consumer resilient as the Java one is. If there is any other information needed to help reproduce or troubleshoot this issue, please let me know.
I'm curious to know if this issue happens when you force HLC commit on close.
For example like below:
process.on('SIGINT', function () {
highLevelConsumer.close(true, function () {
process.exit();
});
});
The new version 0.5.4
will perform a rebalance when the partition changes.