Farfetch/kafkaflow

[Bug Report]: KafkaFlow doesn't work with `cooperative-sticky` protocol

AlexeyRaga opened this issue · 3 comments

Prerequisites

  • I have searched issues to ensure it has not already been reported

Description

As discussed in this thread, KafkaFlow doesn't behave correctly when the consumer is configured to use the cooperative-sticky rebalancing protocol.

The protocol is described here

Steps to reproduce

Configure the consumer using WithConsumerConfig(...) and set the partition assignment strategy:

PartitionAssignmentStrategy = PartitionAssignmentStrategy.CooperativeSticky
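
For anyone reproducing this, a slightly fuller sketch of that setting is below. It assumes the Confluent.Kafka `ConsumerConfig` type is what gets passed to WithConsumerConfig(...); the variable name `consumerConfig` and the `consumer` builder in the comment are placeholders, and the rest of the consumer setup (brokers, topic, group id, middlewares) is left out.

```csharp
using Confluent.Kafka;

// Only the assignment strategy is changed here; everything else is
// assumed to be configured through KafkaFlow's own builder methods.
var consumerConfig = new ConsumerConfig
{
    PartitionAssignmentStrategy = PartitionAssignmentStrategy.CooperativeSticky,
};

// Inside the KafkaFlow consumer configuration ("consumer" is a placeholder
// for the consumer builder instance):
// consumer.WithConsumerConfig(consumerConfig)
```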

Expected behavior

The consumer is expected to work normally, except that rebalancing would not cause "stop-the-world" behaviour.

Actual behavior

When KafkaFlow is configured to use PartitionAssignmentStrategy.CooperativeSticky, the consumer appears to work (it processes messages) but does not commit any offsets.

KafkaFlow version

2.4.1

Hi @AlexeyRaga,
I'm starting to look into this issue. As we no longer have access to the mentioned thread, can you give some more context on what was said there?
Also, have you updated KafkaFlow to version 3? If so, is this still an issue in that version?

@lpcouto I no longer have access to the thread either, but I don't believe anything useful was mentioned there.
As far as I can remember, there was something like "oh, it should work because the underlying librdkafka does" and then "oh, no, it indeed doesn't, because we use the callbacks that are different in the cooperative mode".
But not much more.

I haven't tried using cooperative-sticky mode in version 3, but I don't believe it'll work.
It is extremely easy to test, though: just set the mode to CooperativeSticky and observe that no offsets are committed.

@AlexeyRaga
When the cooperative-sticky rebalancing protocol is set, Kafka's response to a rebalance is different: we get only the incrementally assigned partitions, not the consumer's total list of partitions. This causes unexpected behavior in KafkaFlow: it cannot commit offsets to partitions that it supposedly doesn't have.
We are working on a solution and will let you know when done.
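
To illustrate the difference described above, here is a minimal, self-contained sketch. It is not KafkaFlow's actual code: `AssignmentTracker`, its method names, and the use of plain partition numbers are all hypothetical. It only shows what happens when an incremental assignment is treated as if it were the full list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bookkeeping of which partitions a consumer owns.
public sealed class AssignmentTracker
{
    private readonly HashSet<int> owned = new HashSet<int>();

    // Eager protocols: the callback carries the full assignment, so replace.
    public void OnAssignedEager(IEnumerable<int> partitions)
    {
        owned.Clear();
        owned.UnionWith(partitions);
    }

    // Cooperative protocol: the callback carries only newly added partitions, so merge.
    public void OnAssignedIncremental(IEnumerable<int> added) => owned.UnionWith(added);

    // Cooperative protocol: only the partitions being taken away are reported.
    public void OnRevokedIncremental(IEnumerable<int> revoked) => owned.ExceptWith(revoked);

    public bool CanCommit(int partition) => owned.Contains(partition);
}

public static class Demo
{
    public static void Main()
    {
        // Incremental data handled with "replace" semantics: partitions 0 and 1 are forgotten.
        var replacing = new AssignmentTracker();
        replacing.OnAssignedEager(new[] { 0, 1 });
        replacing.OnAssignedEager(new[] { 2 });
        Console.WriteLine(replacing.CanCommit(0)); // False

        // The same data handled with "merge" semantics: partition 0 is still owned.
        var merging = new AssignmentTracker();
        merging.OnAssignedIncremental(new[] { 0, 1 });
        merging.OnAssignedIncremental(new[] { 2 });
        Console.WriteLine(merging.CanCommit(0)); // True
    }
}
```

With the first tracker, any offset for partition 0 or 1 would be rejected as belonging to a partition the consumer "doesn't have", which matches the symptom reported in this issue.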