fd4s/fs2-kafka

One message from Consumer?

darkfrog26 opened this issue · 6 comments

I'd like to create a scenario where I'm gathering work tasks from Kafka, where each message may take a long time to process. To that end, I want the consumer to only consume one message at a time, so that another worker that connects would receive the next event in the queue instead of multiple events being buffered to my consumer. I've tried withMaxPollRecords(1), but that didn't seem to make any difference. Is there a way to accomplish this?
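
For context, here's roughly how I'm configuring the consumer (a simplified sketch; the broker address, group id, and value types are placeholders for my actual setup):

```scala
import cats.effect.IO
import fs2.kafka._

// Sketch of my consumer settings; I hoped withMaxPollRecords(1) would keep
// each consumer to a single in-flight message.
val consumerSettings =
  ConsumerSettings[IO, String, String]
    .withBootstrapServers("localhost:9092")
    .withGroupId("workers")
    .withAutoOffsetReset(AutoOffsetReset.Earliest)
    .withMaxPollRecords(1)
```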

From what I understand, you want to achieve queue-like behaviour with Kafka, while Kafka is much more topic-oriented.
withMaxPollRecords(1) relates to the buffering of a single consumer. If another consumer is added (assuming both are in the same consumer group), then:

  • if the topic has only 1 partition, then one consumer will be idle
  • otherwise, they will split the partitions evenly between them, in a non-intersecting fashion.

If possible, it would be good for your case if your topic had more than 1 partition. Then e.g. 2 consumers would split consumption in a queue-like way. This only holds if your tasks can be split across different partitions.

The test scenario I'm trying to reproduce is a scaling one:

  1. I send two messages to the same topic
  2. I create a consumer and want to process just the first message (processing of the event takes time)
  3. I create a second consumer on the same group as the first with the hope of processing the second message

What actually happens is that the first consumer receives the first message as expected, takes time to process, and while that is happening the second consumer connects but receives no messages. When the first consumer finally finishes processing the first message and commits it, it immediately receives the second message. This tells me that the first consumer is receiving both messages before the second consumer even starts.
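
Roughly what the test looks like, as a simplified sketch (the broker address, topic name, and the sleep standing in for real processing are placeholders):

```scala
import scala.concurrent.duration._
import cats.effect.{IO, IOApp}
import fs2.kafka._

object ScalingTest extends IOApp.Simple {
  val settings =
    ConsumerSettings[IO, String, String]
      .withBootstrapServers("localhost:9092")
      .withGroupId("workers")
      .withAutoOffsetReset(AutoOffsetReset.Earliest)
      .withMaxPollRecords(1)

  // One "worker": consume a record, process it slowly, then commit.
  def worker(name: String): fs2.Stream[IO, Unit] =
    KafkaConsumer
      .stream(settings)
      .subscribeTo("tasks")
      .records
      .evalMap { committable =>
        IO.println(s"$name got ${committable.record.value}") *>
          IO.sleep(30.seconds) *>               // stand-in for the long-running processing
          committable.offset.commit
      }

  val run: IO[Unit] =
    fs2.Stream(
      worker("worker-1"),
      worker("worker-2").delayBy(5.seconds)     // second worker joins while the first is busy
    ).parJoinUnbounded.compile.drain
}
```

With the tasks topic having a single partition, only worker-1 ever prints anything, which matches what I described above.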

If the topic has only 1 partition, then this is fully expected behaviour.
One partition can only be consumed by a single consumer at a time.

Hmm, so is my scenario not possible with Kafka? The idea is that I want to be able to scale up additional workers on-demand (when the queue starts to fill up), but if my only worker already holds the next N events, adding more workers doesn't matter, since those events have already been funneled to the first worker.

I appreciate you taking the time to help out with this, as it appears this has little to do with the library and more with how Kafka works.

You can't achieve queue-like behavior with Kafka using a topic with only one partition; that's not how Kafka was designed.
If you want to scale your consumer workers to N, you need at least N partitions for the given topic. If your tasks can somehow be split into parallelizable groups, then partitions are a great fit here. If your topic has only one partition and you can't change its partition count, then you can try an idea inspired by Staged Event-Driven Architecture: add an additional worker that consumes the load from the given topic and re-distributes it to your own new topic with a suitable number of partitions.
If you can't group tasks so as to increase parallelism, then Kafka can't help you, since a single partition can only be consumed by a single consumer within a consumer group.
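
A minimal sketch of that re-distribution idea with fs2-kafka (the topic names, broker address, group id, and batch-commit parameters are just placeholders): a single worker reads the topic you can't repartition and re-publishes every record, keyed, to a new topic created with enough partitions.

```scala
import scala.concurrent.duration._
import cats.effect.{IO, IOApp}
import fs2.kafka._

object Redistributor extends IOApp.Simple {
  val consumerSettings =
    ConsumerSettings[IO, String, String]
      .withBootstrapServers("localhost:9092")
      .withGroupId("redistributor")
      .withAutoOffsetReset(AutoOffsetReset.Earliest)

  val producerSettings =
    ProducerSettings[IO, String, String]
      .withBootstrapServers("localhost:9092")

  val run: IO[Unit] =
    KafkaProducer.stream(producerSettings).flatMap { producer =>
      KafkaConsumer
        .stream(consumerSettings)
        .subscribeTo("tasks-single-partition")   // the topic you can't repartition
        .records
        .evalMap { committable =>
          // Re-publish each task, keyed, to a topic that does have enough
          // partitions; the key decides which partition (and thus which
          // downstream worker) a task ends up with.
          val record = ProducerRecord(
            "tasks-partitioned",
            committable.record.key,
            committable.record.value
          )
          producer
            .produce(ProducerRecords.one(record))
            .flatten                              // wait for the send to be acknowledged
            .as(committable.offset)
        }
        .through(commitBatchWithin(100, 5.seconds))
    }.compile.drain
}
```

Your actual workers would then subscribe to the partitioned topic in their own consumer group, and you can scale them up to the number of partitions that topic was created with.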