pingles/clj-kafka

Offset management

Opened this issue · 6 comments

cddr commented

Hey folks,

What are your thoughts about the new method of managing offsets in Kafka? There's some documentation (in the form of example code) here:

https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka

The TL;DR is that there's quite a bit of overhead in maintaining the offset in ZooKeeper, so there's another approach that involves writing offsets to a topic and keeping an in-memory cache of the current offset. That way, consumers with high throughput, or lots of consumer groups (or both), can still commit after processing each message rather than trying to limit the frequency of commits. Would you like clj-kafka to provide something like this?
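To make the idea concrete, here is a minimal sketch in plain Java of "commit by appending to a log, read from an in-memory cache". It is a toy model under stated assumptions, not the Kafka or clj-kafka API: the class `OffsetStoreSketch`, its methods, and the group and topic names are all hypothetical, and the list merely stands in for the offsets topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from Kafka or clj-kafka.
final class OffsetStoreSketch {
    // One committed offset for a (group, topic, partition) key.
    static final class Entry {
        final String key;
        final long offset;
        Entry(String key, long offset) { this.key = key; this.offset = offset; }
    }

    // Stand-in for the offsets topic: an append-only list of commits.
    private final List<Entry> log = new ArrayList<>();
    // In-memory cache of the latest committed offset per key.
    private final Map<String, Long> cache = new HashMap<>();

    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;
    }

    // A commit is a cheap append plus a cache update.
    void commit(String group, String topic, int partition, long offset) {
        String k = key(group, topic, partition);
        log.add(new Entry(k, offset));
        cache.put(k, offset);
    }

    // Fetches are served from the cache; -1 means "nothing committed yet".
    long fetch(String group, String topic, int partition) {
        return cache.getOrDefault(key(group, topic, partition), -1L);
    }

    // Replaying the log restores the cache, e.g. after a restart.
    void rebuildCache() {
        cache.clear();
        for (Entry e : log) {
            cache.put(e.key, e.offset);
        }
    }

    int logSize() { return log.size(); }

    public static void main(String[] args) {
        OffsetStoreSketch store = new OffsetStoreSketch();
        for (long offset = 0; offset < 5; offset++) {
            // ... process the message at `offset` here ...
            store.commit("my-group", "events", 0, offset); // commit after every message
        }
        System.out.println(store.fetch("my-group", "events", 0)); // prints 4
    }
}
```

Because each commit is only an append and a map update, committing after every message stays cheap in this model, which is the property the wiki page is after.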

It's definitely interesting, although I'd probably lean towards this being an
add-on lib that people could pull in, as a kind of offset strategy.

Having said that, I'm not overly familiar with the development, but I think
upcoming releases of Kafka will have a broker API suitable for centrally
managing offsets:
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommit/FetchAPI

Again, I'd probably err on the side of, as far as possible, letting people
choose whichever offset strategy they like.

What do you think? Would you be up for developing a clj-kafka equivalent to
the confluence code you posted?

On Tue, Sep 15, 2015 at 8:51 PM, Andy Chambers wrote:

> Hey folks, what are your thoughts about the new method of managing offsets in Kafka? [...]

D'oh. I've just realised your suggestion uses the API I found :)

Haha. Yep, definitely up for adding support. I'll see if I can get some time this week to have a look. Of course, pull requests are always welcome!

cddr commented

Cool!

I think we will need this either way, so if you don't get to it, we'll get to it soon enough. Just wanted to check before digging in. Thanks for this library. It's been working great for us so far.

cddr commented

Hey @pingles. Just letting you know, I probably won't get to this any time soon, as my company appears to be leaning towards using Samza, which handles this stuff itself.

This looks like it was done in open PR #64.

Thanks for the reminder. We'll try to take a look this week and merge it in. Apologies for the delay; I've been busy with some unrelated stuff at work.