lovoo/goka

Question: Auto-commit?

ryanolds-drizly opened this issue · 8 comments

Is auto-commit being disabled on the consumers? I've looked through the code and didn't see anything overriding Sarama's default value, which is to have it enabled.

db7 commented

With the old sarama version it was definitely disabled. If it is not disabled now, then this is a problem.

Looking at https://github.com/Shopify/sarama/blob/v1.27.1/config.go#L475-L476, the comments say it's enabled by default. I don't see anything in Goka turning it off. But I may be missing something.

Auto-Commit is enabled by default, that's right. Not sure why it would be a problem. Why are you asking?

@db7 I checked the old configs but can't find it being disabled anywhere, could you point me to where you mean?

But maybe there's a misunderstanding here between committing and marking messages? When the processor consumes messages, it marks them after successfully consuming them. Then, at specified intervals, the auto-committer commits those marked offsets to the brokers. If it didn't, everything would be reconsumed after the next restart. Also, if the processor shuts down, it commits all marked offsets one more time.

Messages are reconsumed only if the processor does not mark them as consumed. The only way to not mark them is for the processor to fail.
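To make the mark/commit distinction concrete, here is a minimal sketch in plain Go. It does not use Sarama's or Goka's API: `offsetTracker`, `Mark`, `Commit` and `ResumeFrom` are hypothetical names that only model the behaviour described above, assuming a single partition and a broker that stores the next offset to consume.

```go
package main

import "fmt"

// offsetTracker is a hypothetical stand-in for a consumer's offset manager.
// It is not Sarama's or Goka's API; it only models "mark" versus "commit".
type offsetTracker struct {
	marked    int64 // next offset to consume, known only locally
	committed int64 // next offset to consume, as stored on the (simulated) broker
}

func newOffsetTracker() *offsetTracker {
	return &offsetTracker{}
}

// Mark records locally that the message at offset was processed successfully.
func (t *offsetTracker) Mark(offset int64) {
	if offset+1 > t.marked {
		t.marked = offset + 1
	}
}

// Commit flushes the marked position to the simulated broker. In the thread's
// description this is what the auto-committer does at intervals and on shutdown.
func (t *offsetTracker) Commit() {
	t.committed = t.marked
}

// ResumeFrom returns the offset a restarted consumer would start reading at.
func (t *offsetTracker) ResumeFrom() int64 {
	return t.committed
}

func main() {
	tr := newOffsetTracker()

	// Messages 0 and 1 are processed and marked; message 2 fails and is never marked.
	tr.Mark(0)
	tr.Mark(1)

	// Marked but not yet committed: a restart now would reconsume everything.
	fmt.Println(tr.ResumeFrom()) // prints "0"

	// After a commit, only the unmarked message 2 is consumed again.
	tr.Commit()
	fmt.Println(tr.ResumeFrom()) // prints "2"
}
```

The first printed value corresponds to running without any commit, the second to a processor that failed on one message after the earlier ones had been marked and committed.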

You guys got me a little scared that something was broken, so I wrote two system tests in this branch that show

  • a processor with autocommit disabled - started twice and reconsuming all messages
  • a processor which crashes after the second message (so it does not mark it as consumed), and reconsumes that message on second run.

Does that clarify things or am I missing the point here?

db7 commented

@frairon After rereading your message, I think my memory may just be too blurry. What you said sounds right: Sarama's auto-commit only commits the messages that are marked, and we mark them. I think that in the Confluent driver the messages were really auto-committed (but again, my memory may be fooling me). Sorry for the false alarm.

No worries, always good to double-check and the systemtest doesn't hurt. Also, the wording is a bit misleading, I got unsure about the same thing too when we migrated to the new version.

Thank you for looking into this. This aligns with at-least-once semantics. I'm looking into whether it's possible to configure Goka to operate closer to exactly-once semantics, and I think I'm hearing that it is, with some configuration changes.

No problem, yes, goka provides at-least-once semantics. Having exactly-once semantics, however, won't be possible right now, because sarama does not provide that feature.
One way to get closer to that is to build idempotent operations. You could e.g. store the latest offset used to update a state and ignore older offsets.
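A rough sketch of that idea in plain Go, independent of goka's API: the state keeps the offset of the last message applied to it and skips anything at or below that offset. The `counterState` type and its `apply` method are hypothetical names chosen for illustration, and the sketch assumes all updates for a given state come from a single topic partition, since offsets are only ordered within one partition.

```go
package main

import "fmt"

// counterState is a hypothetical per-key state: the value itself plus the
// offset of the last message that was applied to it.
type counterState struct {
	Count      int64
	LastOffset int64 // offset of the last applied message, -1 if none yet
}

// apply adds delta to the counter unless a message with this offset (or a
// later one) was already applied. It reports whether the state changed.
func (s *counterState) apply(offset, delta int64) bool {
	if offset <= s.LastOffset {
		// Redelivered message: ignoring it makes the update idempotent.
		return false
	}
	s.Count += delta
	s.LastOffset = offset
	return true
}

func main() {
	s := &counterState{LastOffset: -1}

	s.apply(0, 5)
	s.apply(1, 3)

	// Simulated redelivery of offset 1 after a restart: it is skipped.
	s.apply(1, 3)

	fmt.Println(s.Count, s.LastOffset) // prints "8 1"
}
```

This only helps if the value and its last offset are persisted together; if they were written separately, a crash between the two writes could still apply a message twice or drop it.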

closing this, as it seems to be clarified now.