splunk/kafka-connect-splunk

Offsets not committing for non-formatted events?

amirofmn opened this issue · 3 comments

Hello,

We're troubleshooting why some of the consumers in our connector (SplunkSinkConnector, v1.2.0; Apache Kafka 2.5.0) occasionally experience offset resets on a partition they're pulling data from. The log events look like this:

[2021-02-09 09:21:45,254] INFO [Consumer clientId=connector-consumer-kafka-connect-splunk-hec-sink-consumer-1-buttercup-16, groupId=connect-kafka-connect-splunk-hec-sink-consumer-1-buttercup] Resetting offset for partition buttercup-16 to offset 22691627290. (org.apache.kafka.clients.consumer.internals.SubscriptionState)
[2021-02-09 09:21:45,252] INFO [Consumer clientId=connector-consumer-kafka-connect-splunk-hec-sink-consumer-1-buttercup-16, groupId=connect-kafka-connect-splunk-hec-sink-consumer-1-buttercup] Fetch offset 23064156198 is out of range for partition buttercup-16, resetting offset (org.apache.kafka.clients.consumer.internals.Fetcher)

Prior to those events we do see messages that the consumer is "Committing offsets asynchronously using sequence number ####" for that partition with a specific offset. We haven't found any errors about the offsets failing to write, and as we've noticed in the past, if the ACLs aren't set up correctly for kafka-connect's internal topics (specified in our connect-distributed.properties file), it won't start up. So we decided to look inside the internal topic we set up for the offsets, __kafka-connect-splunk-offsets, and noticed it was empty.

We're not clear why it's empty.

We noticed in the ConsumerConfig output in our connect logs that enable.auto.commit was set to false, and we cannot identify why the default value of true isn't being applied. So we updated our configuration to let the connector override that setting and set consumer.override.enable.auto.commit to true. After a kafka-connect restart and connector update, we confirmed in our logs that enable.auto.commit is now true.

The offset topic is still empty and the occasional offset reset above is still occurring. Our concern is the re-ingestion of data (i.e. duplicate events), but also the latency it introduces: the new offset the consumer receives is hundreds of millions of records behind where it previously was, and it takes a while to catch up again.
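To put a number on that gap, here is a minimal Python sketch that parses the two log lines quoted above and subtracts the reset offset from the out-of-range fetch offset. The function name `reset_gap` is only illustrative, and the result is an offset difference, which only approximates a record count.

```python
import re

# The two consumer log messages quoted earlier in this issue,
# trimmed to the message text (timestamps and client ids omitted).
LOG_LINES = [
    "Resetting offset for partition buttercup-16 to offset 22691627290.",
    "Fetch offset 23064156198 is out of range for partition buttercup-16, resetting offset",
]

FETCH_RE = re.compile(r"Fetch offset (\d+) is out of range")
RESET_RE = re.compile(r"Resetting offset for partition \S+ to offset (\d+)")


def reset_gap(lines):
    """Return (fetch_offset, reset_offset, gap) parsed from the log lines."""
    fetch = reset = None
    for line in lines:
        m = FETCH_RE.search(line)
        if m:
            fetch = int(m.group(1))
        m = RESET_RE.search(line)
        if m:
            reset = int(m.group(1))
    if fetch is None or reset is None:
        raise ValueError("expected both a fetch-offset line and a reset line")
    return fetch, reset, fetch - reset


fetch, reset, gap = reset_gap(LOG_LINES)
print(f"{gap:,} offsets to re-read")  # prints "372,528,908 offsets to re-read"
```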

I went through all the known bug fixes from 2.0.0 through 2.0.2 and didn't see anything specific to the issue we're seeing.

Thoughts?

Just to update my original post, while the __kafka-connect-splunk-offsets remains empty, the __consumer_offsets topic is actually getting populated as we can see the entries from our group id with the offsets it's reading from now.

Still not clear why the offsets are not being committed to the actual internal offsets topic we specified within our connect-distributed.properties file over the default offset topic in kafka.
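For reference, the groupId in the log lines above follows the naming Kafka Connect uses by default for sink connectors: `connect-` plus the connector name. Sink tasks read through an ordinary consumer group, and consumer groups commit to __consumer_offsets, while the Kafka documentation describes offset.storage.topic as the place where source connector offsets are stored. A minimal Python sketch of the naming, where the helper name is hypothetical and the connector name is inferred from the log lines rather than stated in this issue:

```python
def sink_consumer_group(connector_name: str) -> str:
    """Default consumer group id Kafka Connect assigns to a sink connector."""
    return "connect-" + connector_name


# Assumption: connector name inferred from the groupId in the quoted logs.
name = "kafka-connect-splunk-hec-sink-consumer-1-buttercup"
group = sink_consumer_group(name)
print(group)
```

The resulting string is the group to look for when inspecting entries in __consumer_offsets.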

@amirofmn it looks like you were able to find the offsets in __consumer_offsets. If you run into further problems, let us know by opening a new issue.

@chaitanyaphalak is there any explanation as to why the offsets are being written to __consumer_offsets and not the topic we specify within the connect-distributed.properties file? It feels like a bug to me.