mmolimar/kafka-connect-fs

Batching Support

Symbianx opened this issue · 4 comments

Hello,

We're using this connector to restore messages from S3, and it's working great for 100k messages. However, we noticed that the connector reads all the S3 files before it starts sending messages to the topics. We're afraid of running into memory limits when we reach higher message counts.

I was wondering if it would be possible to add batching support to this connector?

Hi!
Yes, it's a good point and the idea is to have that feature in the next minor release.

Thanks for your comments!

I started something on my fork by creating a new SimpleBatchPolicy. It behaves like the SimplePolicy, but adds a config option for the number of files to return per execution call (handled in an Iterator).
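To illustrate the idea, here is a minimal sketch of an iterator that hands out elements in fixed-size batches. It is not the code from the fork or from the connector: the class name `BatchIterator` and the `batchSize` parameter are hypothetical, and a real policy would wrap an iterator of file metadata rather than plain strings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Hypothetical helper (not part of kafka-connect-fs): wraps a source
 * iterator and returns its elements in batches of at most batchSize.
 */
class BatchIterator<T> implements Iterator<List<T>> {
    private final Iterator<T> source;
    private final int batchSize;

    BatchIterator(Iterator<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        // Another batch exists as long as the source has at least one element left.
        return source.hasNext();
    }

    @Override
    public List<T> next() {
        if (!source.hasNext()) {
            throw new NoSuchElementException();
        }
        // Pull elements lazily, so only one batch is held in memory at a time.
        List<T> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }
}
```

With a batch size of 2 and five files, successive calls to `next()` would return two files, two files, and then the remaining one, so each execution call only has to deal with a bounded number of files.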

I still need to get some tests done, but I wouldn't mind opening the PR for it. Does this align with the plans for the feature?

Sure! Contributions are very welcome!
That said, I'd aim for something more generic that applies to all policies and file readers.

I created PR #59. Since this is about batching files (not messages), I think it should be implemented in the policies rather than in the file readers.