celonis/kafka-ems-connector

Question: Scalability

Hello, thank you for this great connector.

We are using this to push data from Kafka to Celonis.
The data has primary keys, which allow us to deduplicate via the connector's feature when exporting to Celonis.
The connector is deployed in standalone mode with a basic configuration (1 task, primary keys, some transforms on dates).
Now we want to push a huge table of around 30 million records. With one instance of the connector, the export to Celonis takes too long (more than 4 hours).

We tried scaling to 3 instances (our Kafka topics have 3 partitions). There were no errors in the logs, aside from occasional connection timeouts to Celonis. But when we checked our data in Celonis, the record count did not match: we had fewer records than expected, even with deduplication taken into account.

Our question: what is the scalability status of this connector? Is there anything specific to watch out for, or any particular configuration we should apply?

Thank you!