databricks/iceberg-kafka-connect

Question: One control topic per connector?

Closed this issue · 3 comments

Hey @bryanck,

Are there any recommendations, or perhaps requirements, for the control topic? I'm particularly interested in the relationship: should it be one-to-one (one control topic per source topic/connector), or is one-to-many okay? I recently came across the following article:
https://docs.redpanda.com/current/deploy/deployment-option/cloud/managed-connectors/create-iceberg-sink-connector/#limitations, which explicitly states "Each Iceberg Sink connector must have its own control topic". I had never thought of it that way and haven't found anything like it mentioned in the documentation here. To some extent I can understand the reasons for keeping it one-to-one, but I would like to hear your thoughts on this.

Thanks in advance!

Multiple sink connectors can use the same control topic if desired. Control messages are filtered within a connector based on the Connect consumer group ID.
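
To illustrate the filtering described above, here is a minimal sketch in Python. It is not the connector's actual code: the `ControlEvent` shape, the payload strings, and the group IDs are all hypothetical, chosen only to show how several connectors could share one topic and each keep only its own messages.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ControlEvent:
    # Hypothetical shape; the real control message schema is not shown in this thread.
    group_id: str  # Connect consumer group ID of the connector the event belongs to
    payload: str   # placeholder for the event body


def events_for_connector(events: Iterable[ControlEvent], my_group_id: str) -> List[ControlEvent]:
    """Keep only events tagged with this connector's group ID; drop everything else."""
    return [e for e in events if e.group_id == my_group_id]


# Two hypothetical connectors writing to the same control topic.
shared_topic = [
    ControlEvent("connect-orders-sink", "commit-request"),
    ControlEvent("connect-clicks-sink", "commit-request"),
    ControlEvent("connect-orders-sink", "commit-response"),
]

mine = events_for_connector(shared_topic, "connect-orders-sink")
print([e.payload for e in mine])
```

Each connector still reads every message on the shared topic; the filter only decides which ones it acts on.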

You do not need to have a control topic per connector. These are low traffic relative to your source topics until you are at a significant scale (many topics with many partitions per topic).

As the number of topics (and topics with many partitions) increases, so does the traffic in the control topic. When you have one control topic, ALL connectors read from that topic and filter it down to just the messages they need. As scale increases, a consumer may be consuming 99% junk that gets filtered out.
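
As a rough back-of-the-envelope check on the 99% figure above: if we assume (purely for illustration) that every connector produces the same volume of control messages, a connector sharing the topic with N - 1 others discards (N - 1) / N of what it reads. The function name below is made up for this sketch.

```python
def filtered_fraction(num_connectors: int) -> float:
    """Fraction of shared-topic messages a single connector reads and discards.

    Assumes every connector contributes equal control traffic, which is a
    simplification; real traffic depends on topics and partitions per connector.
    """
    if num_connectors < 1:
        raise ValueError("need at least one connector")
    return (num_connectors - 1) / num_connectors


for n in (1, 10, 100, 500):
    print(f"{n:>3} connectors -> {filtered_fraction(n):.1%} of messages discarded")
```

Under that equal-traffic assumption, the 99% mark is reached at about 100 connectors on one control topic.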

Cons of one control topic:

  • consumers filter out most messages, especially at scale.
  • disaster recovery is more challenging: all messages are in one place. If you need to go in there manually, for example to remove bad messages in some sort of disaster situation, it's a lot harder since the messages for all the different topics are mixed together.

I don't think these are particularly huge cons and I would run with a single control topic. I might rethink this if I was running 500+ connectors, probably for the blast radius problem.
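
For reference, the two layouts could look roughly like this, sketched as Python dicts that serialize to Kafka Connect JSON. The `iceberg.control.topic` property and the connector class name are written from memory of the project README and should be verified against it; the connector names and source topics are hypothetical, and the catalog and table settings a real connector needs are left out.

```python
import json


def sink_config(name: str, topics: str, control_topic: str) -> dict:
    """Build a partial connector config; catalog/table settings are omitted."""
    return {
        "name": name,
        "config": {
            # Assumed class and property names; check them against the README.
            "connector.class": "io.tabular.iceberg.connect.IcebergSinkConnector",
            "topics": topics,
            "iceberg.control.topic": control_topic,
        },
    }


# Layout 1: both connectors share one control topic.
shared = [
    sink_config("orders-sink", "orders", "control-iceberg"),
    sink_config("clicks-sink", "clicks", "control-iceberg"),
]

# Layout 2: one control topic per connector (hypothetical naming scheme).
dedicated = [
    sink_config("orders-sink", "orders", "control-iceberg-orders"),
    sink_config("clicks-sink", "clicks", "control-iceberg-clicks"),
]

print(json.dumps(shared[0], indent=2))
```

Switching from the first layout to the second later only changes one property per connector, which is part of why starting with a single control topic is low risk.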

Thanks guys, I appreciate your responsiveness. Yes, my understanding is basically the same, and I haven't encountered any significant problems so far, albeit at quite a different scale. I got what I needed to double-check here, so I'm closing the ticket.