snowflakedb/snowflake-kafka-connector

Ingesting real-time change data to Snowflake

ryzhyk opened this issue · 5 comments

ryzhyk commented

I am trying to figure out the best way to ingest real-time CDC data into Snowflake. I started reading about Snowpipe Streaming and discovered that it only supports inserts, so implementing updates and deletes requires staging tables and a continuous pipeline. I was wondering if there are better options that would avoid the cost, complexity, and extra latency of staging tables.
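For context, the staging-table workaround works roughly like this: every CDC event (insert, update, or delete) is appended to a staging table as a plain insert, and a periodic step collapses the staged events into the target table by primary key. The sketch below models that collapse logic in plain Python; in Snowflake the same step would typically be a MERGE statement driven by a scheduled task, and all names here are illustrative, not part of any Snowflake API.

```python
# Sketch of the staging-table pattern for applying CDC over an
# insert-only ingest path. Illustrative only: in Snowflake, the
# "apply" step would be a MERGE run by a scheduled task.

def apply_staged_events(target, staged_events):
    """Collapse append-only CDC events into the target table.

    target        -- dict mapping primary key -> row (stands in for the
                     target table)
    staged_events -- list of (op, key, row) tuples in arrival order,
                     where op is "insert", "update", or "delete"
    """
    for op, key, row in staged_events:
        if op == "delete":
            target.pop(key, None)
        else:
            # "insert" and "update" both collapse to an upsert.
            target[key] = row
    return target

# Every change, including updates and deletes, lands in staging as a
# plain append -- which is all Snowpipe Streaming supports today.
staging = [
    ("insert", 1, {"id": 1, "name": "alice"}),
    ("insert", 2, {"id": 2, "name": "bob"}),
    ("update", 1, {"id": 1, "name": "alicia"}),
    ("delete", 2, None),
]

table = apply_staged_events({}, staging)
print(table)  # {1: {'id': 1, 'name': 'alicia'}}
```

The extra hop through staging is exactly where the added cost and latency come from: changes are not visible in the target until the next merge run.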

Should I consider ODBC or the REST API, or are they not well suited for high-throughput ingest?

Any pointers would be appreciated!

sfc-gh-xhuang commented

Which database are you trying to ingest CDC from?

We have two database connectors in preview that leverage Snowpipe Streaming for CDC.

ryzhyk commented

@sfc-gh-xhuang, we are building our own database, which means that we can produce the CDC stream in any format. The question is whether there is a way to ingest this stream into Snowflake without using intermediate tables.

sfc-gh-xhuang commented

What databases are you building plugins for?

I see. Snowpipe Streaming only supports inserts today, but upserts are on our longer-term roadmap. Either the REST API or ODBC would work well, I think. The REST API is somewhat newer; most CDC from partners uses ODBC or other language SDKs.

We currently have CDC support for Postgres and MySQL.

ryzhyk commented

Interesting, thank you! It's great to know that upserts are on the roadmap!

sfc-gh-xhuang commented

Closing this one; let us know if you need further help.