yugabyte/yugabyte-db

[Master] Add Change Data Capture (CDC) APIs to stream data out of YugabyteDB

suranjan opened this issue · 2 comments

Motivation

  • Without Change Data Capture (CDC), database extraction is cumbersome: the entire contents of tables are moved into flat files, which are then loaded into the data warehouse. This ad hoc approach is expensive in several ways.
  • Without CDC, staging requires moving the entire contents of tables into flat files, and the resulting interfaces become error-prone and labor-intensive to administer.
  • Without CDC, you must either write and maintain the capture software yourself or purchase it from a third-party vendor, and both options are costly.
  • We therefore need an efficient, distributed, row-level CDC feed into a configurable sink for downstream processing such as reporting, full-text indexes, analytics engines, or big data pipelines.
  • Applications can then use change streams to subscribe to all data changes on a single table, a database, or an entire deployment, and react to them immediately.
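
To make the last two points concrete, here is a minimal sketch of a row-level change feed fanned out to configurable sinks. It is illustrative only and does not use any YugabyteDB API: the `ChangeEvent` record, the `stream_changes` function, and the `"*"` wildcard subscription are all hypothetical names chosen for this example, and the actual CDCEvent structure is tracked separately in #9020.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical row-level change record (not the real CDCEvent)."""
    table: str
    op: str                  # "INSERT", "UPDATE", or "DELETE"
    key: dict                # primary-key columns of the affected row
    after: Optional[dict]    # new row image; None for a DELETE


# A sink is any callable that accepts one event (assumed interface).
Sink = Callable[[ChangeEvent], None]


def stream_changes(events: Iterable[ChangeEvent],
                   sinks: Dict[str, List[Sink]]) -> int:
    """Deliver each event to the sinks subscribed to its table or to "*".

    Returns the number of (event, sink) deliveries made.
    """
    delivered = 0
    for event in events:
        for sink in sinks.get(event.table, []) + sinks.get("*", []):
            sink(event)
            delivered += 1
    return delivered
```

A caller would register one list of sinks per table name, plus an optional `"*"` entry for subscribers that want every change. A real connector would also need ordering guarantees, checkpointing, and retry handling, none of which this sketch attempts.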

Phase 1

| Status | Subtask | GitHub Issue | Estimated Time |
| --- | --- | --- | --- |
|  | Implement the CDC Lifecycle API |  |  |
|  | Implement the GetChanges method of CDC API | #9022 |  |
|  | Define the CDCEvent Structure | #9020 |  |
|  | Develop Simple Console Client | #9021 |  |
|  | Support Snapshot of the table before the start of the CDC |  |  |
|  | Allow DDL changes to be propagated |  |  |
| 🛠 | Build a Kafka Source Connector (Debezium) | #11855 |  |
| ⬜️ | Support reading the 'before image' of a change |  |  |

Phase 2

| Status | Subtask | GitHub Issue |
| --- | --- | --- |
| ⬜️ | Remove dependency on 'Kafka' |  |
| ⬜️ | Support UDT datatype for CDC |  |
| ⬜️ | Support Row Level Security |  |
| ⬜️ | Support Metrics for tracking CDC state |  |

The following issues are also being tracked and are planned for future releases:

  • Native CDC support without Debezium - #11856
  • CDC push to Kafka - #11857
  • Push to webhooks - #11858
  • OLAP integration of CDC (Snowflake, BigQuery, etc.) - #11859
  • Object store integration (S3, Minio, etc.) - #11860
  • Message bus integration (PubSub, Kinesis, etc.) - #11861

I'm really looking forward to this, especially the possibility of supporting different 'sinks' or 'connectors'!

As already stated in #2513, a connector for NATS JetStream would be awesome.

gedw99 commented

Going to add my vote for NATS. It's just much more flexible than Kafka, and way easier to manage.