[Master] Add Change Data Capture (CDC) APIs to stream data out of YugabyteDB
suranjan opened this issue · 2 comments
suranjan commented
Motivation
- Without Change Data Capture (CDC), extracting data from the database is cumbersome: the entire contents of tables are moved into flat files, which are then loaded into the data warehouse. This ad hoc approach is expensive in several ways.
- Without CDC, staging requires moving entire tables into flat files, and the resulting interfaces are error-prone and labor-intensive to administer.
- Without CDC, it becomes expensive because you must write and maintain the capture software yourself or purchase it from a third-party vendor.
- So, we need an efficient, distributed, row-level CDC feed into a configurable sink for downstream processing such as reporting, full-text indexes, analytics engines, or big data pipelines.
- Applications can use change streams to subscribe to all data changes on a single table, a database, or an entire deployment, and immediately react to them.
Phase 1
Status | Subtask | GitHub Issue | Estimated Time |
---|---|---|---|
✅ | Implement the CDC Lifecycle API | | |
✅ | Implement the GetChanges method of CDC API | #9022 | |
✅ | Define the CDCEvent Structure | #9020 | |
✅ | Develop Simple Console Client | #9021 | |
✅ | Support Snapshot of the table before the start of the CDC | | |
✅ | Allow DDL changes to be propagated | | |
🛠 | Build a Kafka Source Connector (Debezium) | #11855 | |
⬜️ | Support reading the 'before image' of a change | | |
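
As a rough illustration of how the Phase 1 pieces might fit together, the sketch below models a consumer that polls a GetChanges-style call and advances a checkpoint. Everything in it is hypothetical: the fields of `CDCEvent`, the `InMemoryStream` class, the `get_changes` signature, and the integer checkpoint are stand-ins chosen for illustration, not the actual YugabyteDB CDC API or the structure defined in #9020.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CDCEvent:
    """Hypothetical change record; not the structure defined in #9020."""
    table: str
    op: str                        # "INSERT", "UPDATE", or "DELETE"
    after: Optional[dict]          # row after the change (None for DELETE)
    before: Optional[dict] = None  # 'before image', if it were supported


class InMemoryStream:
    """Stand-in for a CDC stream; a real one would be served by the database."""

    def __init__(self, events):
        self._events = list(events)

    def get_changes(self, checkpoint: int, limit: int = 2):
        """Return up to `limit` events after `checkpoint` and the new checkpoint."""
        batch = self._events[checkpoint:checkpoint + limit]
        return batch, checkpoint + len(batch)


def consume(stream, sink, checkpoint: int = 0) -> int:
    """Poll until the stream is drained, passing each event to `sink`."""
    while True:
        batch, next_checkpoint = stream.get_changes(checkpoint)
        if not batch:
            return checkpoint
        for event in batch:
            sink(event)
        # Advance only after the whole batch reached the sink, so a crash
        # replays the batch instead of skipping it (at-least-once delivery).
        checkpoint = next_checkpoint


stream = InMemoryStream([
    CDCEvent("users", "INSERT", {"id": 1, "name": "a"}),
    CDCEvent("users", "UPDATE", {"id": 1, "name": "b"},
             before={"id": 1, "name": "a"}),
    CDCEvent("users", "DELETE", None, before={"id": 1, "name": "b"}),
])
received = []
final_checkpoint = consume(stream, received.append)
print(final_checkpoint, [e.op for e in received])
```

In this sketch the `sink` is any callable, so a console client or a connector could each supply its own without changing the polling loop.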
Phase 2
Status | Subtask | GitHub Issue |
---|---|---|
⬜️ | Remove dependency on 'Kafka' | |
⬜️ | Support UDT datatype for CDC | |
⬜️ | Support Row Level Security | |
⬜️ | Support Metrics for tracking CDC state | |
The following issues are also being tracked and are planned for future releases:
ma-hartma commented
I'm really looking forward to this, especially the possibility of supporting different 'sinks' or 'connectors'!
As already stated in #2513, a connector for NATS JetStream would be awesome.
gedw99 commented
Going to add my vote for NATS. It’s just much more flexible than Kafka and way easier to manage.