/cdc-apache-cassandra

Datastax CDC for Apache Cassandra

Primary LanguageJavaApache License 2.0Apache-2.0

DataStax CDC for Apache Cassandra

CI documentation release GitHub release

The DataStax CDC for Apache Cassandra requires:

  • DataStax Change Agent for Apache Cassandra, which is an event producer deployed as a JVM agent on each Cassandra data node.
  • DataStax Source Connector for Apache Pulsar, which is source connector deployed in your streaming platform.

Cassandra-source-connector

Supported streaming platform:

  • Apache Pulsar 2.8.1+
  • DataStax Luna Streaming 2.8.0.1.1.40+

Supported Cassandra version:

Note: Only Cassandra 4.0 and DSE 6.8.16+ support the near realtime CDC allowing to replicate data as soon as they are synced on disk.

Documentation

All documentation is available online here.

See the QUICKSTART.md page.

Demo

Cassandra data replicated to Elasticsearch:

  • Create a Cassandra table with cdc enabled
  • Deploy a Cassandra source and an Elasticsearch sink into Apache Pulsar
  • Writes into Cassandra are replicated to Elasticsearch.

asciicast

Monitoring

You can collect Cassandra/DSE and Pulsar metrics into Prometheus, and build a Grafana dashboard with:

  • The CQL read latency from the Cassandra Source Connector
  • The replication latency from the Cassandra Source Connector (computed from the Cassandra writetime)
  • The CDC disk space used in the cdc_raw directory (for DSE only)
  • The mutation sent throughput from a Cassandra node
  • The pulsar events and data topic rate in

CDC Dashboard

Limitations

  • Does not replay logged batches
  • Does not manage table truncates
  • Does not manage TTLs
  • Does not support range deletes
  • Does not sync data available before starting the CDC agent.
  • CQL column names must not match a Pulsar primitive type name (ex: INT32)
  • Does not support primary key only tables (ex: CREATE TABLE t (k int, c int, PRIMARY KEY (k, c)) WITH cdc=true;)

Supported data types

Cassandra supported CQL3 data types (with the associated AVRO type or logical-type):

  • text (string), ascii (string)
  • tinyint (int), smallint (int), int (int), bigint (long), double (double), float (float),
  • inet (string)
  • decimal (cql_decimal), varint (cql_varint), duration (cql_duration)
  • blob(bytes)
  • boolean (boolean)
  • timestamp (timestamp-millis), time (time-micros), date (date)
  • uuid, timeuuid (uuid)
  • User Defined Types (record)
  • Collection types: ** list (array) ** set (array) ** map (map)

Build from the sources

./gradlew assemble

Note: Artifacts for DSE agent are excluded by default. To build the agent-dse4 module, specify the dse4 property:

./gradlew assemble -Pdse4 

Acknowledgments

Apache Cassandra, Apache Pulsar, Cassandra and Pulsar are trademarks of the Apache Software Foundation. Elasticsearch, is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.