Change Data Capture (CDC) with PostgreSQL and ClickHouse
This project sets up a real-time data pipeline that uses Change Data Capture (CDC) to stream changes from a PostgreSQL database to a ClickHouse database through Apache Kafka. Debezium captures the database changes and publishes them to Kafka, and a Kafka Connect sink connector writes the data into ClickHouse, enabling efficient and reliable data synchronization and analytics.
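Under the hood, both the Debezium source and the ClickHouse sink run as connectors inside Kafka Connect and are registered through its REST API; in this repository they appear to be set up as part of the stack, so no manual registration is needed. Purely as a rough, hypothetical sketch of the mechanism, registering a Debezium PostgreSQL source could look like the snippet below (the connector name, database name, plugin, table list and topic.prefix are illustrative assumptions and differ from the actual config, which evidently routes all tables into the single demo_data topic; property names also vary between Debezium versions):

curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
        "name": "postgres-source",
        "config": {
          "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
          "plugin.name": "pgoutput",
          "database.hostname": "postgres",
          "database.port": "5432",
          "database.user": "postgres",
          "database.password": "postgres",
          "database.dbname": "postgres",
          "table.include.list": "public.demo_table1,public.demo_table2",
          "topic.prefix": "demo"
        }
      }'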
- Docker
- Docker Compose
- Git
The project consists of the following services:
- Postgres
- ClickHouse
- Kafka
- Zookeeper
- Kafka-Connect
- Kowl (Apache Kafka UI)
- Adminer (PostgreSQL UI)
git clone https://github.com/leartbeqiraj1/cdc-postgresql-clickhouse.git
cd cdc-postgresql-clickhouse/
docker-compose up -d
docker compose ps
NAME                SERVICE         STATUS              PORTS
postgres            postgres        running             0.0.0.0:5432->5432/tcp, :::5432->5432/tcp
clickhouse-server   clickhouse      running             9000/tcp, 0.0.0.0:8123->8123/tcp, :::8123->8123/tcp, 9009/tcp
kafka               kafka           running             0.0.0.0:9092->9092/tcp, :::9092->9092/tcp, 0.0.0.0:9101->9101/tcp, :::9101->9101/tcp
zookeeper           zookeeper       running             2888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 3888/tcp
kafka-connect       kafka-connect   running (healthy)   0.0.0.0:8083->8083/tcp, :::8083->8083/tcp, 9092/tcp
kowl                kowl            running             0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
adminer             adminer         running             0.0.0.0:7775->8080/tcp, :::7775->8080/tcp
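You can also confirm from the command line that both connectors are registered and running by querying Kafka Connect's REST API (the connector name in the second call is a placeholder; use one of the names returned by the first call):

curl -s http://localhost:8083/connectors
curl -s http://localhost:8083/connectors/<connector-name>/status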
Log in to the Postgres UI (Adminer, available at http://localhost:7775) and verify that demo_table1 and demo_table2 exist, each with 2 rows inserted. The username and password are both "postgres".
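Alternatively, the same check can be done from the command line with psql inside the postgres container (this assumes the tables live in the default postgres database and public schema):

docker exec -it postgres psql -U postgres -c "SELECT * FROM demo_table1;"
docker exec -it postgres psql -U postgres -c "SELECT * FROM demo_table2;"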
Open the Kowl topics view (http://localhost:8080) and verify that the Debezium connector has successfully published the PostgreSQL data to a Kafka topic. There should be a topic named demo_data containing 4 messages (all 4 rows from the PostgreSQL tables).
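If you prefer the command line, the topic can be read with the console consumer shipped in the Kafka container (the bootstrap address assumes the common localhost:9092 listener setup; adjust it if the compose file advertises a different internal listener):

docker exec -it kafka kafka-console-consumer --bootstrap-server localhost:9092 --topic demo_data --from-beginning --max-messages 4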
Navigate to Consumer Groups to verify that the ClickHouseSink connector is a stable consumer and has successfully subscribed to the demo_data topic.
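The same information is available through the kafka-consumer-groups tool; Kafka Connect sink connectors typically consume under a group named connect-<connector-name>, so the group in the second command is a placeholder to be replaced with the name shown by the first:

docker exec -it kafka kafka-consumer-groups --bootstrap-server localhost:9092 --list
docker exec -it kafka kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group connect-<sink-connector-name>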
Open the ClickHouse UI and verify that the PostgreSQL data has already been pushed to ClickHouse by the ClickHouseSink connector by executing the queries below:
SELECT * FROM demo_table1_mv FINAL;
SELECT * FROM demo_table2_mv FINAL;
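The same queries can also be run through clickhouse-client inside the ClickHouse container (this assumes the tables live in the default database):

docker exec -it clickhouse-server clickhouse-client --query "SELECT * FROM demo_table1_mv FINAL"
docker exec -it clickhouse-server clickhouse-client --query "SELECT * FROM demo_table2_mv FINAL"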
Go to the Postgres UI and insert new rows or update existing ones.
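For example, from the command line (the column names here are hypothetical; adjust them to the actual schema of demo_table1):

docker exec -it postgres psql -U postgres -c "INSERT INTO demo_table1 (id, name) VALUES (3, 'cdc-test');"
docker exec -it postgres psql -U postgres -c "UPDATE demo_table1 SET name = 'updated' WHERE id = 1;"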
Open the ClickHouse UI again and you should see your changes already applied.
Inspired by a blog post on ClickHouse's official website (ClickHouse PostgreSQL Change Data Capture (CDC) - Part 1), I decided to replicate the setup. Since there was little documentation on how to set up this particular pipeline locally, I was motivated to write this guide. Through this effort, I hope to contribute to the community's shared knowledge and make setting up this CDC pipeline smoother.