Sync data from MySQL to ClickHouse, with support for full and incremental ETL.
- Full data ETL and continuous sync.
- DDL and DML sync: DDL currently supports add column and drop column, and DML is fully supported.
- Rich configurable items.
- Kafka: message queue that stores MySQL binlog events.
- Redis: caches the MySQL binlog file and position, and stores monitoring data.
$ pip install mysql2ch
Example config.json.
You may need to run a full data ETL before continuously syncing data from MySQL to ClickHouse, or redo the ETL with --renew.
$ mysql2ch etl -h
usage: mysql2ch etl [-h] --schema SCHEMA [--tables TABLES] [--renew]
optional arguments:
-h, --help show this help message and exit
--schema SCHEMA Schema to full etl.
--tables TABLES Tables to full etl,multiple tables split with comma.
--renew Etl after try to drop the target tables.
Full ETL from table test.test:
$ mysql2ch -c config.json etl --schema test --tables test
Listen to all MySQL binlog events and produce them to Kafka.
$ mysql2ch -c config.json produce
Consume messages from Kafka and insert them into ClickHouse. You can skip errors with --skip-error.
$ mysql2ch consume -h
usage: mysql2ch consume [-h] --schema SCHEMA [--skip-error] [--auto-offset-reset AUTO_OFFSET_RESET]
optional arguments:
-h, --help show this help message and exit
--schema SCHEMA Schema to consume.
--skip-error Skip error rows.
--auto-offset-reset AUTO_OFFSET_RESET
Kafka auto offset reset,default earliest.
Consume schema test and insert into ClickHouse:
$ mysql2ch -c config.json consume --schema test
Example docker-compose.yml.
Sentry is used for error reporting. It takes effect when sentry_dsn is set in config.json.
When set to True, SQL information is displayed.
Sentry environment.
Sentry DSN; set it if you use Sentry.
Sync config, keyed by schema. Each schema has a tables list and a kafka_partition; one Kafka partition carries one schema's binlog.
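The schema-to-partition mapping described above can be sketched as follows. This is only an illustration, not mysql2ch's own code: the `tables` key name, the `shop` schema, and both helper functions are assumptions, and the uniqueness check simply reflects the "one partition per schema" rule stated above.

```python
# Hypothetical sync config: schema name -> tables list and Kafka partition.
# The "tables" key name and the "shop" schema are assumptions for illustration.
SYNC_CONFIG = {
    "test": {"tables": ["test"], "kafka_partition": 0},
    "shop": {"tables": ["orders", "users"], "kafka_partition": 1},
}


def partition_for(schema, table, sync_config):
    """Return the Kafka partition for schema.table, or None if it is not synced."""
    entry = sync_config.get(schema)
    if entry is None or table not in entry["tables"]:
        return None
    return entry["kafka_partition"]


def check_unique_partitions(sync_config):
    """Raise ValueError if two schemas are assigned the same partition."""
    seen = {}
    for schema, entry in sync_config.items():
        partition = entry["kafka_partition"]
        if partition in seen:
            raise ValueError(
                f"partition {partition} used by both {seen[partition]} and {schema}"
            )
        seen[partition] = schema
```

With this layout, `partition_for("shop", "orders", SYNC_CONFIG)` looks up partition 1, while a table that is not listed yields `None`.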
Initial MySQL binlog file. It is used on the first run; afterwards the value is read from Redis.
Initial MySQL binlog position. It is used on the first run; afterwards the value is read from Redis.
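A minimal sketch of that fallback, assuming the cached values win whenever both are present. A plain dict stands in for Redis here, and the function name and the `<prefix>:file` / `<prefix>:pos` key layout are hypothetical, not the keys mysql2ch actually writes.

```python
def resolve_start_position(cache, prefix, init_file, init_pos):
    """Return the (binlog_file, binlog_pos) pair to start reading from.

    `cache` is any mapping standing in for Redis. The key layout below
    is an assumption made for this example.
    """
    file_key = f"{prefix}:file"
    pos_key = f"{prefix}:pos"
    if file_key in cache and pos_key in cache:
        # A previous run stored its progress, so resume from there.
        return cache[file_key], int(cache[pos_key])
    # First run: fall back to the initial values from the config.
    return init_file, init_pos
```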
Prefix for the keys stored in Redis.
Tables for which DML delete events are skipped.
Tables for which DML update events are skipped.
Skip DML delete or update events.
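The three skip options above could combine roughly like this. It is a sketch under assumptions: the parameter names, the event dict fields (`schema`, `table`, `action`), and the `schema.table` naming of tables are all hypothetical and not taken from mysql2ch.

```python
def should_skip(event, skip_delete_tables=(), skip_update_tables=(), skip_dmls=()):
    """Decide whether a DML event should be dropped before insertion.

    `event` is a dict with "schema", "table" and "action" keys
    (an assumed shape for this example).
    """
    action = event["action"]
    full_name = f'{event["schema"]}.{event["table"]}'
    if action in skip_dmls:
        # Globally skipped DML type, e.g. "delete" or "update".
        return True
    if action == "delete" and full_name in skip_delete_tables:
        return True
    if action == "update" and full_name in skip_update_tables:
        return True
    return False
```

For example, with `skip_delete_tables=["test.test"]` a delete on test.test is dropped while an insert on the same table still goes through.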
Number of events per submit.
Number of seconds per submit.
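One way to read the two submit settings together is "submit when either limit is reached". The sketch below assumes that behaviour; the `BatchBuffer` class and its parameter names are hypothetical and are not mysql2ch's implementation. The clock is injectable so the timing logic can be exercised without sleeping.

```python
import time


class BatchBuffer:
    """Collect events and hand back a batch once a count or time limit is hit."""

    def __init__(self, max_events, max_seconds, clock=time.monotonic):
        self.max_events = max_events
        self.max_seconds = max_seconds
        self._clock = clock
        self._events = []
        self._last_flush = clock()

    def add(self, event):
        """Buffer an event; return the batch if it is time to submit, else None."""
        self._events.append(event)
        too_many = len(self._events) >= self.max_events
        too_old = self._clock() - self._last_flush >= self.max_seconds
        if too_many or too_old:
            return self.flush()
        return None

    def flush(self):
        """Return all buffered events and reset the timer."""
        batch, self._events = self._events, []
        self._last_flush = self._clock()
        return batch
```

A caller would pass each returned batch to its ClickHouse insert routine. Note that this sketch only checks the timer when an event arrives, so an idle stream would need a separate periodic `flush()`.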
This project is licensed under the MIT License.