
mysql2ch


Chinese documentation

Introduction

Sync data from MySQL to ClickHouse, supporting both full and incremental ETL.

https://github.com/long2ice/mysql2ch/raw/master/images/mysql2ch.png

Features

  • Full data ETL and continuous sync.
  • DDL and DML sync: DDL support currently covers add column and drop column; DML is fully supported.
  • Rich configuration options.

Requirements

  • kafka: message queue that stores MySQL binlog events.
  • redis: caches the MySQL binlog file and position, and stores monitoring data.

Install

$ pip install mysql2ch

Usage

Config

See the example config.json.
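A sketch of what config.json might contain, assembled from the keys documented in the Config Reference below. All values are placeholders, and the connection settings for MySQL, ClickHouse, Kafka, and Redis that a real deployment also needs are omitted here; consult the example file shipped with the repository for the authoritative layout.

```json
{
  "debug": false,
  "environment": "production",
  "sentry_dsn": "",
  "schema_table": {
    "test": {
      "tables": ["test"],
      "kafka_partition": 0
    }
  },
  "init_binlog_file": "binlog.000001",
  "init_binlog_pos": 120,
  "log_pos_prefix": "mysql2ch",
  "skip_delete_tables": [],
  "skip_update_tables": [],
  "skip_dmls": [],
  "insert_num": 20000,
  "insert_interval": 60
}
```

The exact nesting of schema_table (tables list plus kafka_partition per schema) is inferred from the Config Reference descriptions and should be verified against the shipped example.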

Full data etl

You may need to run a full data ETL before starting continuous sync from MySQL to ClickHouse; use --renew to redo the ETL from scratch.

$ mysql2ch etl -h

usage: mysql2ch etl [-h] --schema SCHEMA [--tables TABLES] [--renew]

optional arguments:
  -h, --help       show this help message and exit
  --schema SCHEMA  Schema to full etl.
  --tables TABLES  Tables to full etl,multiple tables split with comma.
  --renew          Etl after try to drop the target tables.

Full ETL of table test.test:

$ mysql2ch -c config.json etl --schema test --tables test

Produce

Listen to the full MySQL binlog and produce events to Kafka.

$ mysql2ch -c config.json produce

Consume

Consume messages from Kafka and insert them into ClickHouse; error rows can be skipped with --skip-error.

$ mysql2ch consume -h

usage: mysql2ch consume [-h] --schema SCHEMA [--skip-error] [--auto-offset-reset AUTO_OFFSET_RESET]

optional arguments:
  -h, --help            show this help message and exit
  --schema SCHEMA       Schema to consume.
  --skip-error          Skip error rows.
  --auto-offset-reset AUTO_OFFSET_RESET
                        Kafka auto offset reset,default earliest.

Consume schema test and insert into ClickHouse:

$ mysql2ch -c config.json consume --schema test

Use docker-compose (recommended)

Example docker-compose.yml.
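A minimal sketch of what such a docker-compose.yml might look like. The long2ice/mysql2ch image name comes from the project's Docker Hub build; the service layout, image tags, and config path are assumptions, so check the file shipped with the repository for the authoritative version.

```yaml
version: "3"
services:
  producer:
    image: long2ice/mysql2ch        # image name per Docker Hub; tag is an assumption
    command: mysql2ch -c /mysql2ch/config.json produce
    volumes:
      - ./config.json:/mysql2ch/config.json   # mount path is illustrative
    depends_on:
      - kafka
      - redis
  consumer:
    image: long2ice/mysql2ch
    command: mysql2ch -c /mysql2ch/config.json consume --schema test
    volumes:
      - ./config.json:/mysql2ch/config.json
    depends_on:
      - kafka
      - redis
  redis:
    image: redis:5
  zookeeper:
    image: bitnami/zookeeper:3
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
  kafka:
    image: bitnami/kafka:2
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
    depends_on:
      - zookeeper
```

One producer listens to the binlog while one consumer per schema reads its Kafka partition, matching the produce/consume commands shown above.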

Optional

Sentry: error reporting, enabled when sentry_dsn is set in config.json.

Config Reference

debug

When set to True, SQL information is displayed in the logs.

environment

Sentry environment.

sentry_dsn

Sentry DSN; set this if you use Sentry.

schema_table

Sync configuration, keyed by schema; each schema maps to a tables list and a kafka_partition. One Kafka partition carries one schema's binlog.

init_binlog_file

Initial MySQL binlog file. Used only on first run; afterwards the current file is read from Redis.

init_binlog_pos

Initial MySQL binlog position. Used only on first run; afterwards the current position is read from Redis.

log_pos_prefix

Prefix for the keys mysql2ch stores in Redis.

skip_delete_tables

Tables for which delete DML is skipped.

skip_update_tables

Tables for which update DML is skipped.

skip_dmls

DML types (delete and/or update) to skip globally.

insert_num

Number of events to buffer before each submit (batch size).

insert_interval

Maximum number of seconds between submits.
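Taken together, insert_num and insert_interval describe a size-or-time batching policy: buffered events are flushed to ClickHouse when either limit is reached, whichever comes first. A minimal illustrative sketch of that policy (not mysql2ch's actual implementation):

```python
import time


class Batcher:
    """Flush buffered events when either insert_num events have
    accumulated or insert_interval seconds have passed since the
    last flush. Illustrative only; mirrors the config semantics,
    not the project's real code."""

    def __init__(self, insert_num, insert_interval, flush):
        self.insert_num = insert_num
        self.insert_interval = insert_interval
        self.flush = flush          # callback receiving a list of events
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, event):
        self.buffer.append(event)
        size_hit = len(self.buffer) >= self.insert_num
        time_hit = time.monotonic() - self.last_flush >= self.insert_interval
        if size_hit or time_hit:
            self.flush(self.buffer)
            self.buffer = []
            self.last_flush = time.monotonic()


# Usage: with insert_num=3, seven events yield two full batches
# and leave one event buffered until the next trigger.
batches = []
b = Batcher(insert_num=3, insert_interval=60, flush=batches.append)
for e in range(7):
    b.add(e)
```

Small insert_num values lower latency at the cost of more, smaller inserts into ClickHouse; insert_interval bounds how stale a half-full batch can get.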

License

This project is licensed under the MIT License.