tapdata-1: A Java repository from tjworks

Special Event: Connector Plugin Development Competition

Got some Java and database skills? Help Tapdata add a data source and win a prize!

https://github.com/tapdata/tapdata.github.io/blob/main/plugin-contributor-program/plugin-contributor-program.md

Online documentation: https://tapdata.github.io/

What is Tapdata?

Tapdata is a live data platform designed to connect data silos and provide fresh data to downstream operational applications and operational analytics.

Environment Preparation

Please make sure you have Docker installed on your machine before you get started.
So far we have only tested on Linux (no specific distribution required).
Clone the repo: git clone https://github.com/tapdata/tapdata.git && cd tapdata
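If you want to confirm the prerequisites before running the build scripts, a small Python check could look like the following. It is a sketch, not part of the repository: "Docker installed" is approximated by finding a `docker` executable on PATH (which does not prove the daemon is running), and the OS check simply mirrors the "tested on Linux" note.

```python
import platform
import shutil


def check_prerequisites() -> dict:
    """Report whether the documented prerequisites look satisfied.

    Assumption: a `docker` executable on PATH stands in for "Docker installed";
    this does not verify that the Docker daemon is actually running.
    """
    return {
        "docker_on_path": shutil.which("docker") is not None,
        # Only Linux has been tested (any distribution).
        "is_linux": platform.system() == "Linux",
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```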

Latest Release Branch

release-v2.9

Quick Use

This is the easiest way to experiment with Tapdata:

Running bash build/quick-use.sh pulls the Docker image and starts an all-in-one container.

Quick Build

Alternatively, you may build the project using the following command:

Running bash build/quick-dev.sh builds a Docker image from source and starts an all-in-one container.

To build inside Docker, install Docker and set tapdata_build_env in build/env.sh to "docker" (the default).

To build locally, install:

JDK
Maven

and set tapdata_build_env in build/env.sh to "local".
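Switching between the two build modes only means changing one variable. As a hedged sketch, the helper here assumes build/env.sh assigns the variable on its own line (for example `tapdata_build_env="docker"`); the exact file layout is an assumption, and the function names `set_build_env` and `switch_build_env` are hypothetical.

```python
import re
from pathlib import Path

# Assumption: build/env.sh contains a line such as
#   tapdata_build_env="docker"
# optionally prefixed with `export`.
_ASSIGNMENT = re.compile(
    r'^([ \t]*(?:export[ \t]+)?tapdata_build_env=).*$', re.MULTILINE
)


def set_build_env(text: str, mode: str) -> str:
    """Return env.sh text with tapdata_build_env set to "docker" or "local"."""
    if mode not in ("docker", "local"):
        raise ValueError(f"unsupported build env: {mode!r}")
    # A function replacement avoids escaping issues with the quotes.
    new_text, count = _ASSIGNMENT.subn(lambda m: f'{m.group(1)}"{mode}"', text)
    if count == 0:
        raise ValueError("tapdata_build_env assignment not found")
    return new_text


def switch_build_env(path: str = "build/env.sh", mode: str = "local") -> None:
    """Rewrite the env file in place (hypothetical helper)."""
    env_file = Path(path)
    env_file.write_text(set_build_env(env_file.read_text(), mode))
```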

Run bash build/clean.sh if you want to clean the build target.

Quick Steps

If everything went well, you should now be in a terminal window. Follow the next steps and give it a try!

Create New DataSource

# 1. mongodb
source = DataSource("mongodb", "$name").uri("$uri")

# 2. mysql
source = DataSource("mysql", "$name").host("$host").port($port).username("$username").db("$db")

# 3. postgres
source = DataSource("postgres", "$name").host("$host").port($port).username("$username").db("$db").schema("$schema").logPluginName("wal2json")

# save() checks all the config and loads the schema from the source
source.save()
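The chained calls work because each setter returns the DataSource itself. The class here is a hypothetical stand-in, not Tapdata's implementation: it shows that chaining pattern and imitates the "check all config" part of save() using per-connector required options that are only assumed from the examples (the real required set may differ, and schema loading is not modeled).

```python
class DataSourceSketch:
    """Hypothetical stand-in for the shell's DataSource builder."""

    # Assumed required options per connector, inferred from the examples.
    REQUIRED = {
        "mongodb": {"uri"},
        "mysql": {"host", "port", "username", "db"},
        "postgres": {"host", "port", "username", "db", "schema", "logPluginName"},
    }

    def __init__(self, kind: str, name: str):
        if kind not in self.REQUIRED:
            raise ValueError(f"unknown connector type: {kind!r}")
        self.kind, self.name, self.options = kind, name, {}

    def _set(self, key, value):
        self.options[key] = value
        return self  # returning self is what makes chaining possible

    def uri(self, v): return self._set("uri", v)
    def host(self, v): return self._set("host", v)
    def port(self, v): return self._set("port", int(v))
    def username(self, v): return self._set("username", v)
    def db(self, v): return self._set("db", v)
    def schema(self, v): return self._set("schema", v)
    def logPluginName(self, v): return self._set("logPluginName", v)

    def check(self) -> None:
        """Imitate the config check done by save(); raises on missing options."""
        missing = self.REQUIRED[self.kind] - set(self.options)
        if missing:
            raise ValueError(f"{self.name}: missing {sorted(missing)}")
```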

Preview Table

use $name switches the datasource context
show tables displays all tables in the current datasource
desc $table_name displays the table schema
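To make the context semantics concrete, here is a hypothetical in-memory model (not Tapdata code): use selects a datasource, and later commands resolve against it. Catalogs are plain dicts of table name to column types, which is an illustrative assumption.

```python
class ShellContextSketch:
    """Hypothetical model of the shell's datasource context.

    catalogs maps datasource name -> {table name -> {column: type}}.
    """

    def __init__(self, catalogs):
        self.catalogs = catalogs
        self.current = None

    def use(self, name):
        if name not in self.catalogs:
            raise KeyError(f"no such datasource: {name}")
        self.current = name  # later commands resolve against this source

    def show_tables(self):
        return sorted(self._tables())

    def desc(self, table):
        return dict(self._tables()[table])

    def _tables(self):
        if self.current is None:
            raise RuntimeError("run use <name> first")
        return self.catalogs[self.current]
```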

Migrate A Table

Migrate jobs run in real time by default.

# 1. create a pipeline
p = Pipeline("$name")

# 2. use readFrom and writeTo to describe a migrate job
p.readFrom("$source_name.$table").writeTo("$sink_name.$table")

# 3. start job
p.start()

# 4. monitor job
p.monitor()
p.logs()

# 5. stop job
p.stop()
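The steps above follow a simple lifecycle: describe the source and sink, start, then stop. This hypothetical model (not Tapdata's Pipeline) shows that flow and how a "$source_name.$table" address splits into datasource and table. It assumes datasource names contain no dots, so it splits on the first dot only; monitoring and logs are not modeled.

```python
class PipelineSketch:
    """Hypothetical model of the documented pipeline lifecycle."""

    def __init__(self, name):
        self.name = name
        self.source = self.sink = None
        self.status = "created"

    @staticmethod
    def _split(qualified):
        # Assumption: datasource names contain no dots.
        source, sep, table = qualified.partition(".")
        if not sep or not source or not table:
            raise ValueError(f"expected <datasource>.<table>, got {qualified!r}")
        return source, table

    def readFrom(self, qualified):
        self.source = self._split(qualified)
        return self

    def writeTo(self, qualified):
        self.sink = self._split(qualified)
        return self

    def start(self):
        if self.source is None or self.sink is None:
            raise RuntimeError("call readFrom() and writeTo() before start()")
        self.status = "running"  # real time by default: runs until stopped

    def stop(self):
        if self.status != "running":
            raise RuntimeError(f"cannot stop a job in state {self.status!r}")
        self.status = "stopped"
```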

Migrate a Table With UDF

Record schema changes are not supported in the current version; support will be added shortly.

If you need to change the record schema, please use MongoDB as the sink.

# 1. define a python function
def fn(record):
    record["x"] = 1
    return record

# 2. add the processor between source and target
p.readFrom(...).processor(fn).writeTo(...)
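Before attaching a UDF to a pipeline, you can exercise it on a few sample records. The harness `run_locally` is hypothetical and only approximates what a processor stage does (apply the function to each record). Note that the example fn adds a field "x", which is itself a schema change, hence the MongoDB sink recommendation.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def fn(record: Record) -> Record:
    # The UDF from the example: adds a field "x" (a record schema change).
    record["x"] = 1
    return record


def run_locally(records: Iterable[Record],
                processor: Callable[[Record], Record]) -> List[Record]:
    """Hypothetical harness: apply a processor to shallow copies of records.

    Copying keeps the input untouched so the same sample can be reused.
    """
    return [processor(dict(r)) for r in records]


sample = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
out = run_locally(sample, fn)
print(out)  # [{'id': 1, 'name': 'a', 'x': 1}, {'id': 2, 'name': 'b', 'x': 1}]
```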

Migrate Multiple Tables