iomete: JDBC Sync

This library makes it easy to replicate tables from JDBC databases (MySQL, PostgreSQL, etc.) to iomete.

Note: If your database is in a private network, you need an SSH tunnel between your network and iomete (see: https://docs.iomete.com/docs/database-connection-options)

Sync mode

You can define the sync mode for each table. The currently supported sync modes are:

  • FullLoad: Reads everything from the source and overwrites the whole table at the destination on each sync
  • IncrementalSnapshot: Creates a snapshot of the table at the destination and then moves only the newly inserted and updated records, using a merge statement when writing to iomete. This mode requires 2 parameters: identification_column, which is used in the merge statement, and tracking_column, which tracks where to continue reading from the source table. See the example below.
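
For illustration, an IncrementalSnapshot entry in the syncs list might look like the following sketch (emp_no and updated_at are placeholder column names, and the identification_column and tracking_column keys are assumed to nest under sync_mode alongside type):

{
    source.schema: employees
    source.tables: [ employees ]
    destination.schema: employees_raw
    sync_mode.type: incremental_snapshot
    sync_mode.identification_column: emp_no
    sync_mode.tracking_column: updated_at
}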

Deployment

  • Go to Spark Jobs.
  • Click on Create New.

Specify the following parameters (these are examples; adjust them to your preference):

  • Name: jdbc-sync-job
  • Schedule: 0 0/22 1/1 * *
  • Docker Image: iomete/iomete_jdbc_sync:0.3.0
  • Main application file: local:///app/driver.py
  • Environment Variables: DB_PASSWORD: 9tVDVEKp
  • Config file:
{
    source_connection: {
        type: mysql,
        host: "iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com",
        port: 3306,
        username: tutorial_user,
        password: ${DB_PASSWORD}
    },
    syncs: [
        {
            source.schema: employees
            source.tables: ["*"]
            source.exclude_tables: [departments, dept_manager, salaries]
            destination.schema: employees_raw
            sync_mode.type: full_load
        },
        {
            source.schema: employees
            source.tables: [ departments, dept_manager ]
            destination.schema: employees_raw
            sync_mode.type: full_load
        },
        {
            source.schema: employees
            source.tables: [
                """
                (SELECT emp_no, sum(salary) total_salary FROM salaries group by emp_no)
                as total_salaries
                """
            ]
            destination.schema: employees_raw
            sync_mode.type: full_load
        }
    ]
}
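
The third sync entry replicates the result of a SQL query rather than a physical table: the parenthesized subquery is aliased (as total_salaries), and the alias becomes the table name at the destination. This works because JDBC sources accept a subquery anywhere a table name is expected. A minimal PySpark sketch of that mechanism (illustrative only, not this library's internals):

import os

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A parenthesized subquery with an alias is accepted wherever a table name is,
# which is how a sync can replicate an aggregation instead of a raw table.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com:3306/employees")
    .option("dbtable", "(SELECT emp_no, sum(salary) total_salary FROM salaries GROUP BY emp_no) as total_salaries")
    .option("user", "tutorial_user")
    .option("password", os.environ["DB_PASSWORD"])
    .load()
)

# full_load semantics: overwrite the destination table on each sync.
df.write.mode("overwrite").saveAsTable("employees_raw.total_salaries")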

Configuration properties

source_connection

Required properties to connect to the source:

  • type: your database type, such as mysql or postgresql
  • host: your database host. Example: iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com
  • port: port number. Example: 3306
  • username: database username
  • password: database password

syncs

A list of sync definitions. Each entry supports the following properties:

  • source.schema: database where your tables are stored
  • source.tables: list of tables to replicate. To replicate all tables, set ["*"]
  • source.exclude_tables: (optional) list of tables you do not want to replicate
  • destination.schema: database name where the tables will be stored in your warehouse
  • sync_mode.type: either full_load, which overwrites the whole table at the destination on each sync, or incremental_snapshot, which moves only the newly inserted and updated records (see the sketch after this list) and requires two parameters:
    • identification_column
    • tracking_column
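
To make the incremental_snapshot semantics concrete, here is roughly what each run does, sketched in PySpark (an illustration of the behavior described above, assuming identification_column: emp_no and tracking_column: updated_at; not the library's actual code):

import os

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1. Find how far the destination already got (max of the tracking column).
last_value = spark.sql(
    "SELECT max(updated_at) FROM employees_raw.employees"
).collect()[0][0]

# 2. Read only the source rows past that point.
increment = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com:3306/employees")
    .option("dbtable", f"(SELECT * FROM employees WHERE updated_at > '{last_value}') as increment")
    .option("user", "tutorial_user")
    .option("password", os.environ["DB_PASSWORD"])
    .load()
)
increment.createOrReplaceTempView("increment")

# 3. Merge on the identification column: update matched rows, insert new ones.
spark.sql("""
    MERGE INTO employees_raw.employees t
    USING increment s
    ON t.emp_no = s.emp_no
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")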

[Screenshot: Create Spark Job]

Create Spark Job - Instance

You can use Environment Variables to store your sensitive data like password, secrets, etc. Then you can use these variables in your config file using the ${DB_PASSWORD} syntax.
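
For example, the source_connection block in the config above reads the password from the DB_PASSWORD variable:

    password: ${DB_PASSWORD}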

[Screenshot: Create Spark Job - Instance]

Create Spark Job - Application Config

[Screenshot: Create Spark Job - Application Config]

Finally, hit the Create button.


The job will run on the defined schedule, but you can also trigger it manually by clicking the Run button.

[Screenshot: Manual Run]

Development

Prepare the dev environment

# create and activate a virtual environment
virtualenv .env  # or: python3 -m venv .env
source .env/bin/activate

# install the package in editable mode with dev dependencies
pip install -e ."[dev]"

Run tests

pytest