# PELT

A very simple, high-performance C-based PELT (Prepare->Extract->Load->Transform) solution.

Warning: this is my first real C program and it is far, far from perfect yet. There be bugs!
## Why
99% of Prepare->Extract->Load->Transform work is database and file I/O, and I struggled to see why I needed 1 GB+ of application to do what can be done with an app of under 1 MB and some simple configuration. Not finding any suitable, maintained alternative, I started writing my own.
## Performance
On a basic test case of 1.6M rows, the evolution of the current results:
| # | Method | Source | Destination | Time |
|---|--------|--------|-------------|------|
| A | CREATE TABLE AS SELECT | PostgreSQL 11.2 | PostgreSQL 11.2 | 11.686s |
| B | \COPY TO/FROM FILE | PostgreSQL 11.2 | PostgreSQL 11.2 | 19.809s |
| 1 | pelt with 1 commit | PostgreSQL 11.2 | PostgreSQL 11.2 | 35m7.348s |
| 2 | pelt with 10,000 commit | PostgreSQL 11.2 | PostgreSQL 11.2 | 6m55.410s |
| 3 | pelt with final commit | PostgreSQL 11.2 | PostgreSQL 11.2 | 6m52.138s |
| 4 | pelt prepared with final commit | PostgreSQL 11.2 | PostgreSQL 11.2 | 3m37.053s |
| 5 | pelt prepared, final async | PostgreSQL 11.2 | PostgreSQL 11.2 | 3m53.463s |
| 6 | pelt multi insert, prepared & final | PostgreSQL 11.2 | PostgreSQL 11.2 | 44.128s |
## Supported Sources & Destinations
- PostgreSQL
- MySQL (TODO)
- CSV (TODO)
## Usage

```
./pelt configuration.conf
```
## Build

Yes, I should move to `configure` and `make` eventually, but this is easy for now.

```
./build.sh
```
## Requires
- libconfig
- libpq
- libmysql (TODO)
- a CSV library (TODO)
## Execution Model
- Read the configuration file
- Run the destination `pre` statements
- Run the source `pre` statements
- Run the destination `key` query
- Run the source `data` query, feeding the destination `data` insert
- Run the destination `post` statements
- Run the source `post` statements
## Example Config

An example application configuration:
```
commit = 0;
notify = 10000;

source: {
  conn: {
    type = "postgresql"; // postgresql/mysql/csv
    host = "localhost";
    dbnm = "databaseA";
    user = "";
    pass = "";
    port = 5432;
  };
  pre: (
    "SELECT COUNT(*) FROM customer"
  ); // Array of SQL statements or scripts to run first
  key = "customer_id"; // In the source context this is the key column name
  data = "SELECT * FROM customer WHERE customer_id > $1 ORDER BY customer_id"; // SELECT statement, or file
  post: (); // Array of SQL statements or scripts to run last
};

destination: {
  conn: {
    type = "postgresql"; // postgresql/mysql/csv
    host = "localhost";
    dbnm = "databaseB";
    user = "";
    pass = "";
    port = 5432;
  };
  pre: (
    "SELECT COUNT(*) FROM customer_copy",
    "TRUNCATE TABLE customer_copy"
  ); // Array of SQL statements or scripts to run first
  key = "SELECT MAX(customer_id) FROM customer_copy"; // In the destination context this is the key value query
  data = "INSERT INTO customer_copy VALUES"; // INSERT SQL prefix, or file
  post: (
    "SELECT COUNT(*) FROM customer_copy"
  ); // Array of SQL statements or scripts to run last
};
```