This Meerschaum Compose project accompanies my talk The Wonderful World of Incremental Time-Series ETL and demonstrates practical syncing strategies I use daily.
To demonstrate secrets management, create a file .env and paste these variables:
export MRSM_SQL_ETL='{
"flavor": "timescaledb",
"username": "mrsm",
"password": "mrsm",
"database": "meerschaum",
"port": 5432,
"host": "localhost"
}'
export MRSM_MONGODB_LOCAL='{
"uri": "mongodb://localhost:27017",
"database": "etl"
}'
This defines two connectors:
- sql:etl
  These are the default credentials for sql:main, but to demonstrate, we've aliased it as sql:etl. To start this database, install meerschaum on your host machine and run mrsm stack up -d db.
- mongodb:local
  A service mongodb is added to this project's docker-compose.yaml to represent a heterogeneous database fleet that you'll encounter in the wild. To start it, run:
  docker compose up -d mongodb
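To make the naming convention concrete, here is a minimal sketch of how variables shaped like MRSM_<TYPE>_<LABEL> map onto connector keys such as sql:etl. The helper parse_connector_env is hypothetical and written only for illustration; it is not Meerschaum's own parser, and it only handles JSON values like the ones above.

```python
import json

# The same two variables as in .env, plus an unrelated one to show it is ignored.
env = {
    "MRSM_SQL_ETL": json.dumps({
        "flavor": "timescaledb",
        "username": "mrsm",
        "password": "mrsm",
        "database": "meerschaum",
        "port": 5432,
        "host": "localhost",
    }),
    "MRSM_MONGODB_LOCAL": json.dumps({
        "uri": "mongodb://localhost:27017",
        "database": "etl",
    }),
    "HOME": "/root",
}


def parse_connector_env(environ):
    """Hypothetical helper: map MRSM_<TYPE>_<LABEL> variables to 'type:label' keys."""
    connectors = {}
    for name, value in environ.items():
        parts = name.split("_", 2)
        # Expect exactly three parts: the MRSM prefix, the type, and the label.
        if len(parts) != 3 or parts[0] != "MRSM":
            continue
        _, conn_type, label = parts
        try:
            attributes = json.loads(value)
        except json.JSONDecodeError:
            # Skip values that are not JSON (this sketch does not handle other forms).
            continue
        if isinstance(attributes, dict):
            connectors[f"{conn_type.lower()}:{label.lower()}"] = attributes
    return connectors


connectors = parse_connector_env(env)
print(sorted(connectors))
```

Keeping credentials in .env rather than in mrsm-compose.yaml means the project file can be committed without exposing secrets.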
Build your image, start the container, and exec into it.
docker compose build
docker compose up -d
docker compose exec mrsm-compose bash
Like docker-compose.yaml, the file mrsm-compose.yaml defines our project state.
Let's examine our project:
mrsm compose explain
This project contains five pipes:
- Pipe('plugin:noaa', 'weather', 'sc')
  Sync raw weather data from SC stations into PostgreSQL.
- Pipe('plugin:noaa', 'weather', 'nc')
  Sync raw weather data from NC stations into MongoDB.
- Pipe('plugin:clone', 'weather')
  Consolidate these parent pipes together into PostgreSQL.
- Pipe('sql:etl', 'weather', 'fahrenheit')
  Perform basic ETL on the combined weather data, converting temperature into Fahrenheit and only keeping desired columns.
- Pipe('sql:etl', 'weather', 'avg')
  Chain additional ETL onto the previous table, calculating the daily average temperature.
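The last two pipes do their work in the database, as defined in mrsm-compose.yaml. Purely to illustrate the two transformations, here is a rough plain-Python equivalent. The column names (timestamp, station, temperature_c) and the sample readings are made up for this sketch and are not taken from the project.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Made-up sample readings; column names are assumptions for this sketch.
rows = [
    {"timestamp": datetime(2023, 1, 1, 0), "station": "STATION_A", "temperature_c": 0.0},
    {"timestamp": datetime(2023, 1, 1, 12), "station": "STATION_A", "temperature_c": 10.0},
    {"timestamp": datetime(2023, 1, 2, 0), "station": "STATION_A", "temperature_c": 20.0},
]


def to_fahrenheit(celsius):
    """Convert a Celsius reading to Fahrenheit."""
    return celsius * 9 / 5 + 32


def daily_average_fahrenheit(readings):
    """Group readings by calendar day and average the Fahrenheit temperatures."""
    by_day = defaultdict(list)
    for reading in readings:
        day = reading["timestamp"].date()
        by_day[day].append(to_fahrenheit(reading["temperature_c"]))
    return {day: mean(temps) for day, temps in sorted(by_day.items())}


averages = daily_average_fahrenheit(rows)
for day, avg in averages.items():
    print(day, avg)
```

Chaining the average onto the Fahrenheit step mirrors how the avg pipe reads from the fahrenheit pipe's table rather than from the raw data.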
In addition to the connectors we defined in .env, some connectors are plugins, namely the public plugins noaa and clone.
Run the initial syncs on these pipes, one at a time, like so:
mrsm compose run
Now your tables should be built and ready to go! Next, let's set up automatic syncs so our tables will be updated as soon as new data are available:
mrsm compose up
Now the pipes are syncing in the background, sleeping for 30 seconds between syncs as set by min_seconds.
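Each background sync is incremental: only rows newer than what the target already holds need to be fetched. The sketch below illustrates that general pattern with in-memory lists. It is a simplified illustration, not Meerschaum's implementation; the function incremental_sync, the backtrack parameter, and the row layout are all hypothetical.

```python
from datetime import datetime, timedelta


def incremental_sync(source_rows, target_rows, backtrack=timedelta(0)):
    """Append source rows the target lacks, scanning only from the target's latest timestamp.

    `backtrack` re-checks a window before the latest timestamp to catch late arrivals.
    Returns the number of rows inserted.
    """
    latest = max((row["timestamp"] for row in target_rows), default=None)
    begin = None if latest is None else latest - backtrack
    # Rows are identified by (timestamp, station) so re-scanned rows are not duplicated.
    existing = {(row["timestamp"], row["station"]) for row in target_rows}
    new_rows = [
        row for row in source_rows
        if (begin is None or row["timestamp"] >= begin)
        and (row["timestamp"], row["station"]) not in existing
    ]
    target_rows.extend(new_rows)
    return len(new_rows)


source = [
    {"timestamp": datetime(2023, 1, 1, hour), "station": "STATION_A", "value": hour}
    for hour in range(3)
]
target = []

first = incremental_sync(source, target)    # everything is new
second = incremental_sync(source, target)   # nothing has changed
source.append({"timestamp": datetime(2023, 1, 1, 3), "station": "STATION_A", "value": 3})
third = incremental_sync(source, target)    # only the newest row is inserted
print(first, second, third)
```

A background job would call something like this in a loop and sleep between iterations, which is the role min_seconds plays here.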
Like Docker Compose, stop the jobs like this (similarly, adding -v will also delete the pipes):
mrsm compose down
There's the beginning of another project in the file carolina-compose.yaml. It contains one pipe with the connector plugin:fake, which you can find in the plugins/ directory (mounted as /app/plugins).
Examine fake.py and example/example_connector.py.
Can you write your own plugin and build your own pipes? See the Writing Your Own Plugins guide for reference!
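As a starting point, here is a minimal sketch of what a fetch-style plugin module might look like. It assumes the module-level register() and fetch() functions described in the guide; treat the exact signatures and the returned keys as assumptions to check against the guide and against fake.py. The station name and temperature values are generated placeholders, not real data.

```python
from datetime import datetime, timedelta, timezone

__version__ = "0.0.1"
required = []  # Extra pip dependencies would be listed here (assumed convention).


def register(pipe, **kwargs):
    """Return default parameters for new pipes (assumed shape: index columns)."""
    return {"columns": {"datetime": "timestamp", "id": "station"}}


def fetch(pipe, begin=None, end=None, **kwargs):
    """Return one placeholder row per hour in the half-open interval [begin, end).

    Honoring `begin` and `end` is what lets a sync stay incremental:
    only the requested window is produced.
    """
    end = end or datetime.now(timezone.utc)
    begin = begin or end - timedelta(hours=3)
    docs = []
    timestamp = begin
    while timestamp < end:
        docs.append({
            "timestamp": timestamp,
            "station": "STATION_A",  # placeholder identifier
            "temperature_c": 20.0 + (timestamp.hour % 5),  # deterministic fake reading
        })
        timestamp += timedelta(hours=1)
    return docs


docs = fetch(None, begin=datetime(2024, 1, 1, 0), end=datetime(2024, 1, 1, 3))
print(len(docs))
```

Saved as a file in plugins/, a module like this could then be referenced from a compose file the same way plugin:fake is, assuming the signatures above match the guide.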