Local Data Stack
This is a simple stack for data analytics or machine learning that runs on your local machine.
The docker-compose setup consists of the services described in this README, including Airflow, Superset, and Postgres.
The dags directory has some examples that show how you can use this repo to bootstrap data analytics or machine learning projects quickly on your local machine.
Quick Start
Reminder: you'll need docker and docker-compose installed.
Simply clone the repo, cd into the repo directory, and run docker-compose up -d:
git clone git@github.com:l1990790120/local-data-stack.git
cd local-data-stack
docker-compose up -d
If things are working as expected, you should see:
- Airflow: http://localhost:8080/
- Superset: http://localhost:8088/
Note: For Superset, you will have to initialize the database the first time you start it (details here):
docker exec -it local-data-stack_superset_1 superset-init
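To confirm that both web UIs are reachable without opening a browser, a small check along these lines may help. It is only a sketch: the two URLs come from the list above, while the `is_up` helper and the 3-second timeout are illustrative choices, not part of this repo.

```python
import http.client
import urllib.error
import urllib.request

# URLs taken from the Quick Start list; the keys are only labels for the printout.
SERVICES = {
    "Airflow": "http://localhost:8080/",
    "Superset": "http://localhost:8088/",
}


def is_up(url, timeout=3.0):
    """Return True if any HTTP response comes back from url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status (e.g. a login wall).
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        status = "up" if is_up(url) else "not reachable"
        print(f"{name}: {status} ({url})")
```

The containers can take a little while to start after docker-compose up -d, so a "not reachable" result right after startup does not necessarily mean something is broken.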
Technical Details
Postgres is used as the backend for both Airflow (under user airflow) and Superset (under user superset). Data is loaded into the analytics database (under user postgres).
In Superset, you can add the database with the SQL connection string: postgresql://postgres@postgres:5432/analytics
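To make the pieces of that connection string explicit, the sketch below splits it with Python's standard library. The URI is quoted from the text above; the reading of the host `postgres` as the docker-compose service name is an assumption based on the container name local-data-stack_postgres_1, and the `connection` dictionary is purely illustrative.

```python
from urllib.parse import urlsplit

# Connection string quoted from the text above.
ANALYTICS_URI = "postgresql://postgres@postgres:5432/analytics"

parts = urlsplit(ANALYTICS_URI)

connection = {
    "dialect": parts.scheme,        # SQLAlchemy dialect name
    "user": parts.username,         # database user that owns the loaded data
    "password": parts.password,     # None: the URI carries no password
    "host": parts.hostname,         # assumed to be the compose service name
    "port": parts.port,             # default Postgres port
    "database": parts.path.lstrip("/"),
}

for key, value in connection.items():
    print(f"{key:>8}: {value}")
```

Because the host is a name resolved inside the Docker network, this URI is meant for Superset running in its own container. Whether the same database is reachable from the host machine depends on the port mapping in the compose file, which is not shown here.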
Or, if you just want to run some SQL queries, exec into the Postgres container:
docker exec -it local-data-stack_postgres_1 bash
Within the Postgres container, run:
psql -U postgres -d analytics
And you'll see all the data you've loaded with Airflow.
Example Dags
I have put some interesting data pipelines in the dags directory. I'll continue to add more as I come across them. Feel free to contribute as well.
- JHU Covid Data
- More to come