Primary language: Python. License: MIT.

Local Data Stack

This is a simple data analytics and machine learning stack that runs on your local machine.

The docker-compose setup consists of:

  1. Airflow
  2. Superset
  3. Postgres

The dags directory contains examples that show how you can use this repo to bootstrap data analytics or machine learning projects quickly on your local machine.

Quick Start

Prerequisites: you'll need Docker and docker-compose installed.

Clone the repo, cd into the repo directory, and run docker-compose up -d:

git clone git@github.com:l1990790120/local-data-stack.git
cd local-data-stack
docker-compose up -d

If things are working as expected, you should see the Airflow, Superset, and Postgres containers running.
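One way to confirm the stack came up is to check whether each service accepts TCP connections. The sketch below is only an illustration: the port numbers are each tool's usual default, not values confirmed by this repo's docker-compose file, and the names ASSUMED_PORTS, is_port_open, and check_services are hypothetical helpers introduced here.

```python
import socket

# ASSUMPTION: these are the tools' common default ports, not values read from
# this repo's docker-compose file. Adjust them to match your port mappings.
ASSUMED_PORTS = {"airflow": 8080, "superset": 8088, "postgres": 5432}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(ports: dict, host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, up in check_services(ASSUMED_PORTS).items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

An open port only shows that something is listening; it does not prove the service behind it has finished initializing.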

Note: For Superset, you have to initialize the database the first time you bring the stack up:

docker exec -it local-data-stack_superset_1 superset-init

Technical Details

Postgres serves as the backend for both Airflow (under user airflow) and Superset (under user superset). Data is loaded into the analytics database (under user postgres).

In Superset, you can add the database with this SQL connection string: postgresql://postgres@postgres:5432/analytics
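To make the parts of that connection string explicit, here is a minimal sketch that assembles it from its components. The helper postgres_uri is hypothetical (it is not part of this repo), and the optional password parameter is an assumption added for setups where the postgres user has one; the host postgres is presumably the service name on the docker-compose network.

```python
from urllib.parse import quote


def postgres_uri(user, host, port, database, password=None):
    """Build a postgresql:// connection string; the password is optional."""
    # Percent-encode credentials so characters like "@" or ":" stay unambiguous.
    auth = quote(user, safe="")
    if password:
        auth += ":" + quote(password, safe="")
    return f"postgresql://{auth}@{host}:{port}/{database}"


# Values taken from the connection string in this README:
# user "postgres", host "postgres", port 5432, database "analytics".
ANALYTICS_URI = postgres_uri("postgres", "postgres", 5432, "analytics")
print(ANALYTICS_URI)
```

From a process running on the host rather than inside the compose network, the host part would likely need to be localhost instead, assuming the Postgres port is published.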

Or, if you just want to run some SQL queries, exec into the Postgres container:

docker exec -it local-data-stack_postgres_1 bash

Within the Postgres container, run:

psql -U postgres -d analytics

From there, you can query all the data you've loaded with Airflow.
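If you would rather run a single query without opening an interactive shell, the two steps above can be combined into one docker exec call that passes the statement to psql with its -c option. The following is a rough sketch under that assumption; psql_command and run_query are hypothetical helper names, and the container name, user, and database are the ones used in the commands above.

```python
import subprocess

# Container name as used in the docker exec command in this README.
CONTAINER = "local-data-stack_postgres_1"


def psql_command(sql, user="postgres", database="analytics", container=CONTAINER):
    """Return the argv list that runs one SQL statement through psql in the container."""
    return ["docker", "exec", container, "psql", "-U", user, "-d", database, "-c", sql]


def run_query(sql):
    """Run the statement and return psql's text output (requires the stack to be up)."""
    result = subprocess.run(
        psql_command(sql), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, run_query("SELECT 1;") should return psql's formatted result as text once the containers are running. Passing the command as a list avoids shell quoting problems with the SQL string.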

Example DAGs

I have put some interesting data pipelines in the dags directory. I'll continue to add more as I come across them. Feel free to contribute as well.