/airflow-summit-airbyte-2022

git push your data stack with Airbyte, Airflow, and dbt - 2022 Airflow Summit

Primary LanguageShell

git push your data stack with Airbyte, Airflow and dbt

For Airflow Summit 2022, by @marcosmarcm and @evantahler from Airbyte

CI

What:

Links:

This project configures a sample data stack orchestrated by Airflow, using Airbyte to Extract and Load data, and dbt to Transform it.

Running Locally:

  1. Install Docker Desktop and Python 3 (if you are on MacOS, you already have Python 3).

  2. Create {HOME}/.octavia and add the following credentials for using a local postgres database managed by Docker:

POSTGRES_HOST=host.docker.internal
POSTGRES_PASSWORD=password
POSTGRES_USERNAME=demo_user
POSTGRES_DATABASE=postgres
  1. Create the profile dbt in {HOME}/.dbt/profiles.yaml
config:
  partial_parse: true
  printer_width: 120
  send_anonymous_usage_stats: false
  use_colors: true
normalize:
  outputs:
    prod:
      dbname: postgres
      host: host.docker.internal
      pass: password
      port: 5432
      schema: public
      threads: 8
      type: postgres
      user: demo_user
  target: prod
  1. Run the whole data stack using ./tools/start.sh. This will install local requirements (PyYAML) and run everything though Docker. The script will exit when complete, but the Docker containers will remain running.

In your browser:

  • Visit http://localhost:8080/ to see the Airflow UI (user: airflow, password: airflow) and your completed DAG.
  • Visit http://localhost:8000/ to see the Airbyte UI and your completed Sync.
  • Visit your local postgres database (localhost:5432) with the username=demo_user and password=password to see the staged and transformed data.

Shut it down

Run ./tools/stop.sh to stop the Docker containers.

Testing

This repository is tested using Github Actions.