This is a simple project using DuckDB & dbt.
The repo contains two models based on the WHO air quality dataset that is hosted on a public S3 bucket as a parquet file.
The dbt pipelines output two CSVs in the output/ folder.
Although the bucket is public, you still need to set the S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY environment variables (dummy values are fine) to run the pipeline.
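As a minimal sketch, the two variables can be exported in the shell before running the pipeline. The placeholder value dummy is an assumption for illustration; any value already set in your environment is left untouched.

```shell
# Set placeholder credentials only if the variables are not already defined.
# "dummy" is an illustrative value; the bucket is public, so no real keys are needed.
export S3_ACCESS_KEY_ID="${S3_ACCESS_KEY_ID:-dummy}"
export S3_SECRET_ACCESS_KEY="${S3_SECRET_ACCESS_KEY:-dummy}"
```

These exports only last for the current shell session, so repeat them (or add them to your shell profile) when you open a new terminal.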
This project uses the dbt-duckdb adapter for DuckDB.
You can install it with pip install dbt-duckdb.
This includes dbt, the dbt-duckdb adapter, and duckdb.
There's a devcontainer you can use either for local development or through GitHub Codespaces.
Inside the dbt project directory /dbt_demo, run dbt run.
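The run step might look like the sketch below. The helper name run_pipeline is hypothetical, and the default path dbt_demo assumes you are starting from the repository root. It is wrapped in a function so it can be pasted into a session and called once the environment variables are set.

```shell
# Hypothetical helper: run the dbt models from inside the project directory.
# The first argument overrides the project path; "dbt_demo" is an assumed default.
run_pipeline() {
  # The subshell keeps the caller's working directory unchanged.
  (cd "${1:-dbt_demo}" && dbt run)
}
```

Calling run_pipeline with no arguments runs dbt run inside dbt_demo. Pass a different path as the first argument if your checkout is laid out differently.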