Welcome 👋 !
This is a small toy repository for you to test different ways of connecting Airflow, DuckDB and MotherDuck. 🦆
If you are new to Airflow, consider checking out our quickstart repository and Get started tutorial.
- Make sure you have Docker Desktop installed and running.
- Install the Astro CLI.
- Clone this repository.
- Create a `.env` file with the contents of the provided `.env.example` file. If you are using MotherDuck, provide your MotherDuck token.
- Start Airflow by running `astro dev start`.
- In the Airflow UI, define the following Airflow connections:
  - `my_local_duckdb_conn` with the following parameters:
    - Conn Type: `duckdb`
    - Path to local database file: `include/my_garden_ducks.db`
  - `my_motherduck_conn` with the following parameters:
    - Conn Type: `duckdb`
    - MotherDuck Service token: your MotherDuck Service token
    - MotherDuck database name: optionally specify a MotherDuck database name
- You can double check your connection credentials using the `include/test_script.py` script. To run the script inside of the Airflow scheduler container, run `astro dev bash -s` and then `python include/test_script.py`.
- Manually trigger DAGs by clicking the play button for each DAG on the right side of the screen.
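
The two connections described in the setup steps correspond to two kinds of DuckDB connection targets: a local database file, or a MotherDuck `md:` string that carries the service token. The sketch below shows roughly how a credentials check could work. It is not the contents of `include/test_script.py`, which are not reproduced here. The function names and the `MOTHERDUCK_TOKEN` environment variable name are assumptions made for illustration, so adjust them to match your `.env` file.

```python
import os
from typing import Optional

LOCAL_DB_PATH = "include/my_garden_ducks.db"  # path used by my_local_duckdb_conn


def build_target(token: Optional[str] = None, database: str = "") -> str:
    """Return a DuckDB connection target (hypothetical helper).

    Without a token, fall back to the local database file. With a token,
    return a MotherDuck string: md:<database>?motherduck_token=<token>.
    """
    if not token:
        return LOCAL_DB_PATH
    return f"md:{database}?motherduck_token={token}"


def check_connection(target: str) -> int:
    """Open the target, run a trivial query, and return its single value."""
    import duckdb  # imported lazily so build_target works without duckdb installed

    conn = duckdb.connect(target)
    try:
        return conn.execute("SELECT 42").fetchone()[0]
    finally:
        conn.close()


def main() -> None:
    # MOTHERDUCK_TOKEN is an assumed variable name, not taken from this repository.
    target = build_target(os.environ.get("MOTHERDUCK_TOKEN"))
    # Print only the query result, never the target, which may contain the token.
    print("query result:", check_connection(target))
```

Calling `main()` from a Python shell inside the scheduler container would exercise whichever target applies. An empty `database` argument leaves the database name out of the `md:` string, matching the optional database name in the connection form.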
This repo contains 4 DAGs showing different ways to interact with DuckDB and MotherDuck from within Airflow:
- `duckdb_in_taskflow`: This DAG uses the `duckdb` Python package directly to connect. Note that some tasks will fail if no MotherDuck token was provided.
- `duckdb_provider_example`: This DAG uses the DuckDBHook from the DuckDB Airflow provider to connect to DuckDB and MotherDuck.
- `duckdb_custom_operator_example`: This DAG uses the custom local operator `ExcelToDuckDBOperator`, which is stored in `include/duckdb_operator.py`, to load the contents of an Excel file (`include/ducks_in_the_pond`) into a DuckDB or MotherDuck database.
- `duckdb_and_astro_sdk_example`: This DAG uses the Astro SDK to run a simple ELT pipeline against either a local DuckDB database or a MotherDuck database.
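
The DAGs listed above differ mainly in how they obtain a DuckDB connection. Once a connection exists, the SQL work looks much the same. The sketch below illustrates that split with a hypothetical helper, `load_ducks`, which accepts any connection object exposing `execute` and `executemany`, as DuckDB connections do. The table name, columns, and sample rows are invented for illustration and are not taken from the repository's DAGs.

```python
from typing import Any, Iterable, Tuple


def load_ducks(conn: Any, rows: Iterable[Tuple[str, int]]) -> int:
    """Create a small table, insert the given rows, and return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS ducks (name VARCHAR, num INTEGER)")
    conn.executemany("INSERT INTO ducks VALUES (?, ?)", list(rows))
    return conn.execute("SELECT COUNT(*) FROM ducks").fetchone()[0]


def run_with_duckdb_package() -> int:
    """Obtain the connection directly from the duckdb package, then reuse the helper."""
    import duckdb  # imported lazily so the helper above has no hard dependency

    conn = duckdb.connect("include/my_garden_ducks.db")
    try:
        # Sample rows are made up for this sketch.
        return load_ducks(conn, [("mallard", 3), ("wood duck", 1)])
    finally:
        conn.close()
```

Inside a TaskFlow task, the task body could call the same helper. With the DuckDB provider, the connection would come from the hook instead of `duckdb.connect`, while the helper itself stays unchanged.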