/datadex

💾 Bring the modern data stack to Open Data!



Model Open Data collaboratively using dbt and DuckDB

What is Datadex?

Datadex is a proof-of-concept project that explores how people could model Open Tabular Datasets using SQL. Thanks to dbt and DuckDB, you can transform data by simply writing select statements, or import someone else's model and build on it!
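To make the "model as a select statement" idea concrete, here is a minimal sketch. It is not Datadex's actual code: it uses Python's built-in sqlite3 module as a stand-in for DuckDB so that it runs without extra dependencies, and plain SQL views as a stand-in for dbt models. The table, view names, and rows are hypothetical.

```python
import sqlite3

# Assumption: sqlite3 stands in for DuckDB here so the sketch has no dependencies.
con = sqlite3.connect(":memory:")

# Hypothetical raw dataset; the rows are made up for illustration only.
con.execute("CREATE TABLE raw_cities (name TEXT, country TEXT, population INTEGER)")
con.executemany(
    "INSERT INTO raw_cities VALUES (?, ?, ?)",
    [
        ("Alphaville", "AA", 3000000),
        ("Betatown", "AA", 1500000),
        ("Gammaburg", "BB", 500000),
    ],
)

# A "model" is just a named select statement (a view stands in for a dbt model).
con.execute(
    """
    CREATE VIEW large_cities AS
    SELECT name, country, population
    FROM raw_cities
    WHERE population > 1000000
    """
)

# A second model builds on the first one instead of on the raw table.
con.execute(
    """
    CREATE VIEW large_cities_per_country AS
    SELECT country, COUNT(*) AS n_cities, SUM(population) AS total_population
    FROM large_cities
    GROUP BY country
    """
)

rows = con.execute(
    "SELECT country, n_cities, total_population "
    "FROM large_cities_per_country ORDER BY country"
).fetchall()
print(rows)
```

In a dbt project, each view above would roughly correspond to one SQL file containing only the select statement, with the downstream model referring to the upstream one by name.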

Features

  • Open. Run it from your laptop, the browser, EC2...!
  • Data as code. Version your datasets as dbt packages!
  • Package management. Publish and share your models for other people to build on top of them!

Usage

This repository is an example of how you can use Datadex to model data. It is already configured with some sample datasets. Get things working end to end with the following steps:

  1. Set up dependencies with make deps.
  2. Build your dbt models and save them to Parquet files with make run.
  3. Explore the data with make rill.


What can you do with Datadex?

docs

Setup

The fastest way to start using Datadex is via VSCode Remote Containers. Once inside the development environment, you'll only need to run make deps.

PS: The development environment can also run in your browser thanks to GitHub Codespaces.

Motivation

This small project was created after thinking about what an Open Data Protocol could look like! I just wanted to stitch together a few open source technologies and see what they could do.

Acknowledgements