Datadex is a proof-of-concept project to explore how people could model Open Tabular Datasets using SQL. Thanks to dbt and DuckDB, you can transform data by simply writing `select` statements, or import someone else's models and build on top of them!
- Open. Run it from your laptop, the browser, EC2...!
- Data as code. Version your datasets as dbt packages!
- Package management. Publish and share your models for other people to build on top of them!
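A minimal sketch of what building on someone else's model can look like in dbt. It assumes a hypothetical upstream model named `raw_country_population` with `country`, `year` and `population` columns; none of these names come from Datadex itself. `{{ ref() }}` is dbt's standard way to depend on another model: dbt resolves it to the relation the upstream model builds and runs the models in dependency order.

```sql
-- models/country_population_recent.sql (hypothetical file and model name)
-- ref() points at another model, either from this project or from an
-- installed dbt package, so this model is built after its upstream.
select
    country,
    year,
    population
from {{ ref('raw_country_population') }}
where population is not null
  and year >= 2000
```

Because the dependency is expressed through `ref()` rather than a hard-coded table name, the same model keeps working if the upstream model is moved into a separate package.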
This repository is an example of how you can use Datadex to model data. It is already configured with some sample datasets. Get things working end to end with the following steps:
- Set up dependencies with `make deps`.
- Build your dbt models and save them to Parquet files with `make run`.
- Explore the data with `make rill`.
- Model local and remote datasets (`csv` or `parquet`) with `dbt`.
- Use any of the other awesome `dbt` features like `tests` and `docs`. Docs are automatically generated and published on GitHub Pages.
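As a rough sketch, assuming the dbt-duckdb adapter and hypothetical file locations (`data/trips.csv` and the `example.com` URL are placeholders, not files in this repository), a model can read CSV and Parquet files directly with DuckDB's `read_csv_auto` and `read_parquet` table functions. Reading remote files over HTTP(S) relies on DuckDB's `httpfs` extension, which may need to be installed or loaded depending on your DuckDB version.

```sql
-- models/sample_trips.sql (hypothetical model; paths and URL are placeholders)
{{ config(materialized='table') }}

with local_trips as (
    -- read_csv_auto infers delimiter, header and column types from the file
    select * from read_csv_auto('data/trips.csv')
),

remote_trips as (
    -- read_parquet can read a remote file when httpfs is available
    select * from read_parquet('https://example.com/trips.parquet')
)

-- DuckDB's UNION ALL BY NAME matches columns by name instead of position
select * from local_trips
union all by name
select * from remote_trips
```

`union all by name` is DuckDB-specific syntax, so the two files only need matching column names rather than the same column order.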
The fastest way to start using Datadex is via VSCode Remote Containers. Once inside the development environment, you'll only need to run `make deps`.
PS: The development environment can also run in your browser thanks to GitHub Codespaces.
This small project was created after thinking about what an Open Data Protocol could look like! I just wanted to stitch together a few open source technologies and see what they could do.
- This proof of concept was created thanks to open source projects like DuckDB and dbt.
- The Datadex name was inspired by Juan Benet's awesome `data` projects.