Proof-of-concept project for a modern open-source data stack.
The project focuses on data transformation rather than data ingestion.
- MinIO provides S3-compatible object storage for the data.
- Data is stored in the Apache Iceberg table format, which brings OLTP-style features such as ACID transactions and schema evolution to OLAP query engines.
- Data is queried using Trino.
- dbt is used to perform data transformation.
- Airflow schedules the dbt model runs.
The project is packaged using Docker.
The project needs at least 8 GB of memory and the project's docker directory
mounted read-only into the Docker VM. Make sure to set up your Docker VM accordingly.
Example using colima
(run from the project workspace):
colima start --cpu 4 --memory 8 --disk 10 --mount "$(pwd)/docker:r"