Local modern data stack

Vision: Serve as a reference for how the local modern data stack (LMDS) can be used in practice. Over time, add more production-grade features and deployment modes so that the project becomes a go-to example that helps spread software engineering best practices in the LMDS.

Items are checked off once they are implemented:

  • unchecked: to be done
  • checked: done

batch

the traditional transformations everyone is using

Python dependencies are managed with unidep: https://github.com/basnijholt/unidep

ingestion

plain transformation (SQL)

  • dagster
  • duckdb
  • dbt-duckdb
  • excel
  • s3/minio
  • delta lake
  • cube.dev
  • secrets in sops with age
  • quality checks

ML and imperative code

add

  • ray.io, dagster-pipes
  • simple tabular AI sample
  • stateful quality checks (anomaly detection)
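To illustrate what a stateful quality check could look like, here is a minimal sketch that uses only the Python standard library. It is not the project's implementation: the class name `RowCountAnomalyCheck`, the window size, and the z-score threshold are all hypothetical choices for illustration. The check keeps a short history of past values (for example daily row counts) and flags a new value that deviates strongly from that history.

```python
from collections import deque
from statistics import mean, stdev


class RowCountAnomalyCheck:
    """Hypothetical stateful check: flags a value far from the recent history."""

    def __init__(self, window: int = 7, threshold: float = 3.0) -> None:
        # State carried between runs: the last `window` accepted values.
        self.history = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous compared to the stored history."""
        anomalous = False
        if len(self.history) >= 2:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0:
                # Simple z-score test against the rolling window.
                anomalous = abs(value - mu) / sigma > self.threshold
            else:
                # A constant history: any different value counts as anomalous.
                anomalous = value != mu
        # Only learn from values that passed, so one outlier does not skew the baseline.
        if not anomalous:
            self.history.append(value)
        return anomalous
```

In an orchestrated pipeline the history would have to be persisted between runs (for instance in a table or a file), which is what makes the check stateful; the in-memory `deque` here only stands in for that storage.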

AI

add

streaming

  • dagster
  • duckdb
  • dbt-duckdb
  • s3/minio
  • kafka / redpanda
  • risingwave / starrocks
  • secrets in sops with age
  • some streaming dashboard solution

open points

to be discussed if we want to include them

further references

main

background

contributors

In alphabetical order: