Imbruced's Stars
astral-sh/uv
An extremely fast Python package and project manager, written in Rust.
OpenLineage/OpenLineage
An Open Standard for lineage metadata collection
apache/pinot
Apache Pinot - A realtime distributed OLAP datastore
codecrafters-io/build-your-own-x
Master programming by recreating your favorite technologies from scratch.
fiffeek/beeflow
Serverless Airflow on AWS
pingcap/awesome-database-learning
A list of learning materials to understand database internals
apache/sedona
A cluster computing framework for processing large-scale geospatial data
apache/paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
kraina-ai/quackosm
QuackOSM: an open-source Python and CLI tool for reading OpenStreetMap PBF files using DuckDB
opengeospatial/geoparquet
Specification for storing geospatial vector data (point, line, polygon) in Parquet
papers-we-love/papers-we-love
Papers from the computer science community to read and discuss.
ThreeDotsLabs/watermill
Building event-driven applications the easy way in Go.
superstreamlabs/memphis
Memphis.dev is a highly scalable and effortless data streaming platform
getindata/awesome-getindata-recommended-sources
A curated list of links to sources of the latest updates in data/ML/AI
TheAlgorithms/Python
All Algorithms implemented in Python
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
getindata/dbt-airflow-factory
Library to convert DBT manifest metadata to Airflow tasks
tidwall/tile38
Real-time Geospatial and Geofencing
getindata/data-pipelines-cli
CLI for data platform
bartosz25/spark-playground
Code snippets used in demos recorded for the blog.
dbt-labs/dbt-core
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
great-expectations/great_expectations
Always know what to expect from your data.
datahub-project/datahub
The Metadata Platform for your Data Stack
tensorflow/tensorflow
An Open Source Machine Learning Framework for Everyone
allegro/turnilo
Business intelligence, data exploration and visualization web application for Druid, formerly known as Swiv and Pivot
nurkiewicz/reactor-workshop
Spring Reactor hands-on training (3 days)
trinodb/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
ververica/flink-sql-cookbook
The Apache Flink SQL Cookbook is a curated collection of examples, patterns, and use cases of Apache Flink SQL. Many of the recipes are completely self-contained and can be run in Ververica Platform as is.
prestodb/presto
The official home of the Presto distributed SQL query engine for big data