igvog's Stars
apache/spark
Apache Spark - A unified analytics engine for large-scale data processing
pola-rs/polars
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
modularml/mojo
The Mojo Programming Language
prestodb/presto
The official home of the Presto distributed SQL query engine for big data
scala/scala
Scala 2 compiler and standard library. Scala 2 bugs at https://github.com/scala/bug; Scala 3 at https://github.com/scala/scala3
debezium/debezium
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
datahub-project/datahub
The Metadata Platform for your Data and AI Stack
StarRocks/starrocks
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
lauris/awesome-scala
A community-driven list of useful Scala libraries, frameworks, and software.
jupyter/docker-stacks
Ready-to-run Docker images containing Jupyter applications
apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
Eventual-Inc/Daft
Distributed data engine for Python/SQL designed for the cloud, powered by Rust
delta-io/delta-rs
A native Rust library for Delta Lake, with bindings into Python
teamclairvoyant/airflow-maintenance-dags
A series of DAGs/Workflows to help maintain the operation of Airflow
san089/Udacity-Data-Engineering-Projects
A few projects related to data engineering, including data modeling, infrastructure setup on the cloud, data warehousing, and data lake development.
zinggAI/zingg
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
astronomer/astronomer-cosmos
Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
Azure-Samples/modern-data-warehouse-dataops
DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
ActivitySchema/ActivitySchema
Repository for the ActivitySchema spec and supporting materials
awslabs/amazon-kinesis-client-python
Amazon Kinesis Client Library for Python
delta-io/delta-examples
Delta Lake examples
bryzgaloff/airflow-clickhouse-plugin
The most popular ClickHouse plugin for Airflow. 🔝 Top-1% downloads on PyPI: https://pypi.org/project/airflow-clickhouse-plugin! Based on mymarilyn/clickhouse-driver.
ScalefreeCOM/datavault4dbt
Scalefree's dbt package for a Data Vault 2.0 implementation congruent to the original Data Vault 2.0 definition by Dan Linstedt including the Staging Area, DV2.0 main entities, PITs and Snapshot Tables.
minio/openlake
Build a data lake using open source tools
dremio/dbt-dremio
dbt (data build tool) adapter for Dremio
isaaclucky/data-warehousing
Data warehouse tech stack with PostgreSQL, dbt, and Airflow
semashkinvg/DataVault
aravinthsci/Spark_Delta_Lake
Delta Lake Examples
saboye/Data-Modeling-with-Postgres
A project to design a fact and dimension star schema for optimizing queries on a flight booking database using PostgreSQL, a relational database management system. This schema is well-suited for a flight booking database, as it allows for efficient querying of data such as booking dates, flight routes, and passenger information.
jiteshsoni/material_for_public_consumption