delta-lake
There are 140 repositories under the delta-lake topic.
apache/doris
Apache Doris is an easy-to-use, high performance and unified analytics database.
trinodb/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
StarRocks/starrocks
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.
delta-io/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
roapi/roapi
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
delta-io/delta-rs
A native Rust library for Delta Lake, with bindings into Python
databricks/LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
apache/incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
delta-io/delta-sharing
An open protocol for secure data sharing
delta-io/connectors
This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.
splitgraph/seafowl
Analytical database for data-driven Web applications 🪶
aws-samples/amazon-sagemaker-local-mode
Amazon SageMaker Local Mode Examples
adidas/lakehouse-engine
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
japila-books/delta-lake-internals
The Internals of Delta Lake
josephmachado/data_engineering_best_practices
Sample project to demonstrate data engineering best practices
izhangzhihao/Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
delta-incubator/delta-sharing-rs
A Minimalistic Rust Implementation of Delta Sharing Server.
tikal-fuseday/delta-architecture
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
lhbench/lhbench
Lakehouse storage system benchmark
neylsoncrepalde/edc-mod1-exercise-igti
Module 1 exercises - EDC Bootcamp - IGTI 2021
jeppe742/DeltaLakeReader
Read Delta tables without any Spark
dacort/modern-data-lake-storage-layers
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
dask-contrib/dask-deltatable
A Delta Lake reader for Dask
anneglienke/101_upsert-delta
This repository exemplifies a simple ELT process using delta to perform upsert and remove data files that aren't in the latest state of the transaction log for the table.
TatevKaren/free-resources-books-papers
Books and papers in mathematics, econometrics, machine learning, finance, etc. at different levels that can be useful for data scientists, developers, and everyone who is interested in STEM.
databrickslabs/delta-oms
DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog is supported in the v0.7.0-rc1 release. Documentation: https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/
Nike-Inc/koheesio
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
csimplestring/delta-go
Native Delta Lake Implementation in Go
ysfesr/Building-Data-LakeHouse
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
xuwenyihust/DataPulse
DataPulse is a platform for developers to build, schedule and monitor data pipelines.
guidok91/spark-movies-etl
Spark data pipeline that processes movie ratings data.
AndrewKuzmin/spark-structured-streaming-examples
Spark Structured Streaming examples using version 3.5.1
jaehyeon-kim/dbt-on-aws
dbt (data build tool) projects targeting AWS analytics services (Redshift, Glue, EMR, Athena) and open table formats
vvalcristina/Workshop-Data-Lakehouse
Repository dedicated to the Data Lakehouse Workshop with Delta Lake
handreassa/delta-docker
Template to spin up delta lake locally using docker
apache/doris-streamloader
Stream Loader for Apache Doris