lakehouse
There are 115 repositories under the lakehouse topic.
ClickHouse/ClickHouse
ClickHouse® is a real-time analytics database management system
prestodb/presto
The official home of the Presto distributed SQL query engine for big data
apache/doris
Apache Doris is an easy-to-use, high performance and unified analytics database.
StarRocks/starrocks
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
databendlabs/databend
AI-Native Data Warehouse. Open-source Snowflake alternative. Proven at petabyte scale with enterprise performance. Built for multimodal analytics. https://databend.com
lakesoul-io/LakeSoul
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
ByConity/ByConity
ByConity is an open source cloud data warehouse
ytsaurus/ytsaurus
YTsaurus is a scalable and fault-tolerant open-source big data platform.
apache/gravitino
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Mooncake-Labs/pg_mooncake
Real-time analytics on Postgres tables
apache/amoro
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
datazip-inc/olake
Fastest open-source tool for replicating Databases to Data Lake in Open Table Formats like Apache Iceberg. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supporting Postgres, MongoDB, MySQL and Oracle
lakekeeper/lakekeeper
Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.
paradedb/pg_analytics
DuckDB-powered data lake analytics from Postgres
buster-so/buster
Buster is an open-source platform for deploying AI data analysts
pracdata/awesome-open-source-data-engineering
A curated list of open source tools used in analytics platforms and data engineering ecosystem
cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
databricks/terraform-databricks-examples
Examples of using Terraform to deploy Databricks resources
adidas/lakehouse-engine
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
data-dot-all/dataall
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
qinsql/QinSQL
An intelligent database for the AI era
laminlabs/lamindb
A data framework for biology.
icelake-io/icelake
Pure Rust Iceberg Implementation
google/space
Unified storage framework for the entire machine learning lifecycle
mattiasthalen/adventure-works
Modern serverless lakehouse implementing HOOK methodology, Unified Star Schema (USS), and Analytical Data Storage System (ADSS) principles on Adventure Works. Features programmatic model generation, event-enhanced Puppini bridges, and temporal resolution across DAS/DAB/DAR layers.
bennyaustin/fabric-accelerator
Accelerator to build a Microsoft Fabric modern data platform using pre-built reusable Fabric items and an orchestration ELT Framework
lhbench/lhbench
Lakehouse storage system benchmark
dominikhei/Local-Data-LakeHouse
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.
samber/awesome-olap
A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.
abeltavares/real-time-data-pipeline
Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
ysfesr/Building-Data-LakeHouse
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
databrickslabs/delta-oms
DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog is supported in the v0.7.0-rc1 release. Documentation: https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/
harrydevforlife/building-lakehouse
Building a data lakehouse with open source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the lakehouse, visualization, and a recommendation app.
apache/doris-streamloader
Stream Loader for Apache Doris
leehuwuj/olh
Open source stack lakehouse
manuzhang/awesome-lakehouse
a curated list of awesome lakehouse frameworks, applications, etc