lakehouse
There are 75 repositories under the lakehouse topic.
prestodb/presto
The official home of the Presto distributed SQL query engine for big data
apache/doris
Apache Doris is an easy-to-use, high-performance, unified analytics database.
StarRocks/starrocks
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
lakesoul-io/LakeSoul
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
ByConity/ByConity
ByConity is an open source cloud data warehouse
ytsaurus/ytsaurus
YTsaurus is a scalable and fault-tolerant open-source big data platform.
apache/amoro
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
datastrato/gravitino
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
qinsql/QinSQL
An intelligent database for the AI era
data-dot-all/dataall
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
adidas/lakehouse-engine
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
databricks/terraform-databricks-examples
Examples of using Terraform to deploy Databricks resources
icelake-io/icelake
Pure Rust Iceberg Implementation
google/space
Unified storage framework for the entire machine learning lifecycle
pracdata/awesome-open-source-data-engineering
A curated list of open source tools used in analytical stacks and data engineering ecosystem
lhbench/lhbench
Lakehouse storage system benchmark
dominikhei/Local-Data-LakeHouse
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.
databrickslabs/delta-oms
DeltaOMS is a solution that helps build a centralized repository of Delta Transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog is supported in the v0.7.0-rc1 release. Documentation: https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/
ysfesr/Building-Data-LakeHouse
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
leehuwuj/olh
Open source stack lakehouse
samber/awesome-olap
A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.
paradedb/pg_analytics
Analytical table access method for Postgres
vvalcristina/Workshop-Data-Lakehouse
Repository dedicated to the Data Lakehouse Workshop with Delta Lake
databricks-industry-solutions/omop-cdm
Unlocking the Power of Health Data With a Modern Data Lakehouse
microsoft/Fabric-RTA-FlightStream
Microsoft Fabric Real-time Analytics flight streaming
prestodb/prestorials
Tutorials and examples of how to deploy Presto and connect it to different data sources
tspannhw/FLiPStackWeekly
FLaNK AI Weekly covering Apache NiFi, Apache Flink, Apache Kafka, Apache Spark, Apache Iceberg, Apache Ozone, Apache Pulsar, and more...
apache/doris-streamloader
Stream Loader for Apache Doris
ekote/Build-Your-First-End-to-End-Lakehouse-Solution
Build Your First End-to-End Lakehouse Solution (aka.ms/fabconlake)
manuzhang/awesome-lakehouse
A curated list of awesome lakehouse frameworks, applications, etc.
victorskl/genomic-bigdata-spark
Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture
databrickslabs/waterbear
Automated provisioning of an industry Lakehouse with enterprise data model
AnthonyByansi/MicrosoftFabric-Exploratorium
A comprehensive educational resource hub dedicated to mastering Microsoft Fabric, offering in-depth tutorials, real-world use cases, and hands-on guides for seamless end-to-end analytics
databricks-industry-solutions/dns-analytics
Leverage the Databricks Solution Accelerator for DNS analytics to accelerate time to detection and response across petabytes of data. Tap into DNS traffic logs, enrich streaming threat intelligence, and apply advanced analytics to detect DNS abnormalities and prevent malicious attacks.
paradedb/helm-charts
Helm chart for deploying ParadeDB on Kubernetes