lakehouse

There are 115 repositories under the lakehouse topic.

  • ClickHouse

    ClickHouse/ClickHouse

    ClickHouse® is a real-time analytics database management system

    Language:C++42.9k68725.3k7.7k
  • presto

    prestodb/presto

    The official home of the Presto distributed SQL query engine for big data

    Language:Java16.5k8437k5.5k
  • apache/doris

    Apache Doris is an easy-to-use, high performance and unified analytics database.

    Language:Java14.3k2858k3.6k
  • StarRocks/starrocks

    The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.

    Language:Java10.7k1878.9k2.1k
  • databend

    databendlabs/databend

    AI-Native Data Warehouse. Open-source Snowflake alternative. Proven at petabyte scale with enterprise performance. Built for multimodal analytics. https://databend.com

    Language:Rust8.8k935.9k813
  • lakesoul-io/LakeSoul

    LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.

    Language:Java3k296120409
  • ByConity/ByConity

    ByConity is an open source cloud data warehouse

    Language:C++2.2k49679318
  • ytsaurus/ytsaurus

    YTsaurus is a scalable and fault-tolerant open-source big data platform.

    Language:C++2.1k43468172
  • apache/gravitino

    World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.

    Language:Java1.8k384k601
  • pg_mooncake

    Mooncake-Labs/pg_mooncake

    Real-time analytics on Postgres tables

    Language:Rust1.7k157646
  • apache/amoro

    Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.

    Language:Java1k381.6k355
  • datazip-inc/olake

    Fastest open-source tool for replicating databases to a data lake in open table formats like Apache Iceberg. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB, MySQL and Oracle.

    Language:Go1k898110
  • lakekeeper/lakekeeper

    Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.

    Language:Rust887621084
  • paradedb/pg_analytics

    DuckDB-powered data lake analytics from Postgres

    Language:Rust5225026
  • buster-so/buster

    Buster is an open-source platform for deploying AI data analysts

    Language:TypeScript44510333
  • pracdata/awesome-open-source-data-engineering

    A curated list of open source tools used in analytics platforms and data engineering ecosystem

  • cuebook/cuelake

    Use SQL to build ELT pipelines on a data lakehouse.

    Language:JavaScript288112928
  • databricks/terraform-databricks-examples

    Examples of using Terraform to deploy Databricks resources

    Language:HCL2827668149
  • adidas/lakehouse-engine

    The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

    Language:Python26618343
  • dataall

    data-dot-all/dataall

    A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.

    Language:Python2461174084
  • qinsql/QinSQL

    The intelligent database for the AI era

    Language:Java22418952
  • laminlabs/lamindb

    A data framework for biology.

    Language:Python184249715
  • icelake-io/icelake

    Pure Rust Iceberg Implementation

    Language:Rust162127720
  • google/space

    Unified storage framework for the entire machine learning lifecycle

    Language:Python155738
  • mattiasthalen/adventure-works

    Modern serverless lakehouse implementing HOOK methodology, Unified Star Schema (USS), and Analytical Data Storage System (ADSS) principles on Adventure Works. Features programmatic model generation, event-enhanced Puppini bridges, and temporal resolution across DAS/DAB/DAR layers.

    Language:Python1151910
  • bennyaustin/fabric-accelerator

    Accelerator to build a Microsoft Fabric modern data platform using pre-built reusable Fabric items and an orchestration ELT Framework

    Language:TSQL8695118
  • lhbench/lhbench

    Lakehouse storage system benchmark

    Language:Scala76229
  • dominikhei/Local-Data-LakeHouse

    Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.

    Language:Dockerfile744213
  • samber/awesome-olap

    A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.

  • abeltavares/real-time-data-pipeline

    📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.

    Language:Python49107
  • ysfesr/Building-Data-LakeHouse

    Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data

    Language:Python47119
  • databrickslabs/delta-oms

    DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog is supported in the v0.7.0-rc1 release. Documentation: https://databrickslabs.github.io/delta-oms/v0.7.0-rc1/

    Language:Scala39894
  • harrydevforlife/building-lakehouse

    Building a data lakehouse with open source technology. Supports an end-to-end data pipeline, from source data on AWS S3 to the lakehouse, visualization, and a recommendation app.

    Language:Python32103
  • apache/doris-streamloader

    Stream Loader for Apache Doris

    Language:Go2939518
  • leehuwuj/olh

    Open source stack lakehouse

    Language:Python25101
  • manuzhang/awesome-lakehouse

    A curated list of awesome lakehouse frameworks, applications, etc.