apache-iceberg

There are 86 repositories under the apache-iceberg topic.

risingwavelabs/risingwave
Streaming data platform. Real-time stream processing, low-latency serving, and Iceberg table management.
Language: Rust 8.5k 78 7.5k704
matanolabs/matano
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
Language: Rust 1.6k 21 105116
datazip-inc/olake
Fastest open-source tool for replicating databases to a data lake in open table formats like Apache Iceberg. ⚡ Efficient, quick, and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB, MySQL, and Oracle.
Language: Go 1.2k 8 192138
apache/incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Language: Java 1.1k 27 291188
tansu-io/tansu
Apache Kafka® compatible broker with S3, PostgreSQL, SQLite, Apache Iceberg and Delta Lake
Language: Rust 537 5 12621
nimtable/nimtable
The observability platform for Iceberg lakehouses.
Language: TypeScript 375 8 5122
cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
Language: JavaScript 288 11 2928
hyparam/icebird
Icebird: JavaScript Iceberg Client
Language: JavaScript 1071
nimtable/iceberg-compaction
Compaction runtime for Apache Iceberg.
Language: Rust 10612
lhbench/lhbench
Lakehouse storage system benchmark
Language: Scala 77 2 213
dominikhei/Local-Data-LakeHouse
Sample data lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino, and a Hive Metastore. Can be used for local testing.
Language: Dockerfile 74 3 216
abeltavares/real-time-data-pipeline
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
Language: Python 52 1 07
dacort/modern-data-lake-storage-layers
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
Language: Jupyter Notebook 47 2 029
aws-samples/transactional-datalake-using-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
Language: Python 33 3 12
aws-solutions-library-samples/guidance-for-developing-data-and-ai-foundation-with-amazon-sagemaker
DAIVI is a reference solution with IaC modules to accelerate development of Data, Analytics, AI, and Visualization applications on AWS using the next-generation Amazon SageMaker Unified Studio. It provides engineers with sample infrastructure-as-code modules and application modules for building their data platforms.
Language: HCL 339
guidok91/spark-movies-etl
Spark data pipeline that processes movie ratings data.
Language: Python 30 0 112
aws-samples/monitoring-apache-iceberg-table-metadata-layer
Sample code to collect Apache Iceberg metrics for table monitoring
Language: Python 29 4 05
bodo-ai/denali
An open-source, community-driven REST catalog for Apache Iceberg!
Language: Go 29 3 153
aws-samples/iceberg-streaming-examples
This repo contains examples of high-throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed to any Spark-compatible engine, such as Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
Language: Java 26 2 57
aws-samples/aws-glue-streaming-etl-with-apache-iceberg
Streaming ETL job examples in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3
Language: Python 25 2 02
tj---/iceberg-demo
A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino
Language: Java 21 1 05
gordonmurray/apache_flink_and_iceberg
A complete data engineering project demonstrating modern data stack practices with Apache Flink, Iceberg, Trino and Superset
Language: Python 18 2 04
YeonwooSung/MLOps
Miscellaneous code and writings for MLOps
Language: Jupyter Notebook 15 5 52
aws-samples/aws-glue-streaming-ingestion-from-kafka-to-apache-iceberg
This is a collection of AWS CDK projects that show how to ingest streaming data directly from Amazon Managed Streaming for Apache Kafka (MSK) and MSK Serverless into Apache Iceberg tables in S3 with AWS Glue Streaming.
Language: Python 14 3 0
tlepple/iceberg-intro-workshop
Hands-on workshop with Apache Iceberg
Language: Shell 14 1 41
aws-samples/transactional-datalake-using-amazon-datafirehose-iceberg
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with Amazon Data Firehose and DMS
Language: Python 13 2 02
BauplanLabs/wap-with-bauplan-and-dbos
Write-Audit-Publish on the lakehouse in pure Python with bauplan and DBOS
Language: Python 13 2 00
guidok91/spark-structured-streaming-kafka
Spark Structured Streaming data pipeline that processes movie ratings data in real-time.
Language: Python 13 0 15
tlepple/data_origination_workshop
Hands-on workshop with Iceberg, Redpanda, Debezium and Kafka-Connect
Language: Shell 13 1 02
fraibacas/lakehouse-poc
Run an open-source data LakeHouse locally using Docker Compose
Language: Python 11 2 00
j3-signalroom/apache_flink-kickstarter
Examples of Apache Flink® v2.1 applications showcasing the DataStream API, Table API in Java and Python, and Flink SQL, featuring AWS, GitHub, Terraform, Streamlit, and Apache Iceberg.
Language: Java 9 1 2781
BauplanLabs/data-agents-on-the-lakehouse
Playground for running agentic workflows over a programmable warehouse
Language: Python 83
davidvanegas2/iceberg-s3-terraform-glue
Automated setup of Apache Iceberg on Amazon S3 using Terraform and AWS Glue Data Catalog. Explore the power of a Lakehouse architecture for data management and analysis, featuring schema discovery, metadata management, and efficient querying with Amazon Athena.
Language: HCL 8 2 01
aws-samples/transactional-datalake-using-amazon-msk-serverless-and-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK Serverless and MSK Connect (Debezium)
Language: Python 7 2 01
JesuFemi-O/iceberg-integration-framework
A proof-of-concept open framework to manage data ingestion into Apache Iceberg tables
Language: Python 7 1 00
abeltavares/versioned-data-lakehouse
🌊 Git-like Version Control for Data with Nessie, Iceberg, and Spark
Language: Jupyter Notebook 5 1 02
