apache-iceberg
There are 86 repositories under the apache-iceberg topic.
risingwavelabs/risingwave
Streaming data platform. Real-time stream processing, low-latency serving, and Iceberg table management.
matanolabs/matano
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
datazip-inc/olake
Fastest open-source tool for replicating databases to a data lake in open table formats like Apache Iceberg. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB, MySQL and Oracle.
apache/incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
tansu-io/tansu
Apache Kafka® compatible broker with S3, PostgreSQL, SQLite, Apache Iceberg and Delta Lake
nimtable/nimtable
The observability platform for Iceberg lakehouses.
cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
hyparam/icebird
Icebird: JavaScript Iceberg Client
nimtable/iceberg-compaction
Compaction runtime for Apache Iceberg.
lhbench/lhbench
Lakehouse storage system benchmark
dominikhei/Local-Data-LakeHouse
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.
abeltavares/real-time-data-pipeline
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
dacort/modern-data-lake-storage-layers
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
aws-samples/transactional-datalake-using-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
aws-solutions-library-samples/guidance-for-developing-data-and-ai-foundation-with-amazon-sagemaker
DAIVI is a reference solution with IaC modules to accelerate development of Data, Analytics, AI and Visualization applications on AWS using the next generation Amazon SageMaker Unified Studio. The goal of the DAIVI solution is to provide engineers with sample infrastructure-as-code modules and application modules to build their data platforms.
guidok91/spark-movies-etl
Spark data pipeline that processes movie ratings data.
aws-samples/monitoring-apache-iceberg-table-metadata-layer
Sample code to collect Apache Iceberg metrics for table monitoring
bodo-ai/denali
An open-source, community-driven REST catalog for Apache Iceberg!
aws-samples/iceberg-streaming-examples
This repo contains examples of high-throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark-compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
aws-samples/aws-glue-streaming-etl-with-apache-iceberg
Streaming ETL job examples in AWS Glue for integrating Iceberg and creating an in-place updatable data lake on Amazon S3
tj---/iceberg-demo
A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino
gordonmurray/apache_flink_and_iceberg
A complete data engineering project demonstrating modern data stack practices with Apache Flink, Iceberg, Trino and Superset
YeonwooSung/MLOps
Miscellaneous code and writings on MLOps
aws-samples/aws-glue-streaming-ingestion-from-kafka-to-apache-iceberg
This is a collection of AWS CDK projects showing how to ingest streaming data directly from Amazon Managed Streaming for Apache Kafka (MSK) and MSK Serverless into Apache Iceberg tables in Amazon S3 with AWS Glue Streaming.
tlepple/iceberg-intro-workshop
Hands-on workshop with Apache Iceberg
aws-samples/transactional-datalake-using-amazon-datafirehose-iceberg
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with Amazon Data Firehose and DMS
BauplanLabs/wap-with-bauplan-and-dbos
Write-Audit-Publish on the lakehouse in pure Python with bauplan and DBOS
guidok91/spark-structured-streaming-kafka
Spark Structured Streaming data pipeline that processes movie ratings data in real-time.
tlepple/data_origination_workshop
Hands-on workshop with Iceberg, Redpanda, Debezium and Kafka-Connect
fraibacas/lakehouse-poc
Run an open-source data LakeHouse locally using Docker Compose
j3-signalroom/apache_flink-kickstarter
Examples of Apache Flink® v2.1 applications showcasing the DataStream API, Table API in Java and Python, and Flink SQL, featuring AWS, GitHub, Terraform, Streamlit, and Apache Iceberg.
BauplanLabs/data-agents-on-the-lakehouse
Playground for running agentic workflows over a programmable warehouse
davidvanegas2/iceberg-s3-terraform-glue
Automated setup of Apache Iceberg on Amazon S3 using Terraform and AWS Glue Data Catalog. Explore the power of a Lakehouse architecture for data management and analysis, featuring schema discovery, metadata management, and efficient querying with Amazon Athena.
aws-samples/transactional-datalake-using-amazon-msk-serverless-and-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK Serverless and MSK Connect (Debezium)
JesuFemi-O/iceberg-integration-framework
A proof-of-concept open framework for managing data ingestion into Apache Iceberg tables
abeltavares/versioned-data-lakehouse
🌊 Git-like Version Control for Data with Nessie, Iceberg, and Spark