apache-iceberg
There are 65 repositories under the apache-iceberg topic.
matanolabs/matano
Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
apache/incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
datazip-inc/olake
Fastest open-source tool for replicating databases to a data lake in open table formats like Apache Iceberg. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supports Postgres, MongoDB, MySQL and Oracle.
buster-so/buster
Buster is an open-source platform for deploying AI data analysts
cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
lhbench/lhbench
Lakehouse storage system benchmark
dominikhei/Local-Data-LakeHouse
Sample data lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino and a Hive Metastore. Can be used for local testing.
abeltavares/real-time-data-pipeline
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
dacort/modern-data-lake-storage-layers
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
aws-samples/transactional-datalake-using-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
aws-samples/iceberg-streaming-examples
This repo contains examples of high throughput ingestion using Apache Spark and Apache Iceberg. These examples cover IoT and CDC scenarios using best practices. The code can be deployed into any Spark compatible engine like Amazon EMR Serverless or AWS Glue. A fully local developer environment is also provided.
aws-samples/monitoring-apache-iceberg-table-metadata-layer
Sample code to collect Apache Iceberg metrics for table monitoring
bodo-ai/denali
An open-source, community-driven REST catalog for Apache Iceberg!
aws-samples/aws-glue-streaming-etl-with-apache-iceberg
Streaming ETL job examples in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3
tj---/iceberg-demo
A sample implementation of stream writes to an Iceberg table on GCS using Flink and reading it using Trino
YeonwooSung/MLOps
Miscellaneous code and writings for MLOps
tlepple/iceberg-intro-workshop
Hands-on workshop with Apache Iceberg
BauplanLabs/wap-with-bauplan-and-dbos
Write-Audit-Publish on the lakehouse in pure Python with bauplan and DBOS
tlepple/data_origination_workshop
Hands-on workshop with Iceberg, Redpanda, Debezium and Kafka-Connect
aws-samples/transactional-datalake-using-amazon-datafirehose-iceberg
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with Amazon Data Firehose and DMS
aws-samples/aws-glue-streaming-ingestion-from-kafka-to-apache-iceberg
This is a collection of AWS CDK projects showing how to ingest streaming data directly from Amazon Managed Streaming for Apache Kafka (MSK) and MSK Serverless into Apache Iceberg tables in S3 with AWS Glue Streaming.
fraibacas/lakehouse-poc
Run an open-source data LakeHouse locally using Docker Compose
aws-samples/transactional-datalake-using-amazon-msk-serverless-and-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK Serverless and MSK Connect (Debezium)
davidvanegas2/iceberg-s3-terraform-glue
Automated setup of Apache Iceberg on Amazon S3 using Terraform and AWS Glue Data Catalog. Explore the power of a Lakehouse architecture for data management and analysis, featuring schema discovery, metadata management, and efficient querying with Amazon Athena.
gordonmurray/apache_flink_and_iceberg
Using Apache Flink to write to S3 in Apache Iceberg format
j3-signalroom/apache_flink-kickstarter
Examples of Apache Flink® applications showcasing the DataStream API, Table API in Java and Python, and Flink SQL, featuring AWS, GitHub, Terraform, Streamlit, and Apache Iceberg.
hyparam/icebird
Icebird: JavaScript Iceberg Client
JesuFemi-O/iceberg-integration-framework
A proof-of-concept open framework to manage data ingestion into Apache Iceberg tables
abeltavares/versioned-data-lakehouse
🌊 Git-like Version Control for Data with Nessie, Iceberg, and Spark
aws-samples/transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK and MSK Connect (Debezium)
BauplanLabs/playlist-recomendations-with-bauplan-and-mongodb
Reference implementation of embedding-based, sequential recommendations, using Bauplan (with Apache Iceberg + Apache Arrow) for data preparation and training, and MongoDB for serving real-time suggestions.
ev2900/Iceberg_EMR_Athena
Resources from a virtual tech talk / workshop: Set Up and Use Apache Iceberg Tables on Your Data Lake
ev2900/Iceberg_update_metadata_script
Python script that updates S3 file paths in Iceberg metadata files (metadata.json + Avro)
MOBIN-F/iceberg-spark-tpcds-benchmark
iceberg-spark-tpcds-benchmark
joewood/react-iceberg
React Components to visualize Apache Iceberg tables