deltalake
There are 56 repositories under the deltalake topic.
paradedb/pg_analytics
DuckDB-powered analytics for Postgres
databrickslabs/dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for testing, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.
delta-io/kafka-delta-ingest
A highly efficient daemon for streaming data from Kafka into Delta Lake
MrPowers/mack
Delta Lake helper methods in PySpark
japila-books/delta-lake-internals
The Internals of Delta Lake
smart-data-lake/smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
uname-n/deltabase
A lightweight, comprehensive solution for managing Delta tables, built on polars and deltalake
izhangzhihao/Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
WeBankFinTech/Streamis
Streaming application development and management system based on Linkis and DSS, with plans to provide workflow-like graphical drag-and-drop development.
martandsingh/ApacheSpark
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.
anneglienke/101_upsert-delta
This repository demonstrates a simple ELT process that uses Delta to perform upserts and to remove data files that are no longer in the latest state of the table's transaction log.
dacort/faker-cli
Command-line interface to quickly generate fake CSV and JSON data
bhavink/databricks
Databricks Platform - Architecture, Security, Automation and much more!!
sankamuk/PysparkCheatsheet
PySpark Cheatsheet
DataTech-Solutions/Threat-Detection-and-Visualization
Threat Detection and Visualization
leehuwuj/olh
Open-source lakehouse stack
newfront/hitchhikers_guide_to_deltalake_streaming
Don't Panic. This guide will help you when it feels like the end of the world.
buoyant-data/oxbow
Collection of AWS Lambdas for creating and managing Delta tables
ognis1205/delta-hub
A platform and cloud-based service for data sharing based on the Delta Sharing protocol.
aws-samples/amazon-emr-with-delta-lake
Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR
ismailhammounou/db2ixf
db2ixf is a python package with a CLI that simplifies the parsing and processing of IBM Integration eXchange Format (IXF) files.
reisdebora/awesome-databricks
A curated list of awesome Databricks resources, including Spark
taka-yayoi/public_repo
Stores sample notebooks for Databricks.
aravinthsci/Spark_Delta_Lake
Delta Lake Examples
goodwillpunning/nodejs-sharing-client
A Node.js connector for Delta Sharing.
mrjsj/msfabricutils
Spark-free Python utilities for Microsoft Fabric focused on Data Engineering using Polars and delta-rs
bmsuisse/lakeapi
API for distributing Data Lake Data
gerardwolf/blog
Repository for all blog scripts and code
satyakommula96/spark_benchmark
Spark Performance Benchmark suite to evaluate all TPC-DS and TPC-H query times
xbrianh/xdlake
A loose implementation of the deltalake protocol, written in Python on top of pyarrow, focused on extensibility, customizability, and distributed data.
yandex-cloud/yc-delta
Delta Lake for Yandex Data Processing
alberttwong/opendatalakehouse
The Data Lakehouse Readiness Score is a quantitative measure that assesses a database vendor's support for Apache Iceberg, Apache Hudi, and Delta Lake.
cmackenzie1/deltalake-examples-rs
Examples of working with Delta Lake in Rust!
LeoneGarage/StreamJoin
A framework for incremental streaming joins and incremental streaming aggregations over change data feeds from Databricks Delta
cmackenzie1/deltalake-go
An implementation of Delta Lake in Go