data-lake
There are 259 repositories under the data-lake topic.
data_engineering_with_python-track-datacamp
Data Engineer with Python lecture notes from #datacamp.
nodestream
A Declarative framework for Building, Maintaining, and Analyzing Graph Data
cnfuzz
Breaking Cloud Native Web APIs in their natural habitat.
Awesome-Data-Engineering
📒(GitBook) A curated list of awesome Data Engineering resources
razv-data-engineering
Portfolio of projects and studies conducted in data engineering.
jobAnalytics_and_search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
docker_datalake
Datalake
terraform-module-azure-datalake
Terraform module for an Azure Data Lake
havasu
The spatial table format for spatial lakehouse
data-engineering-mta-turnstile
Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis
hiveberg
Demonstration of a Hive Input Format for Iceberg
data-mill
A K8s-based infrastructure for analytics
trino-hive-superset-docker
Cloud-native Trino (prestosql) + Hive + Minio + Superset
vulcan-sql-examples
Curated VulcanSQL showcases
Python-MySql-Operation
This Python MySQL repo shows you how to use MySQL Connector/Python to access MySQL databases. You will learn how to connect to a MySQL database and perform common database operations such as SELECT, INSERT, UPDATE, and DELETE in Python.
linkml-store
wrapper for multiple linkml storage engines
EdgeLake
Data Lake on the Edge
defenda-data-lake
defendA Data Lake. A Firehose pipeline to Athena providing enrichment and normalization for security events
herd-mdl
Herd-MDL, a turnkey managed data lake in the cloud. See https://finraos.github.io/herd-mdl/ for more information.
swu-ds525
DS525
dataasee
DatAasee - A Metadata-Lake for Libraries
Data_Engineering_Projects
A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousing, containerization, and a dashboard to monitor data pipeline KPIs
eubfr-data-lake
EU Budget for Results - Data Lake
kyuubi-docker
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
lakeFS-hooks
a simple lakeFS webhook for pre-commit and pre-merge validation of data objects
aws-serverless-data-lake-workshop
This workshop gives customers hands-on experience with the AWS services mentioned. The Serverless Data Lake workshop helps customers build a cloud-native and future-proof serverless data lake architecture. It allows hands-on time with AWS big data and analytics services including Amazon Kinesis Services for streaming data ingestion
hana-cloud-relational-data-lake-onboarding
This is an end-to-end onboarding sample for SAP HANA Cloud, relational data lake. It shows how to create schema, load data, and execute queries.
healthcare_data_pipeline
An end-to-end data pipeline for building a data lake and supporting reports using Apache Spark.
lakeapi
API for distributing Data Lake Data
2020-HealthcareLake
A reasonably secure data lake for healthcare analytics
columnar
An idiomatic Kotlin dataframe toolkit for data engineering tasks on datasets of any size
aws-well-architected-framework
Prominent data platform design with AWS well-architected framework
Data-Lake-with-Spark-and-AWS-S3
Create a data lake on AWS S3 to store dimensional tables after processing data using Spark on an AWS EMR cluster
adls-azure
Procedure for creating Azure Data Lake Storage using Terraform through an MS Learn Sandbox subscription
stream-etl-with-glue
Serverless streaming ETL with a Glue job and querying with Athena
logstash-output-adls
Logstash output plugin for Azure Data Lake Store (ADLS)