data-lake

There are 259 repositories under the data-lake topic.

  • data_engineering_with_python-track-datacamp

    Data Engineer with Python lecture notes from #datacamp.

    Language: Jupyter Notebook · 43
  • nodestream

    A Declarative framework for Building, Maintaining, and Analyzing Graph Data

    Language: Python · 40
  • cnfuzz

    Breaking Cloud Native Web APIs in their natural habitat.

    Language: Go · 35
  • Awesome-Data-Engineering

    📒(GitBook) A curated list of awesome Data Engineering resources

  • razv-data-engineering

    Portfolio of projects and studies conducted in data engineering.

    Language: Jupyter Notebook · 33
  • jobAnalytics_and_search

    JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

    Language: Python · 31
  • docker_datalake

    Datalake

    Language: JavaScript · 30
  • terraform-module-azure-datalake

    Terraform module for an Azure Data Lake

    Language: HCL · 30
  • havasu

    The spatial table format for spatial lakehouse

  • data-engineering-mta-turnstile

    Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis

    Language: Jupyter Notebook · 26
  • hiveberg

    Demonstration of a Hive Input Format for Iceberg

    Language: Java · 26
  • data-mill

    A K8s-based infrastructure for analytics

    Language: Shell · 24
  • trino-hive-superset-docker

    Cloud-native Trino (prestosql) + Hive + Minio + Superset

    Language: Dockerfile · 21
  • vulcan-sql-examples

    Curated VulcanSQL showcases

    Language: Jupyter Notebook · 20
  • Python-MySql-Operation

    This Python MySQL repo shows you how to use MySQL Connector/Python to access MySQL databases. You will learn how to connect to a MySQL database and perform common database operations such as SELECT, INSERT, UPDATE, and DELETE in Python.

    Language: Jupyter Notebook · 18
  • linkml-store

    wrapper for multiple linkml storage engines

    Language: Python · 17
  • EdgeLake

    Data Lake on the Edge

    Language: Python · 17
  • defenda-data-lake

    defendA Data Lake. A Firehose pipeline to Athena providing enrichment and normalization for security events

    Language: Python · 16
  • herd-mdl

    Herd-MDL, a turnkey managed data lake in the cloud. See https://finraos.github.io/herd-mdl/ for more information.

    Language: Java · 16
  • swu-ds525

    DS525

    Language: Jupyter Notebook · 15
  • dataasee

    DatAasee - A Metadata-Lake for Libraries

    Language: Makefile · 14
  • Data_Engineering_Projects

    A collection of data engineering projects: data modeling, ETL pipelines, data lakes, infrastructure configuration on AWS, data warehousing, containerization, and a dashboard to monitor data pipeline KPIs

    Language: Python · 14
  • eubfr-data-lake

    EU Budget for Results - Data Lake

    Language: JavaScript · 14
  • kyuubi-docker

    Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

    Language: Dockerfile · 13
  • lakeFS-hooks

    a simple lakeFS webhook for pre-commit and pre-merge validation of data objects

    Language: Python · 12
  • aws-serverless-data-lake-workshop

    This workshop is meant to give customers hands-on experience with the mentioned AWS services. The Serverless Data Lake workshop helps customers build a cloud-native and future-proof serverless data lake architecture. It allows hands-on time with AWS big data and analytics services including Amazon Kinesis Services for streaming data ingestion

    Language: Jupyter Notebook · 12
  • hana-cloud-relational-data-lake-onboarding

    This is an end-to-end onboarding sample for SAP HANA Cloud, relational data lake. It shows how to create schema, load data, and execute queries.

  • healthcare_data_pipeline

    An end-to-end data pipeline for building a data lake and supporting reports using Apache Spark.

    Language: Python · 10
  • lakeapi

    API for distributing Data Lake Data

    Language: Python · 9
  • 2020-HealthcareLake

    A reasonably secure data lake for healthcare analytics

    Language: HCL · 9
  • columnar

    An idiomatic Kotlin dataframe toolkit for data engineering tasks on datasets of any size

    Language: HTML · 9
  • aws-well-architected-framework

    Prominent data platform design with the AWS Well-Architected Framework

    Language: Python · 9
  • Data-Lake-with-Spark-and-AWS-S3

    Create a data lake on AWS S3 to store dimensional tables after processing data using Spark on an AWS EMR cluster

    Language: Python · 9
  • adls-azure

    Procedure for creating an Azure Data Lake Storage instance using Terraform, through an MS Learn Sandbox subscription

    Language: HCL · 7
  • stream-etl-with-glue

    Serverless streaming ETL with a Glue job and querying with Athena

    Language: Python · 7
  • logstash-output-adls

    Logstash output plugin for Azure Data Lake Store (ADLS)

    Language: Ruby · 7