datapipeline
There are 194 repositories under the datapipeline topic.
zhaoyachao/zdh_web
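Big data collection and extraction platform. zdh_web is the visual management platform for the zdh family of services, with modules for data collection, scheduling, permissions, approval workflows, private-domain marketing, and more.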
ErdemOzgen/Data-Engineering-Roadmap
Roadmap for Data Engineering
josephmachado/beginner_de_project_stream
Simple stream processing pipeline
ContextData/VectorETL
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
kartik4949/TensorPipe
High Performance Tensorflow Data Pipeline with State of Art Augmentations and low level optimizations.
Alireza-Akhavan/tf2-tutorial
Tensorflow 2 Tutorials (use tensorflow and keras in a better way!)
cloudposse/terraform-aws-efs-backup
Terraform module designed to easily back up EFS filesystems to S3 using DataPipeline
josephmachado/de_project
Step by step instructions to create a production-ready data pipeline
KennethanCeyer/awesome-data-pipeline
Awesome list for datapipeline
covalenthq/bsp-geth
Ethereum client written in Go, modified for full-hierarchy data exports and block specimen production
indix/sparkplug
Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
WaylonWalker/kedro-static-viz
kedro cli plugin for generating a static kedro viz site (html, css, js) that can be deployed on many serverless tools.
kromozome2003/Snowflake-Json-DataPipeline
Building Json data pipeline within Snowflake using Streams and Tasks
shazam/scala-datapipeline-dsl
Domain-specific language to help build and maintain AWS Data Pipelines
behnamyazdan/PythonForDataEngineeringCourse
This course is designed to provide learners with the fundamental skills needed for data engineering using Python. The objective is to introduce anyone interested in the topic to Python's data engineering-related features.
WaylonWalker/kedro-action
A GitHub Action to lint, test, build-docs, package, and run your kedro pipelines. Supports any Python version you'll give it (that is also supported by pyenv).
NVIDIA/go-tfdata
Go library that provides easy-to-use interfaces and tools for TensorFlow users, in particular allowing them to train existing TF models on .tar and .tgz datasets
DudeWhoCode/kulay
High speed message passing between various queues and services
adilkhash/luigi-course-materials
Materials for the course Introduction to Data Engineering: Data Pipelines
Mg30/pydwt
Modeling tool like DBT to use SQL Alchemy core with a DataFrame interface like
wri/gfw_forest_loss_geotrellis
Global Tree Cover Loss Analysis using Geotrellis and SPARK
mehroosali/databricks-F1-Project
A data pipeline project built on Databricks and Azure to demonstrate the lifecycle of a cloud data project.
multilayer-io/airflow-kubernetes
Simple Airflow on Kubernetes (GKE)
huanwuji/teleporter
Distributed data pipeline for data processing, built on Reactive Streams. Currently supports Kafka, JDBC, Kudu, Elasticsearch, HDFS, etc.
teckkean/GTFS-Data-Pipeline-TfNSW-Bus
GTFS Data Pipeline for TfNSW Bus Datasets
Kushalkhadka7/dagster_clickhouse_dbt
DBT and clickhouse test project with dagster
RudraChatterjee/Machine-Failure_Prediction_EnsembleMethods_ModelTuning
This project predicts wind turbine failure from sensor data using classification-based ML models, improving prediction by tuning model hyperparameters and addressing class imbalance through oversampling and undersampling. The final model is productionized using a data pipeline.
Yan-Luo-AU/Data_Engineer_Project_ETL_BI
This is an ETL project: it extracts data from an e-commerce transactional database on RDS, transforms the data with an AWS Glue job, loads it into a Redshift data warehouse, and connects it to Tableau for BI.
ankitanshumanmohapatra/Azure-Olympics-Analysis-Data-Engineering-End-to-End-Project
This is an End-to-End Azure Data Engineering Project | Analysis on the entire ETL Pipeline - Azure Factory, Azure Lake Gen 2, Databricks, Azure Synapse Analytics & Dashboards
julian-King22/etl_with_mage_ai
An ETL data pipeline that extracts data from source and loads it to destination, automated using mage.ai
Sagar-Salvi/Data-Engineering-Project
The Centralized Data Warehouse and ML Solution for Banking Analytics is a project that combines a centralized repository for banking data with machine learning algorithms to enable predictive analysis.
gchatterjee-git/Data-Pipeline-AWS
This project demonstrates creating a data pipeline by scraping data with the Twitter API and setting up a Kinesis Firehose delivery stream to ingest the data into Amazon S3.
vvspearlvvs/MusicChatbot
AWS data pipeline development and a music recommendation chatbot
wri/gfw_pixetl
GFW ETL for raster tiles