data-pipelines
There are 297 repositories under the data-pipelines topic.
public-datasets-pipelines
Cloud-native, data onboarding architecture for Google Cloud Datasets
palimpzest
A System for Optimized Semantic Computation
smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
didact
The open core .NET job orchestrator that we've been missing
Hoptimator
Multi-hop declarative data pipelines
burla
The simplest way to run Python on lots of computers.
mycelial
Move your data with ease.
patterns-devkit
Data pipelines from re-usable components
udacity-data-eng-proj-1
A data pipeline that automates data warehouse ETL with custom Airflow operators handling the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3
datacater
The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.
python-sdk
Conductor OSS SDK for Python programming language
beneath
Beneath is a serverless real-time data platform ⚡️
exospherehost
Infra for scalable and reliable AI agents
didact-engine
The REST API and execution engine for the Didact Platform.
Udacity-Data-Engineer-nanodegree
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree
data_engineer_interview_challenges
Found a data engineering challenge or participated in a selection process? Share with us!
xvc
A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)
kenobi
Easiest way to monitor asynchronous data pipelines
ops0-cli
ops0 is an AI-powered natural language DevOps CLI native to Claude AI, with Ansible, Terraform, Kubernetes, AWS, Azure, and Docker operations in a single CLI. An open-source alternative to complex DevOps workflows and manual operations. 🤖 ⚡ 👉 Natural Language DevOps Automation & Troubleshooting Tool
uniflow
A high-performance, extremely flexible, and easily extensible universal workflow engine.
CogStack-NiFi
Building data processing pipelines for document processing with NLP using Apache NiFi and related services
Udacity-Data-Engineering-Nanodgree
Udacity Data Engineering Nanodegree Program
ml-in-production
Practical use cases showing how to make your machine learning pipelines robust and reliable using Apache Airflow.
streams-explorer
Explore Apache Kafka data pipelines in Kubernetes.
spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLlib models for use in any Java application with no other dependencies.
learn-kafka-courses
Learn the basics of Apache Kafka® from leaders in the Kafka community with these video courses covering the Kafka ecosystem and hands-on exercises.
kedro-pandera
A Kedro plugin to use pandera in your Kedro projects
AirflowDataPipeline
Example of an ETL Pipeline using Airflow
tabsdata
A data integration platform based on Pub/Sub for Tables, to discover, publish, modify, and consume data effortlessly.
dagster-odp
A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON instead of code
examples
Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services
dbt-command-center
Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.
debezium-platform
An opinionated data-centric view of Debezium components
arakat
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
stepist
Framework for data processing
demo
A starter dbt project and synthetic claims dataset for trying out the Tuva Project.