data-pipelines
There are 195 repositories under the data-pipelines topic.
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
apache/dolphinscheduler
Apache DolphinScheduler is the modern data orchestration platform. Quickly create high-performance workflows with low code.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
mage-ai/mage-ai
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
orchest/orchest
Build data pipelines, the easy way 🛠️
infinyon/fluvio
Lean and mean distributed stream processing system written in Rust and WebAssembly.
elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
combust/mleap
MLeap: Deploy ML Pipelines to Production
opendatadiscovery/odd-platform
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
data-engineering-community/data-engineering-wiki
The best place to learn data engineering. Built and maintained by the data engineering community.
dataform-co/dataform
Dataform is a framework for managing SQL-based data operations in BigQuery.
raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
artie-labs/transfer
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
elementary-data/dbt-data-reliability
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
recap-build/recap
Work with your web service, database, and streaming schemas in a single format.
dataplane-app/dataplane
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
terrytangyuan/awesome-kubeflow
A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project)
realize-engineering/pipebird
Pipebird is open source infrastructure for securely sharing data with customers.
kevin-hanselman/dud
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
datajoint/datajoint-python
Relational data pipelines for the science lab
koolreport/core
An open source PHP reporting framework that helps you write data reports and build dashboards in PHP. Works with all PHP versions from 5.6 to 8.0. Fully compatible with MVC frameworks such as Laravel, CodeIgniter, and Symfony.
tuva-health/tuva
This is the main repo and includes all data marts, terminology sets, and reference datasets in the Tuva Project.
GoogleCloudPlatform/public-datasets-pipelines
Cloud-native, data onboarding architecture for Google Cloud Datasets
dataflint/spark
Performance Observability for Apache Spark
patterns-app/patterns-devkit
Data pipelines from re-usable components
smart-data-lake/smart-data-lake
Smart Automation Tool for building modern Data Lakes and Data Pipelines
shravan-kuchkula/udacity-data-eng-proj-1
A data pipeline that automates data warehouse ETL using custom Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3.
DataCater/datacater
The developer-friendly ETL platform for transforming data in real-time. Based on Apache Kafka® and Kubernetes®.
beneath-hq/beneath
Beneath is a serverless real-time data platform ⚡️
confluentinc/learn-kafka-courses
Learn the basics of Apache Kafka® from leaders in the Kafka community with these video courses covering the Kafka ecosystem and hands-on exercises.
linkedin/Hoptimator
Multi-hop declarative data pipelines
immu0001/Udacity-Data-Engineer-nanodegree
Classwork projects and homework completed for the Udacity Data Engineering Nanodegree.
minhadona/data_engineer_interview_challenges
Found a data engineering challenge or participated in a selection process? Share with us!