data-pipelines

There are 321 repositories under the data-pipelines topic.

pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Language: Python 49.3k 93 1081.4k
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Language: Python 43.1k 758 13.1k15.9k
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
Language: Python 14.4k 120 8.2k1.9k
apache/dolphinscheduler
Apache DolphinScheduler is the modern data orchestration platform. Agile, low-code creation of high-performance workflows.
Language: Java 13.9k 325 8.1k4.9k
Unstructured-IO/unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
Language: HTML 13.1k 68 1.2k1.1k
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Language: Python 8.5k 63 1k888
infinyon/fluvio
🦀 Event stream processing for developers to collect and transform data in motion to power responsive, data-intensive applications.
Language: Rust 5.1k 45 1.6k519
StructuredLabs/preswald
Preswald is a WASM packager for Python-based interactive data apps: bundle full, complex data workflows, particularly visualizations, into single files that run completely in-browser, using Pyodide, DuckDB, Pandas, Plotly, Matplotlib, etc. Build dashboards, reports, and notebooks that run offline, load fast, and share like a document.
Language: Python 4.3k 6 138662
orchest/orchest
Build data pipelines, the easy way 🛠️
Language: TypeScript 4.1k 40 481264
Netflix/maestro
Maestro: Netflix’s Workflow Orchestrator
Language: Java 3.6k 169 81246
ucbepic/docetl
A system for agentic LLM-powered data processing and ETL
Language: Python 3k 26 134317
meltano/meltano
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Language: Python 2.2k 8 6.8k186
elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Language: HTML 2.2k 11 657202
data-engineering-community/data-engineering-wiki
The best place to learn data engineering. Built and maintained by the data engineering community.
Language: CSS 1.8k 27 67222
feldera/feldera
The Feldera Incremental Computation Engine
Language: Rust 1.7k 11 1.5k81
yobix-ai/extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Language: Rust 1.6k 17 4777
combust/mleap
MLeap: Deploy ML Pipelines to Production
Language: Scala 1.5k 65 476314
pyper-dev/pyper
Concurrent Python made simple
Language: Python 1.5k 5 729
OpenDCAI/DataFlow
Easy data preparation with the latest LLM-based operators and pipelines.
Language: Python 1.5k 19 43101
fmind/mlops-python-package
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
Language: Jupyter Notebook 1.4k 15 26197
opendatadiscovery/odd-platform
The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Language: Java 1.4k 18 645129
bruin-data/bruin
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Language: Go 1.2k 8 1052
amphi-ai/amphi-etl
Visual Data Preparation and Transformation. Low-Code Python-based ETL.
Language: TypeScript 1.2k 15 24581
dataform-co/dataform
Dataform is a framework for managing SQL-based data operations in BigQuery.
Language: TypeScript 936 22 564188
raystack/optimus
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Language: Go 754 15 268154
artie-labs/transfer
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
Language: Go 680 10 4542
vmware/versatile-data-kit
One framework to develop, deploy and operate data workflows with Python and SQL.
Language: Python 469 13 94862
elementary-data/dbt-data-reliability
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Language: Python 468 5 34114
dataplane-app/dataplane
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
Language: JavaScript 356 6 3945
gabledata/recap
Work with your web service, database, and streaming schemas in a single format.
Language: Python 343 10 13326
dataflint/spark
Drop-in replacement for Apache Spark UI
Language: TypeScript 340 5 1541
tuva-health/tuva
Main repo including core data model, data marts, data quality tests, and terminology sets.
Language: HTML 276 12 36599
terrytangyuan/awesome-kubeflow
A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project)
217 6 019
kevin-hanselman/dud
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
Language: Go 214 7 729
Burla-Cloud/burla
Easy to use cluster-compute software.
Language: TypeScript 196 0 24
datajoint/datajoint-python
Relational data pipelines for the science lab
Language: Python 184 14 64890
