etl
There are 5,229 repositories under the etl topic.
pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
redpanda-data/connect
Fancy stream processing made operationally mundane
turbot/steampipe
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
apache/flink-cdc
Flink CDC is a streaming data integration tool
cloudquery/cloudquery
The open source ELT framework powered by Apache Arrow
rudderlabs/rudder-server
Privacy- and security-focused Segment alternative, in Golang and React
orchest/orchest
Build data pipelines, the easy way 🛠️
nucleuscloud/neosync
Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments.
aws/aws-sdk-pandas
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
quadratichq/quadratic
Spreadsheet with AI, Code, Connections
Netflix/maestro
Maestro: Netflix’s Workflow Orchestrator
xyflow/awesome-node-based-uis
A curated list with resources about node-based UIs
blockchain-etl/ethereum-etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
ucbepic/docetl
A system for agentic LLM-powered data processing and ETL
apache/incubator-devlake
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
cocoindex-io/cocoindex
Data transformation framework for AI. Ultra performant, with incremental processing.
PeerDB-io/peerdb
A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage
TobikoData/sqlmesh
Scalable and efficient data transformation framework - backwards compatible with dbt.
instill-ai/instill-core
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
apache/hamilton
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
thenaturalist/awesome-business-intelligence
Actively curated list of awesome BI tools. PRs welcome!
reugn/go-streams
A lightweight stream processing library for Go
mara/mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
AlexIoannides/pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.
superglue-ai/superglue
superglue integrates & orchestrates APIs from natural language. Agents use it to build deterministic workflows across apps, APIs and databases. Humans use it to generate insights, build automations and manage data.
timeplus-io/proton
Fastest SQL pipeline engine in a single C++ binary, for stream processing, analytics, observability and AI.
data-engineering-community/data-engineering-wiki
The best place to learn data engineering. Built and maintained by the data engineering community.
thbar/kiba
Data processing & ETL framework for Ruby
Multiwoven/multiwoven
🔥🔥🔥 Open source Reverse ETL - alternative to Hightouch and Census.
nerevu/riko
A Python stream processing engine modeled after Yahoo! Pipes
ariacom/Seal-Report
Database Reporting Tool and Tasks (.Net)
getdozer/dozer
Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks.