etl
There are 3712 repositories under the etl topic.
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
apache/doris
Apache Doris is an easy-to-use, high performance and unified analytics database.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
redpanda-data/connect
Fancy stream processing made operationally mundane
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
kestra-io/kestra
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
turbot/steampipe
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
risingwavelabs/risingwave
SQL stream processing, analytics, and management. We decouple storage and compute to offer instant failover, dynamic scaling, speedy bootstrapping, and efficient joins.
cloudquery/cloudquery
The open source high performance ELT framework powered by Apache Arrow
apache/flink-cdc
Flink CDC is a streaming data integration tool
orchest/orchest
Build data pipelines, the easy way 🛠️
rudderlabs/rudder-server
Privacy and Security focused Segment-alternative, in Golang and React
aws/aws-sdk-pandas
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
blockchain-etl/ethereum-etl
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ
quadratichq/quadratic
Quadratic | Data Science Spreadsheet with Python & SQL
apache/incubator-devlake
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
nucleuscloud/neosync
Open source data anonymization and synthetic data orchestration for developers. Create high fidelity synthetic data and sync it across your environments.
mara/mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
xyflow/awesome-node-based-uis
A curated list with resources about node-based UIs
thenaturalist/awesome-business-intelligence
Actively curated list of awesome BI tools. PRs welcome!
instill-ai/instill-core
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
PeerDB-io/peerdb
A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage
reugn/go-streams
A lightweight stream processing library for Go
thbar/kiba
Data processing & ETL framework for Ruby
nerevu/riko
A Python stream processing engine modeled after Yahoo! Pipes
AlexIoannides/pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.
getdozer/dozer
Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks.
DAGWorks-Inc/hamilton
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
compose/transporter
Sync data between persistence engines, like ETL only not stodgy
TobikoData/sqlmesh
Efficient data transformation and modeling framework that is backwards compatible with dbt.
ariacom/Seal-Report
Database Reporting Tool and Tasks (.Net)
rwynn/monstache
A Go daemon that syncs MongoDB to Elasticsearch in real time. You know, for search.
singer-io/getting-started
This repository is a getting started guide to Singer.
data-engineering-community/data-engineering-wiki
The best place to learn data engineering. Built and maintained by the data engineering community.