elt

There are 336 repositories under the elt topic.

  • apache/airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    Language:Python39.4k76411.1k14.8k
  • airbytehq/airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Language:Python17.7k18715k4.4k
  • apache/doris

    Apache Doris is an easy-to-use, high performance and unified analytics database.

    Language:Java13.4k2847.7k3.4k
  • dbt-labs/dbt-core

    dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

    Language:Python10.6k1435.7k1.7k
  • apache/seatunnel

    SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.

    Language:Java8.4k1733.8k1.9k
  • mage-ai/mage-ai

    🧙 Build, run, and manage data pipelines for integrating and transforming data.

    Language:Python8.2k62950829
  • cloudquery/cloudquery

    The developer-first cloud governance platform

    Language:Go6k652.2k525
  • apache/flink-cdc

    Flink CDC is a streaming data integration tool

    Language:Java6k1331.7k2k
  • rudderlabs/rudder-server

    Privacy and Security focused Segment-alternative, in Golang and React

    Language:Go4.2k62142326
  • dlt-hub/dlt

    data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

    Language:Python3.4k22797241
  • quarylabs/quary

    Open-source BI for engineers

    Language:Rust2.3k134555
  • TobikoData/sqlmesh

    Efficient data transformation and modeling framework that is backwards compatible with dbt.

    Language:Python2.2k28729199
  • meltano/meltano

    Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

    Language:Python2k136.7k173
  • ucbepic/docetl

    A system for agentic LLM-powered data processing and ETL

    Language:Python1.7k18111163
  • dataform-co/dataform

    Dataform is a framework for managing SQL based data operations in BigQuery

    Language:TypeScript88323534173
  • kuwala-io/kuwala

    Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.

    Language:JavaScript792127254
  • raystack/optimus

    Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

    Language:Go74715268155
  • artie-labs/transfer

    Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.

    Language:Go63994432
  • datazip-inc/olake

    Fastest open-source tool for replicating Databases to Apache Iceberg or Data Lakehouse. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supporting Postgres, MongoDB and MySQL

    Language:Go59688660
  • slingdata-io/sling-cli

    Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.

    Language:Go5341035342
  • Datavault-UK/automate-dv

    A free-to-use dbt package for creating and loading Data Vault 2.0 compliant data warehouses (powered by dbt, an open source data engineering tool and registered trademark of dbt Labs)

  • gouline/dbt-metabase

    dbt + Metabase integration

    Language:Python509910175
  • vmware/versatile-data-kit

    One framework to develop, deploy and operate data workflows with Python and SQL.

    Language:Python4441594760
  • osalvador/ReplicaDB

    ReplicaDB is an open source tool for database replication, designed for efficiently transferring bulk data between relational and non-relational databases.

    Language:Java43325112102
  • astronomer/astro-sdk

    Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

    Language:Python367983248
  • aws-samples/aws-etl-orchestrator

    A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.

    Language:Python336387138
  • DataRecce/recce

    The data-validation toolkit for enhanced dbt (data build tool) PR review

    Language:TypeScript32988411
  • cuebook/cuelake

    Use SQL to build ELT pipelines on a data lakehouse.

    Language:JavaScript285112928
  • datacoves/dbt-coves

    CLI tool for dbt users to simplify the creation of staging model files (yml and sql)

    Language:Python261106016
  • airbytehq/PyAirbyte

    PyAirbyte brings the power of Airbyte to every Python developer.

    Language:Python257422247
  • umitkaanusta/reddit-detective

    Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more

    Language:Python21361815
  • unytics/airbyte_serverless

    Airbyte made simple (no UI, no database, no cluster)

    Language:Python1693212
  • 173TECH/sayn

    Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).

    Language:Python12354716
  • faros-ai/airbyte-connectors

    Airbyte connectors (sources & destinations) + Airbyte CDK for JavaScript/TypeScript

    Language:TypeScript1151310864
  • transferia/transferia

    Open Source Cloud Native Ingestion engine

    Language:Go10213
  • yokawasa/databricks-notebooks

    Collection of sample Databricks Spark notebooks (mostly for Azure Databricks)

    Language:Jupyter Notebook868075