data-integration

There are 532 repositories under the data-integration topic.

  • airflow

    apache/airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    Language: Python 42.4k76312.6k15.6k
  • airbytehq/airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Language: Python 19.5k19215.4k4.8k
  • taipy

    Avaiga/taipy

    Turns Data and AI algorithms into production-ready web applications in no time.

    Language: Python 18.6k821.1k1.9k
  • dagster-io/dagster

    An orchestration platform for the development, production, and observation of data assets.

    Language: Python 14k1238.1k1.8k
  • apache/seatunnel

    SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.

    Language: Java 8.8k1724.2k2.1k
  • mage-ai/mage-ai

    🧙 Build, run, and manage data pipelines for integrating and transforming data.

    Language: Python 8.5k64987868
  • apache/flink-cdc

    Flink CDC is a streaming data integration tool

    Language: Java 6.2k1361.7k2.1k
  • cloudquery

    cloudquery/cloudquery

    The open source ELT framework powered by Apache Arrow

    Language: Go 6.2k662.2k542
  • apache/hudi

    Upserts, Deletes And Incremental Processing on Big Data.

    Language: Java 5.9k1.2k3.5k2.4k
  • fluvio

    infinyon/fluvio

    🦀 event stream processing for developers to collect and transform data in motion to power responsive data intensive applications.

    Language: Rust 5k461.6k516
  • jitsu

    jitsucom/jitsu

    Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.

    Language: TypeScript 4.4k46594315
  • rudder-server

    rudderlabs/rudder-server

    Privacy and Security focused Segment-alternative, in Golang and React

    Language: Go 4.3k621424
  • DTStack/chunjun

    A data integration framework

    Language: Java 4.1k1651.2k1.7k
  • seandavi/awesome-single-cell

    Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.

  • bruin-data/ingestr

    ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

    Language: Python 3.2k2037106
  • apache/incubator-devlake

    Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

    Language: Go 2.8k503.5k620
  • mara/mara-pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

    Language: Python 2.1k5434101
  • bytedance/bitsail

    BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.

    Language: Java 1.7k61212334
  • apache/hop

    Hop Orchestration Platform

    Language: Java 1.2k481.9k401
  • heathersherry/Knowledge-Graph-Tutorials-and-Papers

    Insightful Tutorials and Papers about Knowledge Graphs

  • kuwala

    kuwala-io/kuwala

    Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demographics data b) Points of interest from OpenStreetMap c) Google Popular Times

    Language: JavaScript 800127254
  • apache/seatunnel-web

    SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).

    Language: Java 733250323
  • artie-labs/transfer

    Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.

    Language: Go 67194440
  • immunogenomics/harmony

    Fast, sensitive and accurate integration of single-cell data with Harmony

    Language: R 59924234105
  • saeyslab/nichenetr

    NicheNet: predict active ligand-target links between interacting cells

    Language: R 56713293129
  • leesf/hudi-resources

    A collection of Apache Hudi-related resources

  • conduit

    ConduitIO/conduit

    Conduit streams data between data stores. Kafka Connect replacement. No JVM required.

    Language: Go 5541858155
  • theislab/scarches

    Reference mapping for single-cell genomics

    Language: Jupyter Notebook 3821118663
  • pracdata/awesome-open-source-data-engineering

    A curated list of open source tools used in analytics platforms and data engineering ecosystem

  • graphform/swim-rust

    Self-contained distributed software platform for building stateful, massively real-time streaming applications in Rust.

    Language: Rust 350
  • recap

    gabledata/recap

    Work with your web service, database, and streaming schemas in a single format.

    Language: Python 3431013326
  • CategoricalData/CQL

    Categorical Query Language IDE

    Language: Java 314309124
  • hetio/hetionet

    Hetionet: an integrative network of disease

    Language: HTML 304144771
  • elbwalker/walkerOS

    Open source tag management and event data collection

    Language: TypeScript 303823015
  • cuebook/cuelake

    Use SQL to build ELT pipelines on a data lakehouse.

    Language: JavaScript 288112928
  • CommonCoreOntology/CommonCoreOntologies

    The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.

    Language: Makefile 2534244170