data-integration

There are 569 repositories under the data-integration topic.

  • apache/airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

    Language: Python
  • airbytehq/airbyte

    The leading data integration platform for ETL/ELT data pipelines from APIs, databases, and files to data warehouses, data lakes, and data lakehouses. Available both self-hosted and cloud-hosted.

    Language: Python
  • Avaiga/taipy

    Turns data and AI algorithms into production-ready web applications in no time.

    Language: Python
  • dagster-io/dagster

    An orchestration platform for the development, production, and observation of data assets.

    Language: Python
  • apache/seatunnel

    SeaTunnel is a multimodal, high-performance, distributed data integration tool for massive data volumes.

    Language: Java
  • mage-ai/mage-ai

    🧙 Build, run, and manage data pipelines for integrating and transforming data.

    Language: Python
  • apache/flink-cdc

    Flink CDC is a streaming data integration tool.

    Language: Java
  • cloudquery/cloudquery

    Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.

    Language: Go
  • apache/hudi

    Upserts, deletes, and incremental processing on big data.

    Language: Java
  • infinyon/fluvio

    🦀 Event stream processing for developers to collect and transform data in motion, powering responsive, data-intensive applications.

    Language: Rust
  • jitsucom/jitsu

    Jitsu is an open-source Segment alternative: a fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.

    Language: TypeScript
  • rudderlabs/rudder-server

    Privacy- and security-focused Segment alternative, written in Golang and React.

    Language: Go
  • DTStack/chunjun

    A data integration framework

    Language: Java
  • seandavi/awesome-single-cell

    Community-curated list of software packages and data resources for single-cell analysis, including RNA-seq, ATAC-seq, etc.

  • bruin-data/ingestr

    ingestr is a CLI tool for seamlessly copying data between any databases with a single command.

    Language: Python
  • apache/incubator-devlake

    Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

    Language: Go
  • mara/mara-pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

    Language: Python
  • bytedance/bitsail

    BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data records every day.

    Language: Java
  • apache/hop

    Hop Orchestration Platform

    Language: Java
  • heathersherry/Knowledge-Graph-Tutorials-and-Papers

    Insightful tutorials and papers about knowledge graphs.

  • kuwala-io/kuwala

    Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, and Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, and c) Google Popular Times.

    Language: JavaScript
  • apache/seatunnel-web

    SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).

    Language: Java
  • artie-labs/transfer

    Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.

    Language: Go
  • immunogenomics/harmony

    Fast, sensitive and accurate integration of single-cell data with Harmony

    Language: R
  • saeyslab/nichenetr

    NicheNet: predict active ligand-target links between interacting cells

    Language: R
  • ConduitIO/conduit

    Conduit streams data between data stores. Kafka Connect replacement. No JVM required.

    Language: Go
  • leesf/hudi-resources

    A collection of Apache Hudi resources

  • pracdata/awesome-open-source-data-engineering

    A curated list of open-source tools used in analytics platforms and the data engineering ecosystem.

  • theislab/scarches

    Reference mapping for single-cell genomics

    Language: Jupyter Notebook
  • dataplane-app/dataplane

    Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.

    Language: JavaScript
  • gabledata/recap

    Work with your web service, database, and streaming schemas in a single format.

    Language: Python
  • CategoricalData/CQL

    Categorical Query Language IDE

    Language: Java
  • hetio/hetionet

    Hetionet: an integrative network of disease

    Language: HTML
  • cuebook/cuelake

    Use SQL to build ELT pipelines on a data lakehouse.

    Language: JavaScript
  • CommonCoreOntology/CommonCoreOntologies

    The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.

    Language: Makefile
  • slowkow/harmonypy

    🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.

    Language: Python