data-integration
There are 532 repositories under the data-integration topic.
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Avaiga/taipy
Turns Data and AI algorithms into production-ready web applications in no time.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
apache/seatunnel
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
apache/flink-cdc
Flink CDC is a streaming data integration tool
cloudquery/cloudquery
The open source ELT framework powered by Apache Arrow
apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
infinyon/fluvio
🦀 Event stream processing for developers to collect and transform data in motion to power responsive, data-intensive applications.
jitsucom/jitsu
Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
rudderlabs/rudder-server
Privacy and Security focused Segment-alternative, in Golang and React
DTStack/chunjun
A data integration framework
seandavi/awesome-single-cell
Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
bruin-data/ingestr
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
apache/incubator-devlake
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
mara/mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
bytedance/bitsail
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
apache/hop
Hop Orchestration Platform
heathersherry/Knowledge-Graph-Tutorials-and-Papers
Insightful Tutorials and Papers about Knowledge Graphs
kuwala-io/kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) high-resolution demographics data, b) points of interest from OpenStreetMap, c) Google Popular Times.
apache/seatunnel-web
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
artie-labs/transfer
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
immunogenomics/harmony
Fast, sensitive and accurate integration of single-cell data with Harmony
saeyslab/nichenetr
NicheNet: predict active ligand-target links between interacting cells
leesf/hudi-resources
A collection of resources related to Apache Hudi
ConduitIO/conduit
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
theislab/scarches
Reference mapping for single-cell genomics
pracdata/awesome-open-source-data-engineering
A curated list of open source tools used in analytics platforms and data engineering ecosystem
graphform/swim-rust
Self-contained distributed software platform for building stateful, massively real-time streaming applications in Rust.
gabledata/recap
Work with your web service, database, and streaming schemas in a single format.
CategoricalData/CQL
Categorical Query Language IDE
hetio/hetionet
Hetionet: an integrative network of disease
elbwalker/walkerOS
Open source tag management and event data collection
cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
CommonCoreOntology/CommonCoreOntologies
The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.