data-integration
There are 395 repositories under the data-integration topic.
apache/airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
Avaiga/taipy
Turns Data and AI algorithms into production-ready web applications in no time.
apache/seatunnel
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
mage-ai/mage-ai
🧙 Build, run, and manage data pipelines for integrating and transforming data.
kestra-io/kestra
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
cloudquery/cloudquery
The open source high performance ELT framework powered by Apache Arrow
apache/flink-cdc
Flink CDC is a streaming data integration tool
apache/hudi
Upserts, Deletes And Incremental Processing on Big Data.
rudderlabs/rudder-server
Privacy and Security focused Segment-alternative, in Golang and React
DTStack/chunjun
A data integration framework
jitsucom/jitsu
Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
seandavi/awesome-single-cell
Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
infinyon/fluvio
Lean and mean distributed stream processing system written in Rust and WebAssembly.
apache/incubator-devlake
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
bruin-data/ingestr
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
mara/mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
bytedance/bitsail
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
apache/hop
Hop Orchestration Platform
kuwala-io/kuwala
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demographics data b) Points of Interest from OpenStreetMap c) Google Popular Times
artie-labs/transfer
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
leesf/hudi-resources
A collection of Apache Hudi-related resources
immunogenomics/harmony
Fast, sensitive and accurate integration of single-cell data with Harmony
saeyslab/nichenetr
NicheNet: predict active ligand-target links between interacting cells
apache/seatunnel-web
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
ConduitIO/conduit
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
theislab/scarches
Reference mapping for single-cell genomics
recap-build/recap
Work with your web service, database, and streaming schemas in a single format.
CategoricalData/CQL
Categorical Query Language IDE
cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
hetio/hetionet
Hetionet: an integrative network of disease
opensanctions/nomenklatura
Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources
dataplane-app/dataplane
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a React front end.
mara/mara-example-project-2
An example mini data warehouse for python project stats, template for new projects
slowkow/harmonypy
🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.