data-ingestion
There are 158 repositories under the data-ingestion topic.
apache/seatunnel
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
bruin-data/ingestr
ingestr is a CLI tool that seamlessly copies data between databases with a single command.
apache/paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
dashbitco/broadway
Concurrent and multi-stage data ingestion and data processing with Elixir
pravega/pravega
Pravega - Streaming as a new software-defined storage primitive
bruin-data/bruin
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
CrunchyData/pg_parquet
Copy to/from Parquet in S3 from within PostgreSQL
orbitalapi/orbital
Orbital automates integration between data sources (APIs, Databases, Queues and Functions). BFFs, API Composition and ETL pipelines that adapt as your specs change.
cuebook/cuelake
Use SQL to build ELT pipelines on a data lakehouse.
merantix-momentum/squirrel-core
A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way
thedataengineeringbook/thedataengineeringbook
The Data Engineering Book - A data engineering book by Thai people, for Thai people
apache/paimon-rust
Apache Paimon Rust: the Rust implementation of Apache Paimon.
jgperrin/net.jgp.labs.spark
Apache Spark examples exclusively in Java
merantix-momentum/squirrel-datasets-core
Squirrel dataset hub
aws-samples/amazon-kinesis-data-processor-aws-fargate
Sample code for the AWS Big Data Blog post "Building a scalable streaming data processor with Amazon Kinesis Data Streams on AWS Fargate"
Dynatrace/openkit-java
OpenKit Java Reference Implementation
Dynatrace/OneAgent-SDK-for-Java
Enables custom tracing of Java applications in Dynatrace
fremantle-industries/history
Download and warehouse historical trading data
linkedin/data-integration-library
The Data Integration Library project provides a library of generic components based on a multi-stage architecture for data ingress and egress.
Dynatrace/OneAgent-SDK-for-Python
Enables custom tracing of Python applications in Dynatrace
Dynatrace/OneAgent-SDK
Describes technical concepts of Dynatrace OneAgent SDK
Dynatrace/OneAgent-SDK-for-C
Enables custom tracing of native applications in Dynatrace
Dynatrace/OneAgent-SDK-for-dotnet
Enables custom tracing of .NET applications in Dynatrace
juansimon27/scrapy-walmart
Product scraping from Walmart Canada website, with further cleaning and integration of data from a different store.
Dynatrace/OneAgent-SDK-for-NodeJs
Enables custom tracing of Node.js applications in Dynatrace
varunbpatil/cosmos
Airbyte clone written in Go and Vue.js. Works with Airbyte connectors.
Dynatrace/openkit-dotnet
OpenKit .NET Reference Implementation
Dynatrace/agent-nodejs
Dynatrace agent for PaaS environments
yuvaneshkm/dbsconnector
Python package for seamless data integration from multiple sources like CSV, Excel, Google Sheets, and MongoDB. It simplifies data loading and transformation with a unified interface, supporting future expansions to more databases and cloud storage services.
DeleLinus/HFR-Data-Warehousing
End-to-end data engineering processes for the Nigeria Health Facility Registry (HFR). The project leverages Selenium, Pandas, PySpark, PostgreSQL, and Airflow.
apache/paimon-python
Apache Paimon Python: the Python implementation of Apache Paimon.
robert-koch-institut/mex-drop
RKI Metadata Exchange | API and GUI microservice for distributing metadata items before they get picked up by ETL pipelines for further processing.
yuvaneshkm/Retail-Sales-Analysis
This project is an end-to-end data analytics solution for a retail business, aimed at uncovering insights into sales performance, customer behavior, and product trends.