Pinned Repositories
1brc
1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
ansj_seg
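Ansj word segmentation: a true Java implementation of ICT. Segmentation quality and speed both exceed the open-source version of ICT. Chinese word segmentation, person-name recognition, part-of-speech tagging, user-defined dictionaries.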
arrow-ballista
Apache Arrow Ballista Distributed Query Engine
blaze
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
Burrow
Kafka Consumer Lag Checking
celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
concurrent-map
A thread-safe concurrent map for Go
config
configuration library for JVM languages using HOCON files
debezium
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
paimon
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
harveyyue's Repositories
harveyyue/1brc
1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
harveyyue/arrow-ballista
Apache Arrow Ballista Distributed Query Engine
harveyyue/blaze
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
harveyyue/celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
harveyyue/debezium
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
harveyyue/hudi
Upserts And Incremental Processing on Big Data
harveyyue/paimon
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking and efficient real-time analytics.
harveyyue/starrocks
StarRocks is a next-gen sub-second MPP database for full analysis scenarios, including multi-dimensional analytics, real-time analytics and ad-hoc query, formerly known as DorisDB.
harveyyue/datafusion
Apache Arrow DataFusion SQL Query Engine
harveyyue/datafusion-orc
Implementation of the Apache ORC file format using the Apache Arrow in-memory format
harveyyue/debezium-connector-cassandra
An incubating Debezium CDC connector for Apache Cassandra
harveyyue/debezium-connector-db2
An incubating Debezium connector for Db2
harveyyue/debezium-connector-informix
An incubating Debezium CDC connector for IBM Informix database
harveyyue/debezium-connector-jdbc
An exploration for building a JDBC sink connector aware of the Debezium change event format
harveyyue/debezium-connector-spanner
An incubating Debezium CDC connector for Google Spanner
harveyyue/debezium-connector-vitess
An incubating Debezium CDC connector for Vitess
harveyyue/doris
Apache Doris (Incubating)
harveyyue/flink
Apache Flink
harveyyue/flink-cdc-connectors
Change Data Capture (CDC) Connectors for Apache Flink
harveyyue/kafka
Mirror of Apache Kafka
harveyyue/kafka-connect-jdbc
Kafka Connect connector for JDBC-compatible databases
harveyyue/kafka-connect-storage-cloud
Kafka Connect suite of connectors for Cloud storage (Amazon S3)
harveyyue/kcctl
A modern and intuitive command line client for Kafka Connect
harveyyue/merkle-proof
harveyyue/mysql-binlog-connector-java
MySQL Binary Log connector
harveyyue/schema-registry
Confluent Schema Registry for Kafka
harveyyue/seatunnel
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
harveyyue/spark
Apache Spark - A unified analytics engine for large-scale data processing
harveyyue/tiflow
This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
harveyyue/web3j
Lightweight Java and Android library for integration with Ethereum clients