Pinned Repositories
flink-cdc
Flink CDC is a streaming data integration tool
iceberg
Apache Iceberg
kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
spark
Apache Spark - A unified analytics engine for large-scale data processing
airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
aliyun-odps-java-sdk
ODPS SDK for Java Developers
alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
flink
Apache Flink
incubator-iceberg
Apache Iceberg (Incubating)
zhaomin1423's Repositories
zhaomin1423/incubator-iceberg
Apache Iceberg (Incubating)
zhaomin1423/airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
zhaomin1423/arctic
Arctic is a streaming lake warehouse service open sourced by NetEase
zhaomin1423/flink
Apache Flink
zhaomin1423/flink-cdc-connectors
Change Data Capture (CDC) Connectors for Apache Flink
zhaomin1423/arrow-datafusion-comet
Apache Arrow DataFusion Comet Spark Accelerator
zhaomin1423/automq
AutoMQ is a cloud-native fork of Kafka that separates storage to S3. 10x cost-effective. Autoscale in seconds. Single-digit ms latency.
zhaomin1423/bitsail
BitSail is a distributed, high-performance data integration engine that provides global data integration solutions in batch, streaming, and incremental scenarios. At present, BitSail is widely used and synchronizes hundreds of trillions of data records every day.
zhaomin1423/blaze
Blazing-fast query execution engine speaks Apache Spark language and has Arrow-DataFusion at its core.
zhaomin1423/debezium
Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.
zhaomin1423/delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs for Scala, Java, Rust, Ruby, and Python.
zhaomin1423/dolphinscheduler
Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and providing various types of jobs available out of box.
zhaomin1423/elasticsearch-hadoop
Elasticsearch real-time search and analytics natively integrated with Hadoop
zhaomin1423/example-custom-event-handler
zhaomin1423/gravitino
A high-performance, geo-distributed and federated metadata lake
zhaomin1423/incubator-doris
Apache Doris (Incubating)
zhaomin1423/incubator-kyuubi-website
Apache Kyuubi Site
zhaomin1423/incubator-livy
Mirror of Apache Livy (Incubating)
zhaomin1423/incubator-paimon
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking, and efficient real-time analytics.
zhaomin1423/incubator-seatunnel
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
zhaomin1423/kyuubi
Kyuubi is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark
zhaomin1423/kyuubi-client
Client libraries of end users of Apache Kyuubi
zhaomin1423/metacat
zhaomin1423/OpenLineage
An Open Standard for lineage metadata collection
zhaomin1423/pulsar
Apache Pulsar - distributed pub-sub messaging system
zhaomin1423/spark
Apache Spark - A unified analytics engine for large-scale data processing
zhaomin1423/spark-clickhouse-connector
Spark ClickHouse Connector built on the DataSourceV2 API and gRPC protocol.
zhaomin1423/spark-distcp
A re-implementation of Hadoop DistCP in Apache Spark
zhaomin1423/spark-sql-dsv2-extension
A SQL extension built on the Spark 3 DataSource V2 API, e.g. Hive V2 catalog support among multiple clusters
zhaomin1423/starrocks-connector-for-apache-spark