Pinned Repositories
spark
Apache Spark - A unified analytics engine for large-scale data processing
arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
arrow-datafusion-comet
Apache Arrow DataFusion Comet Spark Accelerator
arrow-rs
Official Rust implementation of Apache Arrow
delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
hudi
Upserts, Deletes And Incremental Processing on Big Data.
hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
iceberg
Apache Iceberg
jvm-profiler
JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter
parquet-mr
Apache Parquet
huaxingao's Repositories
huaxingao/iceberg
Apache Iceberg
huaxingao/arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
huaxingao/arrow-datafusion-comet
Apache Arrow DataFusion Comet Spark Accelerator
huaxingao/arrow-rs
Official Rust implementation of Apache Arrow
huaxingao/delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
huaxingao/hudi
Upserts, Deletes And Incremental Processing on Big Data.
huaxingao/hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
huaxingao/jvm-profiler
JVM Profiler Sending Metrics to Kafka, Console Output or Custom Reporter
huaxingao/orc
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
huaxingao/parquet-format
Apache Parquet
huaxingao/parquet-mr
Apache Parquet
huaxingao/scikit-learn
scikit-learn: machine learning in Python
huaxingao/presto
Distributed SQL query engine for big data
huaxingao/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
huaxingao/spark
Mirror of Apache Spark
huaxingao/spark-examples
Official Spark examples adapted for sbt
huaxingao/spark-redshift
Spark and Redshift integration
huaxingao/spark-website
Apache Spark Website
huaxingao/trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)