Pinned Repositories
analytics-zoo
Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
arrow-1
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
arrow-data-source
Spark DataSource plugin for reading files in various formats, such as Parquet, into Arrow-compatible columnar vectors.
arrow-datafusion-comet
Apache Arrow DataFusion Comet Spark Accelerator
async-profiler
Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events
bert
TensorFlow code and pre-trained models for BERT
Big-Data-Benchmark-for-Big-Bench
Big Bench Workload Development
BigDL
BigDL: Distributed Deep Learning Framework for Apache Spark
chimera
Cryptographic library optimized with AES-NI
hadoop_study
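Regularly updated documentation for commonly used big data components in the Hadoop ecosystem. Focus areas, in order of priority: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code. Continuously updated!)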
JkSelf's Repositories
JkSelf/hadoop_study
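Regularly updated documentation for commonly used big data components in the Hadoop ecosystem. Focus areas, in order of priority: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code. Continuously updated!)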
JkSelf/bert
TensorFlow code and pre-trained models for BERT
JkSelf/Big-Data-Benchmark-for-Big-Bench
Big Bench Workload Development
JkSelf/chimera
Cryptographic library optimized with AES-NI
JkSelf/commons-crypto
Mirror of Apache Commons Crypto
JkSelf/DigAndBuried
Digging holes and filling them in
JkSelf/hadoop
Mirror of Apache Hadoop
JkSelf/hadoop-tutorials-examples
Source, data and tutorials for the blog post and video series on Hue, the Web UI for Hadoop.
JkSelf/hive
Mirror of Apache Hive
JkSelf/hive-testbench
Testbench for experimenting with Apache Hive at any data scale.
JkSelf/kraps-rpc
An RPC framework leveraging the Spark RPC module
JkSelf/oap-perf-suite
OAP Cluster Performance TestSuite
JkSelf/oap-perf-suite-1
JkSelf/orc
Mirror of Apache ORC
JkSelf/PAT
Performance Analysis Tool
JkSelf/plasma
A minimal shared memory object store design
JkSelf/sentry
Mirror of Apache Sentry
JkSelf/spark-on-k8s
An integrated and collaborative cloud environment for building and running Spark applications on PKS/Kubernetes
JkSelf/spark-operator
Operator for managing Spark clusters on Kubernetes and OpenShift.
JkSelf/Spark-PMoF
Spark Shuffle Optimization with RDMA+AEP
JkSelf/spark-sql-perf
JkSelf/SparkInternals
Notes talking about the design and implementation of Apache Spark
JkSelf/splash
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
JkSelf/tbb
Official Threading Building Blocks (TBB) GitHub repository. For the commercial Intel® TBB distribution, see https://software.intel.com/en-us/tbb
JkSelf/TDengine
An open-source big data platform designed and optimized for the Internet of Things (IoT).