Pinned Repositories
analytics-zoo
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
arrow-1
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
arrow-data-source
Spark DataSource plugin for reading files from various formats like Parquet into Arrow compatible columnar vectors.
arrow-datafusion-comet
Apache Arrow DataFusion Comet Spark Accelerator
async-profiler
Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events
bert
TensorFlow code and pre-trained models for BERT
Big-Data-Benchmark-for-Big-Bench
Big Bench Workload Development
BigDL
BigDL: Distributed Deep Learning Framework for Apache Spark
chimera
Cryptographic library optimized with AES-NI
hadoop_study
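Regularly updated documentation for commonly used big data components in the Hadoop ecosystem. Focus, in order: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code. Continuously updated!)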
JkSelf's Repositories
JkSelf/analytics-zoo
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
JkSelf/arrow-1
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
JkSelf/arrow-data-source
Spark DataSource plugin for reading files from various formats like Parquet into Arrow compatible columnar vectors.
JkSelf/arrow-datafusion-comet
Apache Arrow DataFusion Comet Spark Accelerator
JkSelf/async-profiler
Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events
JkSelf/BigDL
BigDL: Distributed Deep Learning Framework for Apache Spark
JkSelf/ecosystem
Integration of TensorFlow with other open-source frameworks
JkSelf/gluten
JkSelf/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
JkSelf/horovodRunnerBenchMark_IPython
Same as horovodRunnerBenchMark, but an IPython version for better readability
JkSelf/IntelQATCodec
JkSelf/keras-bert
Implementation of BERT that could load official pre-trained models for feature extraction and prediction
JkSelf/keras_bert_text_classification
This project uses Keras and Keras-bert to implement a multi-class text classification task by fine-tuning BERT.
JkSelf/models
Model Zoo for Intel® Architecture: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors
JkSelf/native-sql-engine
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
JkSelf/OAP
Optimized Analytics Package for Spark* Platform
JkSelf/oneCCL
oneAPI Collective Communications Library (oneCCL)
JkSelf/oneDAL
oneAPI Data Analytics Library (oneDAL)
JkSelf/petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
JkSelf/presto
The official home of the Presto distributed SQL query engine for big data
JkSelf/ray
A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
JkSelf/raydp
RayDP: Distributed data processing library that provides simple APIs for running Spark on Ray and integrating Spark with distributed deep learning and machine learning frameworks.
JkSelf/spark
Mirror of Apache Spark
JkSelf/spark-adaptive
JkSelf/spark-nlp
State of the Art Natural Language Processing
JkSelf/sql-ds-cache
Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.
JkSelf/substrait
A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
JkSelf/tensorflow
An Open Source Machine Learning Framework for Everyone
JkSelf/velox
A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
JkSelf/xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow