JkSelf

intelshanghai

Pinned Repositories

analytics-zoo
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
Language:Jupyter Notebook00
arrow-1
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
Language:C++0 1 00
arrow-data-source
Spark DataSource plugin for reading files from various formats like Parquet into Arrow compatible columnar vectors.
Language:Scala0 1 00
arrow-datafusion-comet
Apache Arrow DataFusion Comet Spark Accelerator
Language:Rust00
async-profiler
Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events
Language:C++00
bert
TensorFlow code and pre-trained models for BERT
Language:Python0 1 00
Big-Data-Benchmark-for-Big-Bench
Big Bench Workload Development
Language:Shell00
BigDL
BigDL: Distributed Deep Learning Framework for Apache Spark
Language:Scala0 1 00
chimera
Cryptographic library optimized with AES-NI
Language:Java00
hadoop_study
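Regularly updated documentation for commonly used big data components in the Hadoop ecosystem, focusing in order on: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos (the project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code; continuously updated!!!)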
Language:Java1 1 02

JkSelf's Repositories

JkSelf/analytics-zoo
Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
Language:Jupyter Notebook00
JkSelf/arrow-1
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
Language:C++0 1 00
JkSelf/arrow-data-source
Spark DataSource plugin for reading files from various formats like Parquet into Arrow compatible columnar vectors.
Language:Scala0 1 00
JkSelf/arrow-datafusion-comet
Apache Arrow DataFusion Comet Spark Accelerator
Language:Rust00
JkSelf/async-profiler
Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events
Language:C++00
JkSelf/BigDL
BigDL: Distributed Deep Learning Framework for Apache Spark
Language:Scala0 1 00
JkSelf/ecosystem
Integration of TensorFlow with other open-source frameworks
JkSelf/gluten
JkSelf/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Language:Python1 0
JkSelf/horovodRunnerBenchMark_IPython
Same as horovodRunnerBenchMark, but an IPython version for better readability
Language:Jupyter Notebook1 0
JkSelf/IntelQATCodec
Language:Java0 0
JkSelf/keras-bert
Implementation of BERT that could load official pre-trained models for feature extraction and prediction
JkSelf/keras_bert_text_classification
This project uses Keras and Keras-bert to implement a multi-class text classification task by fine-tuning BERT.
JkSelf/models
Model Zoo for Intel® Architecture: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors
Language:Python1 0
JkSelf/native-sql-engine
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
Language:Scala
JkSelf/OAP
Optimized Analytics Package for Spark* Platform
Language:Scala1 0
JkSelf/oneCCL
oneAPI Collective Communications Library (oneCCL)
Language:C++1 0
JkSelf/oneDAL
oneAPI Data Analytics Library (oneDAL)
Language:C++1 0
JkSelf/petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Language:Python0 0
JkSelf/presto
The official home of the Presto distributed SQL query engine for big data
Language:Java0 0
JkSelf/ray
A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Language:Python1 0
JkSelf/raydp
RayDP: Distributed data processing library that provides simple APIs for running Spark on Ray and integrating Spark with distributed deep learning and machine learning frameworks.
Language:Python1 0
JkSelf/spark
Mirror of Apache Spark
Language:Scala2 0
JkSelf/spark-adaptive
Language:Scala1
JkSelf/spark-nlp
State of the Art Natural Language Processing
JkSelf/sql-ds-cache
Spark* plug-in for accelerating Spark* SQL performance by using cache and index at SQL data source layer.
JkSelf/substrait
A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
Language:HTML0 0
JkSelf/tensorflow
An Open Source Machine Learning Framework for Everyone
Language:C++0 0
JkSelf/velox
A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
Language:C++0 02
JkSelf/xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow
Language:C++1 0