Pinned Repositories
Alink
Alink is a machine learning algorithm platform based on Flink, developed by the PAI team of the Alibaba computing platform.
aresdb
A GPU-powered real-time analytics storage and query engine.
arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
arrow-datafusion-python
Apache Arrow DataFusion Python Bindings
AVX-Memmove
Highly optimized versions of memmove, memcpy, memset, and memcmp supporting SSE4.2, AVX, AVX2, and AVX512
benchm-ml
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
benchmarks
Benchmark code
blaze
A blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
flink-jar
Deploy Apache Flink as a microservice
transformer_user_action
Transformer-based Realtime User Action Model for Recommendation at Pinterest
chenqin's Repositories
chenqin/arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
chenqin/arrow-datafusion-python
Apache Arrow DataFusion Python Bindings
chenqin/blaze
A blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
chenqin/cascading-flink
Cascading on Apache Flink®
chenqin/flink
Mirror of Apache Flink
chenqin/transformer_user_action
Transformer-based Realtime User Action Model for Recommendation at Pinterest
chenqin/xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on a single machine, Hadoop, Spark, Flink and DataFlow
chenqin/chenqin
Config files for my GitHub profile.
chenqin/datasketches-cpp
Core C++ Sketch Library
chenqin/docker-hive
Docker image for Apache Hive Metastore
chenqin/examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
chenqin/FlexGen
Running large language models like OPT-175B/GPT-3 on a single GPU. Focusing on high-throughput generation.
chenqin/flink-cdc-connectors
CDC Connectors for Apache Flink®
chenqin/flink-connector-hive
Apache Flink
chenqin/gluten
Gluten: Plugin to Double SparkSQL's Performance
chenqin/hive-metastore-docker
Example for the article "Running Spark 3 with standalone Hive Metastore 3.0"
chenqin/librdkafka
The Apache Kafka C/C++ library
chenqin/llama
Inference code for LLaMA models
chenqin/modern-cpp-kafka
Modern C++ based Kafka API
chenqin/nebula
A distributed block-based data storage and compute engine
chenqin/openssl-cmake
Build OpenSSL with CMake on macOS, Win32, and Win64, and cross-compile for Android and iOS
chenqin/pytorch-cpp
C++ Implementation of PyTorch Tutorials for Everyone
chenqin/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.
chenqin/react-digraph
A library for creating directed graph editors
chenqin/scalding
A Scala API for Cascading
chenqin/spark
Apache Spark - A unified analytics engine for large-scale data processing
chenqin/spark-on-k8s-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
chenqin/SparkInternals
Notes on the design and implementation of Apache Spark
chenqin/substrait
A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
chenqin/velox
A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.