chenqin

Batch Processing & Stream Processing Platform at Pinterest

Pinned Repositories

Alink
Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.
Language:Java0 1 00
aresdb
A GPU-powered real-time analytics storage and query engine.
Language:Go0 2 00
arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
Language:C++0 2 00
arrow-datafusion-python
Apache Arrow DataFusion Python Bindings
Language:Rust0 0 00
AVX-Memmove
Highly optimized versions of memmove, memcpy, memset, and memcmp supporting SSE4.2, AVX, AVX2, and AVX512
Language:C0 0 00
benchm-ml
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Language:R0 2 00
benchmarks
Benchmark code
Language:Python0 2 00
blaze
A blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
Language:Rust0 0 00
flink-jar
Deploy Apache Flink as a microservice
Language:Java2 2 01
transformer_user_action
Transformer-based Realtime User Action Model for Recommendation at Pinterest
Language:Python00

chenqin's Repositories

chenqin/arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
Language:C++0 2 00
chenqin/arrow-datafusion-python
Apache Arrow DataFusion Python Bindings
Language:Rust0 0 00
chenqin/blaze
A blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
Language:Rust0 0 00
chenqin/cascading-flink
Cascading on Apache Flink®
Language:Java0 0 00
chenqin/flink
Mirror of Apache Flink
Language:Java0 2 00
chenqin/transformer_user_action
Transformer-based Realtime User Action Model for Recommendation at Pinterest
Language:Python00
chenqin/xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow
Language:C++0 2 00
chenqin/chenqin
Config files for my GitHub profile.
1 0
chenqin/datasketches-cpp
Core C++ Sketch Library
Language:C++1 0
chenqin/docker-hive
Docker image for Apache Hive Metastore
Language:Dockerfile0 0
chenqin/examples
A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
Language:Python0 0
chenqin/FlexGen
Running large language models like OPT-175B/GPT-3 on a single GPU. Focusing on high-throughput generation.
Language:Python0 0
chenqin/flink-cdc-connectors
CDC Connectors for Apache Flink®
Language:Java0 0
chenqin/flink-connector-hive
Apache Flink
Language:Java0 0
chenqin/gluten
Gluten: Plugin to Double SparkSQL's Performance
Language:Scala0 0
chenqin/hive-metastore-docker
Example for the article "Running Spark 3 with standalone Hive Metastore 3.0"
chenqin/librdkafka
The Apache Kafka C/C++ library
Language:C1 0
chenqin/llama
Inference code for LLaMA models
Language:Python0 0
chenqin/modern-cpp-kafka
Modern C++ based Kafka API
Language:C++1 0
chenqin/nebula
A distributed block-based data storage and compute engine
Language:C++1 0
chenqin/openssl-cmake
Build OpenSSL with CMake on macOS, Win32, and Win64, and cross-compile for Android and iOS
Language:C0 0
chenqin/pytorch-cpp
C++ Implementation of PyTorch Tutorials for Everyone
Language:C++0 0
chenqin/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.
Language:Python0 0
chenqin/react-digraph
A library for creating directed graph editors
Language:JavaScript1 0
chenqin/scalding
A Scala API for Cascading
Language:Scala0 0
chenqin/spark
Apache Spark - A unified analytics engine for large-scale data processing
Language:Scala0 0
chenqin/spark-on-k8s-operator
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Language:Go
chenqin/SparkInternals
Notes talking about the design and implementation of Apache Spark
2 0
chenqin/substrait
A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
Language:Python0 0
chenqin/velox
A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
Language:C++0 0