Pinned Repositories
Spark-PMoF
Spark Shuffle Optimization with RDMA+AEP
gazelle_plugin
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
bitcask
Simple KV based on bitcask
ceph
Ceph is a distributed object, block, and file storage platform
cosbench-kits
fio
Flexible I/O Tester
HDCS
Hyper-converged Distributed Cache Store
native-sql-engine
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
tongjithesis
TongjiThesis is short for Tongji University (P.R.C.) Thesis LaTeX Template. This macro package aims to provide a simple-to-use LaTeX thesis template covering undergraduate theses, master's theses, and doctoral dissertations.
zhouyuan's Repositories
zhouyuan/native-sql-engine
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
zhouyuan/AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
zhouyuan/arrow
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.
zhouyuan/arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
zhouyuan/arrow-rs
Official Rust implementation of Apache Arrow
zhouyuan/client
Triton Python, C++ and Java client libraries, and GRPC-generated client examples for go, java and scala.
zhouyuan/dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
zhouyuan/extension-script
Example repository for custom C++/CUDA operators for TorchScript
zhouyuan/flashinfer
FlashInfer: Kernel Library for LLM Serving
zhouyuan/gluten
zhouyuan/gluten-it
Integration testing for Gluten
zhouyuan/gluten-te
Portable test environment for Gluten
zhouyuan/Gluten-Trino
Gluten: Plugin to Boost Trino's Performance
zhouyuan/libgsasl
https://www.gnu.org/software/gsasl/
zhouyuan/libhdfs3
HDFS file read access for ClickHouse
zhouyuan/llama2.c
Inference Llama 2 in one file of pure C
zhouyuan/llm-continuous-batching-benchmarks
zhouyuan/LMCache
Prefill LLMs only once, re-use KV across instances
zhouyuan/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
zhouyuan/protobuf
Protocol Buffers - Google's data interchange format
zhouyuan/PyGithub
Typed interactions with the GitHub API v3
zhouyuan/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
zhouyuan/s3select
Library for processing s3select queries and executing them on CSV files (current phase)
zhouyuan/spark
Apache Spark - A unified analytics engine for large-scale data processing
zhouyuan/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
zhouyuan/triton
Development repository for the Triton language and compiler
zhouyuan/velox
A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
zhouyuan/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
zhouyuan/x86-simd-sort
C++ header file library for high performance SIMD based sorting algorithms for primitive datatypes
zhouyuan/zhouyuan.github.io
zhouyuan.github.io