Pinned Repositories
attention_learning
cutlass
CUDA Templates for Linear Algebra Subroutines
flash_attention_inference
Benchmarks the C++ interfaces of FlashAttention, FlashAttention-2, and self-quantized decoding attention in large language model (LLM) inference scenarios.
flashinfer
FlashInfer: Kernel Library for LLM Serving
LookaheadDecoding
MatmulTutorial
An easy-to-understand TensorOp Matmul tutorial
minitf
A simplified version of TensorFlow for learning purposes.
ScaleLLM
A high-performance inference system for large language models, designed for production environments.
LLMBench
A library for validating and benchmarking LLM inference.
guocuimi's Repositories