JackWeiw's Stars
chenzomi12/AISystem
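AISystem mainly refers to AI systems, including full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.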
DeepLink-org/deeplink.framework
huggingface/optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
yinuotxie/Efficient-LLM-Inferencing-on-GPUs
Penn CIS 5650 (GPU Programming and Architecture) Final Project
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
leimao/CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Yinghan-Li/YHs_Sample
Yinghan's Code Sample
merrymercy/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
l1nkr/DL-Compiler-Navigation
Machine Learning Compiler Road Map