Fangtangtang's Stars
llvm/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
apache/tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
BBuf/tvm_mlir_learn
A collection of compiler learning resources.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
DefTruth/CUDA-Learn-Notes
📚 150+ Tensor/CUDA Core kernels, ⚡️ flash-attn-mma, ⚡️ hgemm with WMMA, MMA, and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Engineev/ravel
A RISC-V simulator
DarkSharpness/REIMU
A user-mode RISC-V simulator for educational purposes.
Conless/CachedLLM
CachedLLM: an efficient LLM serving system with a dynamic page cache. Course project for Machine Learning (CS3308@SJTU).