Fangtangtang's Stars
DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
DarkSharpness/REIMU
A user-mode RISC-V simulator for educational purposes.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
llvm/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
apache/tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
Conless/CachedLLM
CachedLLM: efficient LLM serving system with dynamic page cache. Course project of Machine Learning (CS3308@SJTU).
BBuf/tvm_mlir_learn
A collection of compiler learning resources.
Engineev/ravel
A RISC-V simulator.