Fangtangtang's Stars
DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
DarkSharpness/REIMU
A user-mode RISC-V simulator for educational purposes.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
llvm/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
apache/tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
Conless/CachedLLM
CachedLLM: efficient LLM serving system with dynamic page cache. Course project of Machine Learning (CS3308@SJTU).
BBuf/tvm_mlir_learn
A collection of compiler learning resources.
Engineev/ravel
A RISC-V simulator.