lcy-seso's Stars
nlohmann/json
JSON for Modern C++
karpathy/llm.c
LLM training in simple, raw C/CUDA
ml-explore/mlx
MLX: An array framework for Apple silicon
state-spaces/mamba
Mamba SSM architecture
ShiArthur03/ShiArthur03
gpu-mode/lectures
Material for gpu-mode lectures
johnma2006/mamba-minimal
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
jafioti/luminal
Deep learning at the speed of light.
sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
test-time-training/ttt-lm-pytorch
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
eyalroz/cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
kendryte/nncase
Open deep learning compiler stack for Kendryte AI accelerators ✨
lucidrains/linear-attention-transformer
Transformer based on a variant of attention with linear complexity with respect to sequence length
tspeterkim/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
hidet-org/hidet
An open-source, efficient deep learning framework/compiler written in Python.
zhuzilin/ring-flash-attention
Ring attention implementation with flash attention
radarFudan/Awesome-state-space-models
Collection of papers on state-space models
microsoft/VPTQ
VPTQ: a flexible and extreme low-bit quantization algorithm
OpenNLPLab/lightning-attention
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
proger/accelerated-scan
Accelerated First Order Parallel Associative Scan
matazure/mtensor
A C++/CUDA template library for lazy tensor evaluation
TorchMoE/MoE-Infinity
PyTorch library for cost-effective, fast, and easy serving of MoE models.
notarussianteenager/srf-attention
Simplex Random Feature attention, in PyTorch
LouisBavoil/ThreadGroupIDSwizzling
HLSL code for https://developer.nvidia.com/blog/optimizing-compute-shaders-for-l2-locality-using-thread-group-id-swizzling/
zeroine/cutlass-cute-sample
proger/nanokitchen
Parallel Associative Scan for Language Models
fattorib/flashy_linear_attention
Flash linear attention kernels in Triton