lcy-seso's Stars
nlohmann/json
JSON for Modern C++
karpathy/llm.c
LLM training in simple, raw C/CUDA
ml-explore/mlx
MLX: An array framework for Apple silicon
state-spaces/mamba
Mamba SSM architecture
ShiArthur03/ShiArthur03
gpu-mode/lectures
Material for gpu-mode lectures
johnma2006/mamba-minimal
Simple, minimal implementation of the Mamba SSM in one file of PyTorch.
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
jafioti/luminal
Deep learning at the speed of light.
sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
test-time-training/ttt-lm-pytorch
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
eyalroz/cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
kendryte/nncase
Open deep learning compiler stack for Kendryte AI accelerators ✨
lucidrains/linear-attention-transformer
Transformer based on a variant of attention with linear complexity with respect to sequence length
tspeterkim/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
hidet-org/hidet
An open-source, efficient deep learning framework/compiler written in Python.
zhuzilin/ring-flash-attention
Ring attention implementation with flash attention
radarFudan/Awesome-state-space-models
Collection of papers on state-space models
microsoft/VPTQ
VPTQ: a flexible and extreme low-bit quantization algorithm
OpenNLPLab/lightning-attention
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
proger/accelerated-scan
Accelerated First Order Parallel Associative Scan
matazure/mtensor
A C++/CUDA template library for lazy tensor evaluation
TorchMoE/MoE-Infinity
PyTorch library for cost-effective, fast, and easy serving of MoE models.
notarussianteenager/srf-attention
Simplex Random Feature attention, in PyTorch
LouisBavoil/ThreadGroupIDSwizzling
HLSL code for https://developer.nvidia.com/blog/optimizing-compute-shaders-for-l2-locality-using-thread-group-id-swizzling/
zeroine/cutlass-cute-sample
proger/nanokitchen
Parallel Associative Scan for Language Models
fattorib/flashy_linear_attention
Flash linear attention kernels in Triton