luliyucoordinate's Stars
NX-AI/flashrnn
FlashRNN - Fast RNN Kernels with I/O Awareness
microsoft/FractalTensor
andrewkchan/yalm
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
AIDC-AI/Marco-o1
An Open Large Reasoning Model for Real-World Solutions
howardlau1999/rdmapp
C++ interfaces for RDMA access
microsoft/TileFusion
zhihu/ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
KONAKONA666/q8_kernels
Tencent/HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
DefTruth/hgemm-tensorcores-mma
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API (Write for Fun 👀~)
lllyasviel/IC-Light
More relighting!
NVIDIA/Star-Attention
Efficient LLM Inference over Long Sequences
mlc-ai/xgrammar
Efficient, Flexible and Portable Structured Generation
facebookexperimental/triton
GitHub mirror of the triton-lang/triton repo.
CalebDu/Awesome-Cute
cchan/tccl
Extensible collectives library in Triton
mirage-project/mirage
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
mit-han-lab/nunchaku
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
chengzeyi/ParaAttention
[WIP] Context parallel attention that works with torch.compile
mlc-ai/tokenizers-cpp
Universal cross-platform tokenizer bindings to HF and sentencepiece
NVlabs/COAT
feifeibear/ChituAttention
Quantized Attention on GPU
LeiWang1999/Stream-k.tvm
bytedance/ShadowKV
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
NVIDIA/cccl
CUDA Core Compute Libraries
DD-DuDa/Cute-Learning
Examples of CUDA implementations using CUTLASS CuTe
KuangjuX/PyKernelCollection
Collection of algorithms implemented using PyTorch and Triton.
INT-FlashAttention2024/INT-FlashAttention
yangjianxin1/Firefly
Firefly: a large language model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
ruikangliu/FlatQuant
Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization