luliyucoordinate's Stars
NX-AI/flashrnn
FlashRNN - Fast RNN Kernels with I/O Awareness
microsoft/FractalTensor
andrewkchan/yalm
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
AIDC-AI/Marco-o1
An Open Large Reasoning Model for Real-World Solutions
howardlau1999/rdmapp
C++ interfaces for RDMA access
microsoft/TileFusion
zhihu/ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
KONAKONA666/q8_kernels
Tencent/HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
DefTruth/hgemm-tensorcores-mma
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA PTX and CuTe API (Write for Fun 👀~)
lllyasviel/IC-Light
More relighting!
NVIDIA/Star-Attention
Efficient LLM Inference over Long Sequences
mlc-ai/xgrammar
Efficient, Flexible and Portable Structured Generation
facebookexperimental/triton
GitHub mirror of the triton-lang/triton repo.
CalebDu/Awesome-Cute
cchan/tccl
Extensible collectives library in Triton
mirage-project/mirage
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
mit-han-lab/nunchaku
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
chengzeyi/ParaAttention
[WIP] Context parallel attention that works with torch.compile
mlc-ai/tokenizers-cpp
Universal cross-platform tokenizer bindings to HF and sentencepiece
NVlabs/COAT
feifeibear/ChituAttention
Quantized Attention on GPU
LeiWang1999/Stream-k.tvm
bytedance/ShadowKV
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
NVIDIA/cccl
CUDA Core Compute Libraries
DD-DuDa/Cute-Learning
Examples of CUDA implementations using CUTLASS CuTe
KuangjuX/PyKernelCollection
Collection of algorithms implemented using PyTorch and Triton.
INT-FlashAttention2024/INT-FlashAttention
yangjianxin1/Firefly
Firefly: a large language model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
ruikangliu/FlatQuant
Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization