chu-tianxiang's Stars
xai-org/grok-1
Grok open release
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
LargeWorldModel/LWM
Large World Model With 1M Context
HVision-NKU/StoryDiffusion
Accepted as a NeurIPS 2024 spotlight presentation paper
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
meta-llama/llama-agentic-system
Agentic components of the Llama Stack APIs
DefTruth/Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
pytorch/torchtitan
A native PyTorch Library for large model training
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
BBuf/how-to-optim-algorithm-in-cuda
How to optimize various algorithms in CUDA.
noamgat/lm-format-enforcer
Enforce the output format (JSON Schema, Regex etc) of a language model
deepseek-ai/DeepSeek-LLM
DeepSeek LLM: Let there be answers
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
NVIDIA/cccl
CUDA Core Compute Libraries
Vahe1994/AQLM
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/abs/2401.06118) and "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression" (https://arxiv.org/abs/2405.14852)
google-research/deduplicate-text-datasets
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
zhuzilin/ring-flash-attention
Ring attention implementation with flash attention
Cornell-RelaxML/quip-sharp
mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
Bruce-Lee-LY/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores, with the WMMA API and MMA PTX instructions.
KnowingNothing/MatmulTutorial
An easy-to-understand TensorOp Matmul tutorial
efeslab/Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
spcl/QuaRot
Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.
AlibabaResearch/flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
AILab-CVC/VL-GPT
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
chu-tianxiang/llama-cpp-torch
llama.cpp to PyTorch Converter
Superjomn/cuda-from-scratch