Tengxu-Sun's Stars
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
NUS-HPC-AI-Lab/VideoSys
VideoSys: An easy and efficient system for video generation
ggerganov/whisper.cpp
Port of OpenAI's Whisper model in C/C++
ggerganov/ggml
Tensor library for machine learning
dingyuqing05/trt2022_wenet
efeslab/Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
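As context for what "low-bit quantization" means here, a minimal pure-Python sketch of symmetric 4-bit round-to-nearest quantization — the generic technique family, not Atom's actual algorithm (which adds mixed-precision outlier handling and fused serving kernels):

```python
# Generic symmetric 4-bit quantization sketch (NOT Atom's algorithm):
# map floats to integers in [-7, 7] with one per-tensor scale.

def quantize_int4(values):
    """Return (quantized ints in [-7, 7], scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs else 1.0
    q = [max(-7, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.98, -0.07]
q, s = quantize_int4(weights)
restored = dequantize(q, s)
# round-to-nearest keeps the error within half a quantization step
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, s, max_err)
```

Serving frameworks trade this per-element error for 4x smaller weights and faster memory-bound GEMMs.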
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
NVIDIA-developer-blog/code-samples
Source code examples from the Parallel Forall Blog
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
zenny-chen/GPU-architectures-docs-and-demos
Demos from major GPU vendors and platform providers on 3D graphics rendering
GrowingGit/GitHub-English-Top-Charts
Helps you discover excellent English-language projects, without the noise of listings in other languages.
GrowingGit/GitHub-Chinese-Top-Charts
:cn: GitHub Chinese Top Charts, with separate "Software | Resources" charts per language, to pinpoint good Chinese-language projects. Take what you need and learn efficiently.
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
feifeibear/long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
zhuohan123/terapipe
koalaman/shellcheck
ShellCheck, a static analysis tool for shell scripts
huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
THUDM/SwissArmyTransformer
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
nicolaswilde/cuda-tensorcore-hgemm
KnowingNothing/MatmulTutorial
An easy-to-understand TensorOp Matmul tutorial
wzsh/wmma_tensorcore_sample
Matrix multiply-accumulate with CUDA and WMMA (Tensor Core)
Bruce-Lee-LY/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions.
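The tile-level contract that WMMA exposes (and that this repo and wzsh/wmma_tensorcore_sample above optimize around) is D = A·B + C on small fixed-size tiles, e.g. 16x16x16, with half-precision inputs and single-precision accumulation. A pure-Python sketch of that arithmetic contract only — not actual GPU code:

```python
# Pure-Python model of what one wmma::mma_sync computes: D = A @ B + C
# on a 16x16x16 tile, fp16 inputs, fp32 accumulation. Real WMMA code
# runs per-warp on Tensor Cores; this models the math, not the hardware.
M = N = K = 16

def mma_tile(A, B, C):
    """A: M x K, B: K x N, C: M x N; returns D = A @ B + C."""
    D = [[C[i][j] for j in range(N)] for i in range(M)]
    for i in range(M):
        for j in range(N):
            acc = D[i][j]                 # fp32 accumulator
            for k in range(K):
                acc += A[i][k] * B[k][j]  # inputs would be fp16 on GPU
            D[i][j] = acc
    return D

# Sanity check: with A = identity, D should equal B + C elementwise.
I_ = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(M)]
B = [[float(i + j) for j in range(N)] for i in range(K)]
C = [[1.0] * N for _ in range(M)]
D = mma_tile(I_, B, C)
print(D[3][5])  # B[3][5] + C[3][5] = 8.0 + 1.0 = 9.0
```

The optimization work in these repos is about feeding such tiles efficiently (shared-memory staging, swizzling, pipelining), not changing this arithmetic.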
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
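One of the methods PEFT implements is LoRA: keep the base weight W frozen, train a low-rank pair (A, B), and merge as W' = W + (alpha / r) · B·A. A plain-Python sketch of that merge step, not PEFT's actual API:

```python
# Generic LoRA merge sketch (the idea, not PEFT's API):
# W' = W + (alpha / r) * B @ A, with A: r x d_in, B: d_out x r.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_merge(W, B, A, alpha, r):
    BA = matmul(B, A)
    return [[W[i][j] + (alpha / r) * BA[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 base weight
A = [[1.0, 2.0]]               # r=1, d_in=2 (trainable)
B = [[0.5], [0.0]]             # d_out=2, r=1 (trainable)
merged = lora_merge(W, B, A, alpha=2, r=1)
print(merged)  # [[2.0, 2.0], [0.0, 1.0]]
```

The parameter saving comes from training only r·(d_in + d_out) values per layer instead of d_in·d_out.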
chuanyangjin/fast-DiT
Fast Diffusion Models with Transformers
IST-DASLab/QUIK
Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference
NVIDIA/cccl
CUDA Core Compute Libraries