Hongtao-Xu's Stars
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
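For reference, a minimal offline-generation sketch using vLLM's `LLM` entry point; the model ID is just an illustrative example:

```python
from vllm import LLM, SamplingParams

# Load a model once, then batch-generate offline.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```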
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
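A minimal text-to-image sketch with a `DiffusionPipeline`; the checkpoint ID is an example, not an endorsement:

```python
import torch
from diffusers import DiffusionPipeline

# Load a pretrained pipeline in fp16 and run one prompt on the GPU.
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
image = pipe("an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```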
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
songquanpeng/one-api
OpenAI API management & distribution system. Supports Azure, Anthropic Claude, Google PaLM 2 & Gemini, Zhipu ChatGLM, Baidu ERNIE Bot, iFlytek Spark, Alibaba Tongyi Qianwen (Qwen), 360 Zhinao, and Tencent Hunyuan, exposing a single API for all LLMs and allowing keys to be redistributed and managed. Ships as a single executable with a prebuilt Docker image for one-click, out-of-the-box deployment, and features an English UI.

huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
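A minimal LoRA fine-tuning setup with PEFT; the base model and target module names are illustrative assumptions:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap a base model with LoRA adapters; only the adapter weights train.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # reports the small trainable fraction
```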
chenzomi12/AISystem
AISystem covers AI systems: AI chips, AI compilers, AI inference and training frameworks, and other full-stack, low-level AI technologies.
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
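A quick serving sketch, assuming LMDeploy's `pipeline` API as shown in its README; the model ID and the `.text` field are assumptions that may differ by version:

```python
from lmdeploy import pipeline

# One-call inference pipeline; model ID is an example.
pipe = pipeline("internlm/internlm2-chat-7b")
responses = pipe(["Briefly introduce yourself."])
print(responses[0].text)
```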
yuanzhoulvpi2017/zero_nlp
Chinese NLP solutions (large models, data, models, training, inference).
Tony-Tan/CUDA_Freshman
BBuf/how-to-optim-algorithm-in-cuda
How to optimize various algorithms in CUDA.
siliconflow/onediff
OneDiff: An out-of-the-box acceleration library for diffusion models.
chengzeyi/stable-fast
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
tspeterkim/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
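As a quick illustration of the tiled online-softmax forward pass that such kernels implement, here is a minimal PyTorch sketch; shapes and tile size are illustrative and not taken from the repo:

```python
import torch

def tiled_attention(Q, K, V, tile=64):
    """Flash-attention-style forward pass: process K/V in tiles while
    keeping a running row max and softmax denominator, so the full
    N x N score matrix is never materialized."""
    N, d = Q.shape
    scale = d ** -0.5
    O = torch.zeros_like(Q)
    m = torch.full((N, 1), float("-inf"))  # running row max
    l = torch.zeros(N, 1)                  # running softmax denominator
    for start in range(0, N, tile):
        Kj = K[start:start + tile]
        Vj = V[start:start + tile]
        S = (Q @ Kj.T) * scale             # scores for this tile only
        m_new = torch.maximum(m, S.max(dim=1, keepdim=True).values)
        P = torch.exp(S - m_new)           # tile-local unnormalized probs
        corr = torch.exp(m - m_new)        # rescale previous accumulators
        l = l * corr + P.sum(dim=1, keepdim=True)
        O = O * corr + P @ Vj
        m = m_new
    return O / l

# Sanity check against the naive softmax(QK^T / sqrt(d)) V
Q, K, V = (torch.randn(128, 32) for _ in range(3))
ref = torch.softmax(Q @ K.T * 32 ** -0.5, dim=-1) @ V
assert torch.allclose(tiled_attention(Q, K, V), ref, atol=1e-5)
```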
sayakpaul/diffusers-torchao
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
66RING/tiny-flash-attention
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS.
ifromeast/cuda_learning
learning how CUDA works
HolyWu/vs-rife
RIFE function for VapourSynth
reed-lau/cute-gemm
luliyucoordinate/cute-flash-attention
Implements Flash Attention using CuTe.
AdvancedCompiler/AdvancedCompiler
Homepage of the Advanced Compiler Lab.
luliyucoordinate/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
programming-cat-plus/learn-cpp-together
Example programs for Learn C++ Together (《一起来学C++》).