Hongtao-Xu's Stars
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
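For reference, a minimal offline-generation sketch using vLLM's `LLM` entry point; the model ID is just an illustrative example:

```python
from vllm import LLM, SamplingParams

# Load a model once, then batch-generate offline.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```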
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
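A minimal text-to-image sketch with a `DiffusionPipeline`; the checkpoint ID is an example, not an endorsement:

```python
import torch
from diffusers import DiffusionPipeline

# Load a pretrained pipeline in fp16 and run one prompt on the GPU.
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
image = pipe("an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```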
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
songquanpeng/one-api
OpenAI API management & distribution system. Supports Azure, Anthropic Claude, Google PaLM 2 & Gemini, Zhipu ChatGLM, Baidu ERNIE Bot, iFlytek Spark, Alibaba Tongyi Qianwen (Qwen), 360 Zhinao, and Tencent Hunyuan, exposing a single API for all LLMs and allowing keys to be redistributed and managed. Ships as a single executable with a prebuilt Docker image for one-click, out-of-the-box deployment, and features an English UI.

huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
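A minimal LoRA fine-tuning setup with PEFT; the base model and target module names are illustrative assumptions:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap a base model with LoRA adapters; only the adapter weights train.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # reports the small trainable fraction
```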
chenzomi12/AISystem
AISystem covers AI systems: AI chips, AI compilers, AI inference and training frameworks, and other full-stack, low-level AI technologies.
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
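A quick serving sketch, assuming LMDeploy's `pipeline` API as shown in its README; the model ID and the `.text` field are assumptions that may differ by version:

```python
from lmdeploy import pipeline

# One-call inference pipeline; model ID is an example.
pipe = pipeline("internlm/internlm2-chat-7b")
responses = pipe(["Briefly introduce yourself."])
print(responses[0].text)
```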
yuanzhoulvpi2017/zero_nlp
Chinese NLP solutions (large models, data, models, training, inference).
Tony-Tan/CUDA_Freshman
BBuf/how-to-optim-algorithm-in-cuda
How to optimize various algorithms in CUDA.
siliconflow/onediff
OneDiff: An out-of-the-box acceleration library for diffusion models.
chengzeyi/stable-fast
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
tspeterkim/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
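As a quick illustration of the tiled online-softmax forward pass that such kernels implement, here is a minimal PyTorch sketch; shapes and tile size are illustrative and not taken from the repo:

```python
import torch

def tiled_attention(Q, K, V, tile=64):
    """Flash-attention-style forward pass: process K/V in tiles while
    keeping a running row max and softmax denominator, so the full
    N x N score matrix is never materialized."""
    N, d = Q.shape
    scale = d ** -0.5
    O = torch.zeros_like(Q)
    m = torch.full((N, 1), float("-inf"))  # running row max
    l = torch.zeros(N, 1)                  # running softmax denominator
    for start in range(0, N, tile):
        Kj = K[start:start + tile]
        Vj = V[start:start + tile]
        S = (Q @ Kj.T) * scale             # scores for this tile only
        m_new = torch.maximum(m, S.max(dim=1, keepdim=True).values)
        P = torch.exp(S - m_new)           # tile-local unnormalized probs
        corr = torch.exp(m - m_new)        # rescale previous accumulators
        l = l * corr + P.sum(dim=1, keepdim=True)
        O = O * corr + P @ Vj
        m = m_new
    return O / l

# Sanity check against the naive softmax(QK^T / sqrt(d)) V
Q, K, V = (torch.randn(128, 32) for _ in range(3))
ref = torch.softmax(Q @ K.T * 32 ** -0.5, dim=-1) @ V
assert torch.allclose(tiled_attention(Q, K, V), ref, atol=1e-5)
```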
sayakpaul/diffusers-torchao
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
66RING/tiny-flash-attention
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS.
ifromeast/cuda_learning
learning how CUDA works
HolyWu/vs-rife
RIFE function for VapourSynth
reed-lau/cute-gemm
luliyucoordinate/cute-flash-attention
Implements Flash Attention using CuTe.
AdvancedCompiler/AdvancedCompiler
Homepage of the Advanced Compiler Lab.
luliyucoordinate/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
programming-cat-plus/learn-cpp-together
Example programs for Learn C++ Together (《一起来学C++》).