Pinned Repositories
Awesome-DiT-Inference
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
ffpa-attn
🤖FFPA: Extends FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak⚡️ performance.
LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
lihang-notes
📚Notes on Li Hang's Statistical Learning Methods (《统计学习方法》): a 200-page PDF with detailed explanations of the formulas🎉
lite.ai.toolkit
🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
RVM-Inference
🔥Robust Video Matting C++ inference toolkit with ONNXRuntime, MNN, NCNN, and TNN, via lite.ai.toolkit.
torchlm
💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉
yolov5face-toolkit
YOLO5Face 2021 with MNN/NCNN/TNN/ONNXRuntime
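Several of the pinned repos (Awesome-LLM-Inference, ffpa-attn) revolve around FlashAttention-style kernels. The core trick is online softmax: visiting keys/values tile by tile while tracking a running max and normalizer, so the full N x N score matrix is never materialized. A minimal NumPy sketch of that idea (an illustration only, not the CUDA implementations in these repos):

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference: softmax(q @ k.T) @ v, materializing the full score matrix."""
    s = q @ k.T
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def tiled_attention(q, k, v, block=4):
    """Same result, but keys/values are processed one tile at a time."""
    n, d = q.shape
    out = np.zeros_like(q)
    m = np.full(n, -np.inf)   # running row-wise max of the scores
    l = np.zeros(n)           # running softmax normalizer
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T                           # scores for this tile only
        m_new = np.maximum(m, s.max(axis=-1))  # updated running max
        scale = np.exp(m - m_new)              # rescale earlier partial sums
        p = np.exp(s - m_new[:, None])
        out = out * scale[:, None] + p @ vb
        l = l * scale + p.sum(axis=-1)
        m = m_new
    return out / l[:, None]
```

The GPU kernels keep each tile in SRAM/shared memory; the rescale-and-accumulate update is what makes that possible without a second pass over the scores.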
xlite-dev's Repositories
xlite-dev/LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
xlite-dev/Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
xlite-dev/lite.ai.toolkit
🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
xlite-dev/lihang-notes
📚Notes on Li Hang's Statistical Learning Methods (《统计学习方法》): a 200-page PDF with detailed explanations of the formulas🎉
xlite-dev/Awesome-DiT-Inference
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
xlite-dev/torchlm
💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉
xlite-dev/ffpa-attn
🤖FFPA: Extends FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
xlite-dev/RVM-Inference
🔥Robust Video Matting C++ inference toolkit with ONNXRuntime, MNN, NCNN, and TNN, via lite.ai.toolkit.
xlite-dev/HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak⚡️ performance.
xlite-dev/flux-faster
A fork of flux-fast that makes it even faster with cache-dit, achieving a 3.3x speedup on NVIDIA L20.
xlite-dev/netron-vscode-extension
☕️ A VS Code extension for Netron, supporting *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.
xlite-dev/SpargeAttn
SpargeAttention: A training-free sparse attention that can accelerate inference for any model.
xlite-dev/comfyui-cache-dit
cache-dit integration for ComfyUI.
xlite-dev/flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
xlite-dev/flux-fast
A fork of flux-fast that makes it even faster with cache-dit.
xlite-dev/cache-dit
A Unified, Flexible and Training-free Cache Acceleration Framework for 🤗 Diffusers.
xlite-dev/nunchaku
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
xlite-dev/.github
xlite-dev/draft-attention
Code for Draft Attention
xlite-dev/HunyuanImage-2.1
HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
xlite-dev/Qwen-Image
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
xlite-dev/xlite-cli
The CLI version of lite.ai.toolkit.
xlite-dev/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
xlite-dev/FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
xlite-dev/flashinfer
FlashInfer: Kernel Library for LLM Serving
xlite-dev/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
xlite-dev/Qwen-Image-Lightning
Qwen-Image-Lightning: Speed up the Qwen-Image model with distillation.
xlite-dev/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
xlite-dev/tutorial-template
Template for the Read the Docs tutorial
xlite-dev/Wan2.2
Wan: Open and Advanced Large-Scale Video Generative Models
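Several repos above (cache-dit, comfyui-cache-dit, flux-faster, Awesome-DiT-Inference) center on training-free cache acceleration for diffusion models: because adjacent denoising steps produce similar intermediate features, a block's cached output can be reused when its input has barely changed. The sketch below illustrates that idea in plain NumPy; the class, threshold, and wrapping scheme are hypothetical illustrations, not the actual cache-dit API:

```python
import numpy as np

class CachedBlock:
    """Wraps an expensive block; reuses its last output on near-identical input."""
    def __init__(self, fn, tol=1e-2):
        self.fn = fn          # the expensive block to wrap (hypothetical stand-in)
        self.tol = tol        # relative-change threshold for a cache hit
        self.last_x = None    # input seen at the previous step
        self.last_y = None    # cached output from the previous step
        self.hits = 0

    def __call__(self, x):
        if self.last_x is not None:
            # Relative change of the input versus the last computed step.
            change = np.linalg.norm(x - self.last_x) / (np.linalg.norm(self.last_x) + 1e-8)
            if change < self.tol:
                self.hits += 1
                return self.last_y   # cache hit: skip the computation entirely
        # Cache miss: recompute and refresh the cache.
        self.last_x, self.last_y = x, self.fn(x)
        return self.last_y
```

Across a diffusion sampling loop, later steps change features slowly, so wrappers like this can skip a large fraction of block evaluations; the threshold trades speedup against output fidelity.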