Pinned Repositories
Awesome-DiT-Inference
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
ffpa-attn
🤖FFPA: Extends FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak⚡️ performance.
LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
lihang-notes
📚Notes on Li Hang's Statistical Learning Methods (《统计学习方法》): a 200-page PDF with detailed explanations of the formulas🎉
lite.ai.toolkit
🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
RVM-Inference
🔥Robust Video Matting C++ inference toolkit with ONNXRuntime, MNN, NCNN, and TNN, via lite.ai.toolkit.
torchlm
💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉
yolov5face-toolkit
YOLO5Face 2021 with MNN/NCNN/TNN/ONNXRuntime
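Several of the pinned repos (Awesome-LLM-Inference, ffpa-attn) revolve around FlashAttention-style kernels. The core trick is online softmax: visiting keys/values tile by tile while tracking a running max and normalizer, so the full N x N score matrix is never materialized. A minimal NumPy sketch of that idea (an illustration only, not the CUDA implementations in these repos):

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference: softmax(q @ k.T) @ v, materializing the full score matrix."""
    s = q @ k.T
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def tiled_attention(q, k, v, block=4):
    """Same result, but keys/values are processed one tile at a time."""
    n, d = q.shape
    out = np.zeros_like(q)
    m = np.full(n, -np.inf)   # running row-wise max of the scores
    l = np.zeros(n)           # running softmax normalizer
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T                           # scores for this tile only
        m_new = np.maximum(m, s.max(axis=-1))  # updated running max
        scale = np.exp(m - m_new)              # rescale earlier partial sums
        p = np.exp(s - m_new[:, None])
        out = out * scale[:, None] + p @ vb
        l = l * scale + p.sum(axis=-1)
        m = m_new
    return out / l[:, None]
```

The GPU kernels keep each tile in SRAM/shared memory; the rescale-and-accumulate update is what makes that possible without a second pass over the scores.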
xlite-dev's Repositories
xlite-dev/LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
xlite-dev/Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
xlite-dev/lite.ai.toolkit
🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
xlite-dev/lihang-notes
📚Notes on Li Hang's Statistical Learning Methods (《统计学习方法》): a 200-page PDF with detailed explanations of the formulas🎉
xlite-dev/Awesome-DiT-Inference
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
xlite-dev/torchlm
💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉
xlite-dev/ffpa-attn
🤖FFPA: Extends FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
xlite-dev/RVM-Inference
🔥Robust Video Matting C++ inference toolkit with ONNXRuntime, MNN, NCNN, and TNN, via lite.ai.toolkit.
xlite-dev/HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak⚡️ performance.
xlite-dev/flux-faster
A fork of flux-fast that makes it even faster with cache-dit, achieving a 3.3x speedup on NVIDIA L20.
xlite-dev/netron-vscode-extension
☕️ A VS Code extension for Netron, supporting *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.
xlite-dev/SpargeAttn
SpargeAttention: A training-free sparse attention that can accelerate inference for any model.
xlite-dev/comfyui-cache-dit
cache-dit integration for ComfyUI.
xlite-dev/flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
xlite-dev/flux-fast
A fork of flux-fast that makes it even faster with cache-dit.
xlite-dev/cache-dit
A Unified, Flexible and Training-free Cache Acceleration Framework for 🤗 Diffusers.
xlite-dev/nunchaku
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
xlite-dev/.github
xlite-dev/draft-attention
Code for Draft Attention
xlite-dev/HunyuanImage-2.1
HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
xlite-dev/Qwen-Image
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
xlite-dev/xlite-cli
The CLI version of lite.ai.toolkit.
xlite-dev/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
xlite-dev/FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
xlite-dev/flashinfer
FlashInfer: Kernel Library for LLM Serving
xlite-dev/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
xlite-dev/Qwen-Image-Lightning
Qwen-Image-Lightning: Speed up the Qwen-Image model with distillation.
xlite-dev/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
xlite-dev/tutorial-template
Template for the Read the Docs tutorial
xlite-dev/Wan2.2
Wan: Open and Advanced Large-Scale Video Generative Models
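Several repos above (cache-dit, comfyui-cache-dit, flux-faster, Awesome-DiT-Inference) center on training-free cache acceleration for diffusion models: because adjacent denoising steps produce similar intermediate features, a block's cached output can be reused when its input has barely changed. The sketch below illustrates that idea in plain NumPy; the class, threshold, and wrapping scheme are hypothetical illustrations, not the actual cache-dit API:

```python
import numpy as np

class CachedBlock:
    """Wraps an expensive block; reuses its last output on near-identical input."""
    def __init__(self, fn, tol=1e-2):
        self.fn = fn          # the expensive block to wrap (hypothetical stand-in)
        self.tol = tol        # relative-change threshold for a cache hit
        self.last_x = None    # input seen at the previous step
        self.last_y = None    # cached output from the previous step
        self.hits = 0

    def __call__(self, x):
        if self.last_x is not None:
            # Relative change of the input versus the last computed step.
            change = np.linalg.norm(x - self.last_x) / (np.linalg.norm(self.last_x) + 1e-8)
            if change < self.tol:
                self.hits += 1
                return self.last_y   # cache hit: skip the computation entirely
        # Cache miss: recompute and refresh the cache.
        self.last_x, self.last_y = x, self.fn(x)
        return self.last_y
```

Across a diffusion sampling loop, later steps change features slowly, so wrappers like this can skip a large fraction of block evaluations; the threshold trades speedup against output fidelity.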