Pinned Repositories
Awesome-Diffusion-Inference
📖A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉
Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
CUDA-Learn-Notes
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
ffpa-attn-mma
📚FFPA(Split-D): Yet another Faster Flash Attention with O(1) GPU SRAM complexity for large headdim, 1.8x~3x↑🎉 faster than SDPA EA.
hgemm-tensorcores-mma
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
lite.ai.toolkit
🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
RVM-Inference
🔥Robust Video Matting C++ inference toolkit with ONNXRuntime, MNN, NCNN and TNN, via lite.ai.toolkit.
statistic-learning-R-note
📒Notes on Li Hang's "Statistical Learning Methods" (统计学习方法), from principles to implementation: 200-page PDF notes with detailed explanations of various math formulas, implemented in R.🎉
torchlm
💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉
yolov5face-toolkit
YOLO5Face 2021 with MNN/NCNN/TNN/ONNXRuntime
xlite-dev's Repositories
xlite-dev/Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
xlite-dev/lite.ai.toolkit
🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
xlite-dev/CUDA-Learn-Notes
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
xlite-dev/statistic-learning-R-note
📒Notes on Li Hang's "Statistical Learning Methods" (统计学习方法), from principles to implementation: 200-page PDF notes with detailed explanations of various math formulas, implemented in R.🎉
xlite-dev/torchlm
💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉
xlite-dev/Awesome-Diffusion-Inference
📖A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉
xlite-dev/ffpa-attn-mma
📚FFPA(Split-D): Yet another Faster Flash Attention with O(1) GPU SRAM complexity for large headdim, 1.8x~3x↑🎉 faster than SDPA EA.
xlite-dev/RVM-Inference
🔥Robust Video Matting C++ inference toolkit with ONNXRuntime, MNN, NCNN and TNN, via lite.ai.toolkit.
xlite-dev/hgemm-tensorcores-mma
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
xlite-dev/yolov5face-toolkit
YOLO5Face 2021 with MNN/NCNN/TNN/ONNXRuntime
xlite-dev/nanodet-toolkit
NanoDet and NanoDet-Plus with ONNXRuntime/MNN/TNN/NCNN C++.
xlite-dev/fsanet-toolkit
🍅🍅FSANet: 1 Mb!! Head Pose Estimation with MNN, TNN and ONNXRuntime C++.
xlite-dev/scrfd-toolkit
🍅🍅 Super fast and accurate face detector! SCRFD (CVPR 2021) with MNN/TNN/NCNN/ONNXRuntime C++.
xlite-dev/netron-vscode-extension
☕️ A VS Code extension for Netron, supporting *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.
xlite-dev/yolox-toolkit
YOLOX with NCNN/MNN/TNN/ONNXRuntime C++.
xlite-dev/yolop-toolkit
🚀🚀🌟 YOLOP with ONNXRuntime C++/MNN/TNN/NCNN
xlite-dev/mgmatting-toolkit
🍅MGMatting with MNN/TNN/ONNXRuntime C++, GPU/CPU, support dynamic shape.
xlite-dev/ssrnet-toolkit
🍅🍅 SSRNet: 190 Kb!! Super fast Age Estimation with MNN/TNN/ONNXRuntime C++.
xlite-dev/.github
xlite-dev/xlite-cli
The CLI version of lite.ai.toolkit.
xlite-dev/flashinfer
FlashInfer: Kernel Library for LLM Serving
xlite-dev/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.