DefTruth
⚙️Now. @vipshop | Owner. @xlite-dev | 🤖Prev. @PaddlePaddle | 📚LeetCUDA | 🤗cache-dit | lite.ai.toolkit | 🤖ffpa-attn
@xlite-dev, @vipshop
Guangzhou, China
Pinned Repositories
Awesome-LLM-Inference
📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉
BlogLearning
My own learning journey, focusing on fun image processing algorithms, motion capture, and machine learning.
CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
lite.ai.toolkit
🛠 A lite C++ toolkit: contains 100+ Awesome AI models and supports MNN, NCNN, TNN, ONNXRuntime and TensorRT. 🎉🎉
FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
cache-dit
A Unified, Flexible and Training-free Cache Acceleration Framework for 🤗Diffusers.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
lite.ai.toolkit
🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
DefTruth's Repositories
DefTruth/PaddleOCR
Awesome multilingual OCR toolkit based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment across server, mobile, embedded and IoT devices)
DefTruth/ComputeLibrary
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
DefTruth/YOLOv6
YOLOv6: a single-stage object detection framework dedicated to industrial applications.
DefTruth/pybind11-Chinese-docs
pybind11 Chinese documentation (personal translation)
DefTruth/FlyCV
DefTruth/PL-Compiler-Resource
Resources on programming languages and compiler technology (continuously updated)
DefTruth/simde
Implementations of SIMD instruction sets for systems which don't natively support them.
DefTruth/vscode-pdfviewer
Show PDF preview in VSCode.
DefTruth/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
DefTruth/FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
DefTruth/gemm-sve2
row-major matmul optimization
DefTruth/opencv-mobile
The minimal opencv for Android, iOS, ARM Linux, Windows, Linux, MacOS, WebAssembly
DefTruth/PaddleClas
A treasure chest for visual classification and recognition powered by PaddlePaddle
DefTruth/PaddleDetection
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
DefTruth/PaddleNLP
👑 Easy-to-use and powerful NLP library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis and 🖼 Diffusion AIGC system, etc.
DefTruth/PaddleSeg
Easy-to-use image segmentation library with an awesome pre-trained model zoo, supporting a wide range of practical tasks in Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image Matting, 3D Segmentation, etc.
DefTruth/sse2neon
A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation
DefTruth/tflite-micro
Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).
DefTruth/ARM_NEON_2_x86_SSE
A platform-independent header that allows any C/C++ code containing ARM NEON intrinsic functions to be compiled for x86 target systems, using SIMD up to SSE4 intrinsic functions
DefTruth/awesome-distributed-systems
A curated list of awesome distributed systems books, papers, resources and shiny things.
DefTruth/CUDA-Programs
Examples from Programming in Parallel with CUDA
DefTruth/cuda-tensorcore-hgemm
DefTruth/gemmlowp
Low-precision matrix multiplication
DefTruth/jni-bind
JNI Bind is a set of advanced syntactic sugar for writing efficient correct JNI Code in C++17 (and up).
DefTruth/onnxruntime-extensions
The pre- and post- processing library for ONNX Runtime
DefTruth/Paddle-Lite-Demo
lib, demo, model, data
DefTruth/Pytorch_Retinaface
RetinaFace reaches 80.99% on the WIDER FACE hard validation set using MobileNet-0.25.
DefTruth/smoothquant
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
DefTruth/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
DefTruth/YOLOU
YOLOv3, YOLOv4, YOLOv5, YOLOv5-Lite, YOLOv6, YOLOv7, YOLOX, YOLOX-Lite, TensorRT, NCNN, Tengine, OpenVINO