DefTruth

@xlite-dev, @vipshop · Guangzhou, China

Pinned Repositories

Awesome-LLM-Inference
📖 A curated list of Awesome LLM/VLM Inference Papers with Code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉
BlogLearning
My own learning journey, focusing on fun image processing algorithms, motion capture, and machine learning.
Language: Jupyter Notebook
CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
Language: Cuda
lite.ai.toolkit
🛠 A lite C++ toolkit: contains 100+ Awesome AI models, support MNN, NCNN, TNN, ONNXRuntime and TensorRT. 🎉🎉
Language: C++
FastDeploy
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
Language: Python
cache-dit
A Unified, Flexible and Training-free Cache Acceleration Framework for 🤗Diffusers.
Language: Python
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python
Awesome-LLM-Inference
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Language: Python
LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Language: Cuda
lite.ai.toolkit
🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
Language: C++

DefTruth's Repositories

DefTruth/PaddleOCR
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded and IoT devices)
Language: Python
DefTruth/ComputeLibrary
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
Language: C++
DefTruth/YOLOv6
YOLOv6: a single-stage object detection framework dedicated to industrial applications.
Language: Jupyter Notebook
DefTruth/pybind11-Chinese-docs
pybind11 Chinese documentation (personal translation)
DefTruth/FlyCV
Language: C++
DefTruth/PL-Compiler-Resource
Resources on programming languages and compiler techniques (continuously updated)
DefTruth/simde
Implementations of SIMD instruction sets for systems which don't natively support them.
Language: C
DefTruth/vscode-pdfviewer
Show PDF preview in VSCode.
Language: JavaScript
DefTruth/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Language: C
DefTruth/FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Language: C++
DefTruth/gemm-sve2
Row-major matmul optimization
Language: C++
DefTruth/opencv-mobile
The minimal OpenCV for Android, iOS, ARM Linux, Windows, Linux, macOS, WebAssembly
Language: C
DefTruth/PaddleClas
A treasure chest for visual classification and recognition powered by PaddlePaddle
Language: Python
DefTruth/PaddleDetection
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Language: Python
DefTruth/PaddleNLP
👑 Easy-to-use and powerful NLP library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis and 🖼 Diffusion AIGC system, etc.
Language: Python
DefTruth/PaddleSeg
Easy-to-use image segmentation library with an awesome pre-trained model zoo, supporting a wide range of practical tasks in Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image Matting, 3D Segmentation, etc.
Language: Python
DefTruth/sse2neon
A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation
Language: C++
DefTruth/tflite-micro
Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).
Language: C++
DefTruth/ARM_NEON_2_x86_SSE
The platform independent header allowing to compile any C/C++ code containing ARM NEON intrinsic functions for x86 target systems using SIMD up to SSE4 intrinsic functions
Language: C
DefTruth/awesome-distributed-systems
A curated list of awesome distributed systems books, papers, resources and shiny things.
DefTruth/CUDA-Programs
Examples from Programming in Parallel with CUDA
Language: Cuda
DefTruth/cuda-tensorcore-hgemm
Language: Cuda
DefTruth/gemmlowp
Low-precision matrix multiplication
Language: C++
DefTruth/jni-bind
JNI Bind is a set of advanced syntactic sugar for writing efficient, correct JNI code in C++17 (and up).
Language: C++
DefTruth/onnxruntime-extensions
The pre- and post- processing library for ONNX Runtime
Language: C++
DefTruth/Paddle-Lite-Demo
lib, demo, model, data
Language: C++
DefTruth/Pytorch_Retinaface
RetinaFace gets 80.99% on the WIDER FACE hard validation set using MobileNet-0.25.
Language: Python
DefTruth/smoothquant
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Language: Python
DefTruth/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
Language: C++
DefTruth/YOLOU
YOLOv3, YOLOv4, YOLOv5, YOLOv5-Lite, YOLOv6, YOLOv7, YOLOX, YOLOX-Lite, TensorRT, NCNN, Tengine, OpenVINO
Language: Python