inference-engine

There are 282 repositories under the inference-engine topic.

  • FedML-AI/FedML

    FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

    Language: Python
  • zjhellofss/KuiperInfer

    A hands-on project well suited to campus recruiting and internships: implement a high-performance deep learning inference library from scratch, step by step, with support for models such as Llama 2, UNet, YOLOv5, and ResNet.

    Language: C++
  • hyperjumptech/grule-rule-engine

    Rule engine implementation in Golang

    Language: Go
  • siliconflow/onediff

    OneDiff: An out-of-the-box acceleration library for diffusion models.

    Language: Jupyter Notebook
  • aphrodite-engine/aphrodite-engine

    Large-scale LLM inference engine

    Language: C++
  • Tencent/FeatherCNN

    FeatherCNN is a high performance inference engine for convolutional neural networks.

    Language: C++
  • PaddlePaddle/Paddle.js

    Paddle.js is a web project for Baidu PaddlePaddle, an open-source deep learning framework that runs in the browser. Paddle.js can either load a pre-trained model or transform a model from PaddleHub using the model-transformation tools it provides. It runs in any browser that supports WebGL/WebGPU/WebAssembly, and also in Baidu Smart Programs and WeChat mini programs.

    Language: JavaScript
  • zhihu/ZhiLight

    A highly optimized LLM inference acceleration engine for Llama and its variants.

    Language: C++
  • quic/ai-hub-models

    The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

    Language: Python
  • Adlik/Adlik

    Adlik: Toolkit for Accelerating Deep Learning Inference

    Language: C++
  • msnh2012/Msnhnet

    🔥 A mini PyTorch inference framework inspired by Darknet (YOLOv3, YOLOv4, YOLOv5, UNet, ...).

    Language: C++
  • insight-platform/Savant

    Python Computer Vision & Video Analytics Framework With Batteries Included

    Language: Python
  • jd-opensource/xllm

    A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

    Language: C++
  • ovg-project/kvcached

    Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

    Language: Python
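
The KV-cache idea behind projects like kvcached can be shown with a minimal pure-Python sketch (a conceptual toy, not kvcached's actual API): during autoregressive decoding, each step appends the new token's key/value vectors to a grow-only cache, so attention over past tokens never recomputes their projections.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class KVCache:
    """Grow-only key/value cache for a single attention head (illustrative only)."""
    def __init__(self):
        self.keys = []    # one key vector per past token
        self.values = []  # one value vector per past token

    def attend(self, q, k, v):
        # Append this step's key/value, then attend over the whole cache.
        self.keys.append(k)
        self.values.append(v)
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in self.keys]
        weights = softmax(scores)
        dim = len(v)
        return [sum(w * val[d] for w, val in zip(weights, self.values)) for d in range(dim)]

cache = KVCache()
out1 = cache.attend([1.0, 0.0], [1.0, 0.0], [1.0, 2.0])  # first token attends only to itself
out2 = cache.attend([0.0, 1.0], [0.0, 1.0], [3.0, 4.0])  # second token attends to both cached entries
```

Real engines manage this cache in paged GPU memory blocks; virtualizing it, as kvcached does, lets multiple models share that memory elastically.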
  • pylint-dev/astroid

    A common base representation of Python source code for pylint and other projects

    Language: Python
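
The kind of static inference astroid performs can be hinted at with the stdlib `ast` module (this is a toy constant folder, not astroid's richer API, which also resolves scopes, imports, and control flow):

```python
import ast

def infer_constant(expr: str):
    """Tiny constant-folding 'inference' over a Python expression string."""
    tree = ast.parse(expr, mode="eval")

    def go(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            return go(node.left) + go(node.right)
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
            return go(node.left) * go(node.right)
        raise ValueError("cannot infer statically")

    return go(tree.body)

result = infer_constant("2 + 3 * 4")  # statically evaluates to 14
```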
  • Tencent/Forward

    A library for high performance deep learning inference on NVIDIA GPUs.

    Language: C++
  • PaddlePaddle/Anakin

    A high-performance cross-platform inference engine. Anakin runs on x86 CPU, ARM, NVIDIA GPU, AMD GPU, Bitmain, and Cambricon devices.

    Language: C++
  • andrewkchan/yalm

    Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

    Language: C++
  • HoloClean/holoclean

    A Machine Learning System for Data Enrichment.

    Language: Python
  • buguroo/pyknow

    PyKnow: Expert Systems for Python

    Language: Python
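
The forward-chaining loop at the heart of expert-system libraries like PyKnow can be sketched in a few lines of plain Python (a conceptual toy, not PyKnow's `Rule`/`Fact` API): repeatedly fire any rule whose premises are satisfied by the working memory until no new facts can be derived.

```python
# Toy forward-chaining engine: each rule is a (premises, conclusion) pair.
RULES = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "can_fly"}, "nests_in_trees"),
]

def forward_chain(facts, rules):
    """Fire applicable rules until a fixed point is reached."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"has_feathers", "can_fly"}, RULES)
```

Production engines add pattern matching over structured facts and efficient incremental algorithms such as Rete, but the fixed-point loop is the same idea.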
  • ulfurinn/wongi-engine

    A rule engine written in Ruby.

    Language: Ruby
  • zjhellofss/KuiperLLama

    A hands-on project well suited to campus recruiting and internships: build an LLM inference framework from scratch that supports Llama 2/3 and Qwen2.5.

    Language: C++
  • chengzeyi/ParaAttention

    Context-parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/)

    Language: Python
  • ReactiveBayes/RxInfer.jl

    Julia package for automated Bayesian inference on a factor graph with reactive message passing

    Language: Jupyter Notebook
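
RxInfer automates Bayesian inference via reactive message passing on a factor graph; the goal it pursues, computing a posterior from a prior and a likelihood, can be shown with a brute-force enumeration sketch in Python (conceptual only, unrelated to RxInfer's Julia API):

```python
from math import comb

def posterior(prior, likelihood, observation):
    """Exact discrete Bayesian update: p(h | x) ∝ p(x | h) p(h)."""
    unnorm = {h: prior[h] * likelihood(observation, h) for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Hypotheses: a coin's heads-bias is 0.3, 0.5, or 0.9, with a uniform prior.
prior = {0.3: 1 / 3, 0.5: 1 / 3, 0.9: 1 / 3}

def likelihood(heads, bias, n=3):
    # Binomial likelihood of seeing `heads` heads in n flips.
    return comb(n, heads) * bias**heads * (1 - bias)**(n - heads)

post = posterior(prior, likelihood, 3)  # observed 3 heads out of 3 flips
```

Enumeration scales exponentially with model size; message passing on a factor graph exploits the model's structure to get the same (or approximate) posteriors efficiently.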
  • quic/ai-hub-apps

    The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

    Language: Java
  • interestingLSY/swiftLLM

    A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

    Language: Python
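
At their core, LLM inference systems like this one run a token-by-token decoding loop. A toy greedy-decoding sketch (with a dummy stand-in for the model forward pass; none of this is swiftLLM's API) looks like:

```python
def dummy_logits(tokens):
    """Stand-in for a model forward pass: favors (last token + 1) mod vocab."""
    VOCAB = 8
    target = (tokens[-1] + 1) % VOCAB
    return [1.0 if t == target else 0.0 for t in range(VOCAB)]

def greedy_decode(prompt, steps, eos=None):
    """Append the argmax token each step until `steps` tokens or EOS."""
    tokens = list(prompt)
    for _ in range(steps):
        logits = dummy_logits(tokens)
        nxt = max(range(len(logits)), key=logits.__getitem__)
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens

out = greedy_decode([3], steps=4)  # extends the prompt by 4 greedy tokens
```

Real systems wrap this loop with batching, scheduling, and KV-cache management, which is where most of the engineering (and the performance) lives.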
  • EfficientMoE/MoE-Infinity

    PyTorch library for cost-effective, fast and easy serving of MoE models.

    Language: Python
  • SearchSavior/OpenArc

    Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI-compatible endpoints.

    Language: Python
  • gottingen/kumo-search

    Docs for search systems and AI infrastructure

  • ROCm/MIVisionX

    MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.

    Language: C++
  • BMW-InnovationLab/BMW-TensorFlow-Inference-API-CPU

    This is a repository for an object detection inference API using the TensorFlow framework.

    Language: Python
  • nilp0inter/experta

    Expert Systems for Python

    Language: Python
  • midea-ai/Aidget

    AI edge toolbox: an AI model deployment toolchain for edge devices, especially embedded RTOS platforms, including a model inference engine and model compression tools

    Language: Python
  • tuanlda78202/gpt-oss-amd

    Implement GPT-OSS 20B & 120B inference in C++ from scratch on AMD GPUs

    Language: C++
  • matteocarnelos/microflow-rs

    A robust and efficient TinyML inference engine.

    Language: Rust
  • CAS-CLab/CNN-Inference-Engine-Quick-View

    A quick view of high-performance convolutional neural network (CNN) inference engines on mobile devices.