inference-engine

There are 282 repositories under the inference-engine topic.

  • FedML-AI/FedML

    FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

    Language: Python
  • zjhellofss/KuiperInfer

    A hands-on project well suited to campus recruiting and internships: implement a high-performance deep learning inference library from scratch, step by step, with support for models such as Llama 2, UNet, YOLOv5, and ResNet.

    Language: C++
  • hyperjumptech/grule-rule-engine

    Rule engine implementation in Golang

    Language: Go
  • siliconflow/onediff

    OneDiff: An out-of-the-box acceleration library for diffusion models.

    Language: Jupyter Notebook
  • aphrodite-engine/aphrodite-engine

    Large-scale LLM inference engine

    Language: C++
  • Tencent/FeatherCNN

    FeatherCNN is a high performance inference engine for convolutional neural networks.

    Language: C++
  • PaddlePaddle/Paddle.js

    Paddle.js is a web project for Baidu PaddlePaddle, an open-source deep learning framework that runs in the browser. Paddle.js can either load a pre-trained model or transform a model from PaddleHub using the model-transformation tools it provides. It runs in any browser that supports WebGL/WebGPU/WebAssembly, and also in Baidu Smart Programs and WeChat mini programs.

    Language: JavaScript
  • zhihu/ZhiLight

    A highly optimized LLM inference acceleration engine for Llama and its variants.

    Language: C++
  • quic/ai-hub-models

    The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

    Language: Python
  • Adlik/Adlik

    Adlik: Toolkit for Accelerating Deep Learning Inference

    Language: C++
  • msnh2012/Msnhnet

    🔥 A mini PyTorch inference framework inspired by Darknet (YOLOv3, YOLOv4, YOLOv5, UNet, ...).

    Language: C++
  • insight-platform/Savant

    Python Computer Vision & Video Analytics Framework With Batteries Included

    Language: Python
  • jd-opensource/xllm

    A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

    Language: C++
  • ovg-project/kvcached

    Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

    Language: Python
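
The KV-cache idea behind projects like kvcached can be shown with a minimal pure-Python sketch (a conceptual toy, not kvcached's actual API): during autoregressive decoding, each step appends the new token's key/value vectors to a grow-only cache, so attention over past tokens never recomputes their projections.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class KVCache:
    """Grow-only key/value cache for a single attention head (illustrative only)."""
    def __init__(self):
        self.keys = []    # one key vector per past token
        self.values = []  # one value vector per past token

    def attend(self, q, k, v):
        # Append this step's key/value, then attend over the whole cache.
        self.keys.append(k)
        self.values.append(v)
        scale = 1.0 / math.sqrt(len(q))
        scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in self.keys]
        weights = softmax(scores)
        dim = len(v)
        return [sum(w * val[d] for w, val in zip(weights, self.values)) for d in range(dim)]

cache = KVCache()
out1 = cache.attend([1.0, 0.0], [1.0, 0.0], [1.0, 2.0])  # first token attends only to itself
out2 = cache.attend([0.0, 1.0], [0.0, 1.0], [3.0, 4.0])  # second token attends to both cached entries
```

Real engines manage this cache in paged GPU memory blocks; virtualizing it, as kvcached does, lets multiple models share that memory elastically.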
  • pylint-dev/astroid

    A common base representation of Python source code for pylint and other projects

    Language: Python
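
The kind of static inference astroid performs can be hinted at with the stdlib `ast` module (this is a toy constant folder, not astroid's richer API, which also resolves scopes, imports, and control flow):

```python
import ast

def infer_constant(expr: str):
    """Tiny constant-folding 'inference' over a Python expression string."""
    tree = ast.parse(expr, mode="eval")

    def go(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            return go(node.left) + go(node.right)
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
            return go(node.left) * go(node.right)
        raise ValueError("cannot infer statically")

    return go(tree.body)

result = infer_constant("2 + 3 * 4")  # statically evaluates to 14
```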
  • Tencent/Forward

    A library for high performance deep learning inference on NVIDIA GPUs.

    Language: C++
  • PaddlePaddle/Anakin

    A high-performance cross-platform inference engine. Anakin runs on x86 CPU, ARM, NVIDIA GPU, AMD GPU, Bitmain, and Cambricon devices.

    Language: C++
  • andrewkchan/yalm

    Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

    Language: C++
  • HoloClean/holoclean

    A Machine Learning System for Data Enrichment.

    Language: Python
  • buguroo/pyknow

    PyKnow: Expert Systems for Python

    Language: Python
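
The forward-chaining loop at the heart of expert-system libraries like PyKnow can be sketched in a few lines of plain Python (a conceptual toy, not PyKnow's `Rule`/`Fact` API): repeatedly fire any rule whose premises are satisfied by the working memory until no new facts can be derived.

```python
# Toy forward-chaining engine: each rule is a (premises, conclusion) pair.
RULES = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "can_fly"}, "nests_in_trees"),
]

def forward_chain(facts, rules):
    """Fire applicable rules until a fixed point is reached."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"has_feathers", "can_fly"}, RULES)
```

Production engines add pattern matching over structured facts and efficient incremental algorithms such as Rete, but the fixed-point loop is the same idea.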
  • ulfurinn/wongi-engine

    A rule engine written in Ruby.

    Language: Ruby
  • zjhellofss/KuiperLLama

    A hands-on project well suited to campus recruiting and internships: build an LLM inference framework from scratch that supports Llama 2/3 and Qwen2.5.

    Language: C++
  • chengzeyi/ParaAttention

    Context-parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/)

    Language: Python
  • ReactiveBayes/RxInfer.jl

    Julia package for automated Bayesian inference on a factor graph with reactive message passing

    Language: Jupyter Notebook
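
RxInfer automates Bayesian inference via reactive message passing on a factor graph; the goal it pursues, computing a posterior from a prior and a likelihood, can be shown with a brute-force enumeration sketch in Python (conceptual only, unrelated to RxInfer's Julia API):

```python
from math import comb

def posterior(prior, likelihood, observation):
    """Exact discrete Bayesian update: p(h | x) ∝ p(x | h) p(h)."""
    unnorm = {h: prior[h] * likelihood(observation, h) for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Hypotheses: a coin's heads-bias is 0.3, 0.5, or 0.9, with a uniform prior.
prior = {0.3: 1 / 3, 0.5: 1 / 3, 0.9: 1 / 3}

def likelihood(heads, bias, n=3):
    # Binomial likelihood of seeing `heads` heads in n flips.
    return comb(n, heads) * bias**heads * (1 - bias)**(n - heads)

post = posterior(prior, likelihood, 3)  # observed 3 heads out of 3 flips
```

Enumeration scales exponentially with model size; message passing on a factor graph exploits the model's structure to get the same (or approximate) posteriors efficiently.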
  • quic/ai-hub-apps

    The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.

    Language: Java
  • interestingLSY/swiftLLM

    A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

    Language: Python
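
At their core, LLM inference systems like this one run a token-by-token decoding loop. A toy greedy-decoding sketch (with a dummy stand-in for the model forward pass; none of this is swiftLLM's API) looks like:

```python
def dummy_logits(tokens):
    """Stand-in for a model forward pass: favors (last token + 1) mod vocab."""
    VOCAB = 8
    target = (tokens[-1] + 1) % VOCAB
    return [1.0 if t == target else 0.0 for t in range(VOCAB)]

def greedy_decode(prompt, steps, eos=None):
    """Append the argmax token each step until `steps` tokens or EOS."""
    tokens = list(prompt)
    for _ in range(steps):
        logits = dummy_logits(tokens)
        nxt = max(range(len(logits)), key=logits.__getitem__)
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens

out = greedy_decode([3], steps=4)  # extends the prompt by 4 greedy tokens
```

Real systems wrap this loop with batching, scheduling, and KV-cache management, which is where most of the engineering (and the performance) lives.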
  • EfficientMoE/MoE-Infinity

    PyTorch library for cost-effective, fast and easy serving of MoE models.

    Language: Python
  • SearchSavior/OpenArc

    Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI-compatible endpoints.

    Language: Python
  • gottingen/kumo-search

    Docs for search systems and AI infrastructure

  • ROCm/MIVisionX

    MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.

    Language: C++
  • BMW-InnovationLab/BMW-TensorFlow-Inference-API-CPU

    This is a repository for an object detection inference API using the TensorFlow framework.

    Language: Python
  • nilp0inter/experta

    Expert Systems for Python

    Language: Python
  • midea-ai/Aidget

    AI edge toolbox: an AI model deployment toolchain for edge devices, especially embedded RTOS platforms, including a model inference engine and model compression tools

    Language: Python
  • tuanlda78202/gpt-oss-amd

    Implement GPT-OSS 20B & 120B inference in C++ from scratch on AMD GPUs

    Language: C++
  • matteocarnelos/microflow-rs

    A robust and efficient TinyML inference engine.

    Language: Rust
  • CAS-CLab/CNN-Inference-Engine-Quick-View

    A quick view of high-performance convolutional neural network (CNN) inference engines on mobile devices.