inference

There are 1,648 repositories under the inference topic.

  • vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Language: Python · ★ 58.2k
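
vLLM's throughput comes largely from PagedAttention, which stores the KV cache in fixed-size blocks and maps each sequence's token positions to blocks on demand, instead of reserving one contiguous max-length buffer per request. The bookkeeping can be sketched in a few lines (block size and class names are illustrative, not vLLM's actual API):

```python
class PagedKVCache:
    """Toy paged KV-cache bookkeeping: logical token positions map to
    fixed-size physical blocks, allocated only when needed."""

    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # pool of physical block ids
        self.tables = {}                     # seq_id -> list of block ids
        self.lengths = {}                    # seq_id -> tokens stored

    def append(self, seq_id):
        """Record one new token; return (physical block, offset in block)."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % self.block_size == 0:         # current block full (or first token)
            table.append(self.free.pop())    # grab a fresh block on demand
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def release(self, seq_id):
        """Sequence finished: return its blocks to the pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8, block_size=4)
for _ in range(6):
    cache.append("seq0")
print(len(cache.tables["seq0"]))  # 6 tokens at block size 4 -> 2 blocks
```

Because blocks are returned to a shared pool the moment a sequence finishes, memory freed by short requests is immediately reusable by long ones, which is what lets vLLM batch many more concurrent sequences than contiguous allocation would.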
  • ggml-org/whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    Language: C++ · ★ 43.3k
  • hpcaitech/ColossalAI

    Making large AI models cheaper, faster and more accessible

    Language: Python · ★ 41.2k
  • deepspeedai/DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

    Language: Python · ★ 40.1k
  • google-ai-edge/mediapipe

    Cross-platform, customizable ML solutions for live and streaming media.

    Language: C++ · ★ 31.4k
  • Tencent/ncnn

    ncnn is a high-performance neural network inference framework optimized for the mobile platform

    Language: C++ · ★ 22.1k
  • SYSTRAN/faster-whisper

    Faster Whisper transcription with CTranslate2

    Language: Python · ★ 18.1k
  • sgl-project/sglang

    SGLang is a fast serving framework for large language models and vision language models.

    Language: Python · ★ 18k
  • stas00/ml-engineering

    Machine Learning Engineering Open Book

    Language: Python · ★ 15.1k
  • gvergnaud/ts-pattern

    🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.

    Language: TypeScript · ★ 14.2k
  • NVIDIA/TensorRT

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

    Language: C++ · ★ 12.1k
  • aws/amazon-sagemaker-examples

    Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

    Language: Jupyter Notebook · ★ 10.7k
  • huggingface/text-generation-inference

    Large Language Model Text Generation Inference

    Language: Python · ★ 10.5k
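
Under any text-generation server, the work bottoms out in a loop of "score the vocabulary, pick a token, append, repeat". The sampling step in miniature, with toy logits (a sketch of the general technique, not TGI's code):

```python
import math
import random

def sample_token(logits, temperature=1.0, seed=None):
    """Temperature-scaled softmax over logits, then draw one token id."""
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]   # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):              # inverse-CDF draw
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

logits = [2.0, 0.5, -1.0]
counts = [0, 0, 0]
for s in range(1000):
    counts[sample_token(logits, seed=s)] += 1
print(counts)  # token 0 dominates, matching its softmax mass (~0.79)
```

Lowering the temperature sharpens the distribution toward the argmax; raising it flattens the distribution toward uniform sampling.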
  • triton-inference-server/server

    The Triton Inference Server provides an optimized cloud and edge inferencing solution.

    Language: Python · ★ 9.8k
  • openvinotoolkit/openvino

    OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

    Language: C++ · ★ 8.8k
  • xorbitsai/inference

    Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.

    Language: Python · ★ 8.5k
  • dusty-nv/jetson-inference

    Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.

    Language: C++ · ★ 8.5k
  • oumi-ai/oumi

    Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

    Language: Python · ★ 8.5k
  • Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB

    💎 A 1MB lightweight face detection model

    Language: Python · ★ 7.4k
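
Lightweight detectors like this emit many overlapping candidate boxes per face; non-maximum suppression (NMS) keeps the highest-scoring box and drops neighbors whose IoU with a kept box exceeds a threshold. A stdlib sketch of the standard algorithm (threshold and boxes chosen for illustration):

```python
def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: visit boxes best-first, keep each box only if it does
    not overlap an already-kept box above the threshold."""
    order = sorted(range(len(boxes)), key=scores.__getitem__, reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the near-duplicate box 1 is suppressed
```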
  • gcanti/io-ts

    Runtime type system for IO decoding/encoding

    Language: TypeScript · ★ 6.8k
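
io-ts's core move is validating unknown runtime input against a codec, so that data crossing an IO boundary actually has the shape the static types claim. A minimal Python analog of what a codec's decode does (field names and the (ok, result) convention are illustrative, not io-ts's API):

```python
def decode_user(value):
    """Validate untyped input against an expected {name: str, age: int}
    shape; return (True, parsed) on success or (False, error) on failure."""
    if not isinstance(value, dict):
        return False, "expected object"
    name = value.get("name")
    age = value.get("age")
    if not isinstance(name, str):
        return False, "name: expected string"
    if not isinstance(age, int):
        return False, "age: expected integer"
    return True, {"name": name, "age": age}

print(decode_user({"name": "Ada", "age": 36}))    # (True, {...})
print(decode_user({"name": "Ada", "age": "36"}))  # (False, 'age: expected integer')
```

io-ts composes such checks from small codecs and returns an Either carrying structured errors; the sketch above collapses that to a tuple.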
  • GeeeekExplorer/nano-vllm

    Nano vLLM

    Language: Python · ★ 6.6k
  • Trusted-AI/adversarial-robustness-toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

    Language: Python · ★ 5.5k
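
Among the evasion attacks ART implements is the fast gradient sign method (FGSM), which perturbs each input feature by a small step in the direction of the loss gradient's sign. For a linear score w·x the gradient with respect to x is just w, so the attack reduces to a one-liner (toy numbers, not ART's API):

```python
def fgsm_linear(x, w, eps):
    """FGSM against a linear score sum(w_i * x_i): since d(score)/dx_i = w_i,
    pushing the score down means stepping against the sign of each weight."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi - eps * sign(wi) for xi, wi in zip(x, w)]

w = [0.5, -1.0, 0.25]
x = [1.0, 1.0, 1.0]
score = lambda v: sum(wi * vi for wi, vi in zip(w, v))
x_adv = fgsm_linear(x, w, eps=0.3)
print(score(x), score(x_adv))  # the perturbed input scores strictly lower
```

The same idea applied to a deep network uses backpropagated gradients instead of w, which is where a toolbox like ART earns its keep.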
  • LMCache/LMCache

    Supercharge Your LLM with the Fastest KV Cache Layer

    Language: Python · ★ 5.3k
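
The point of a KV-cache layer like LMCache is that two requests sharing a prompt prefix can reuse the attention states already computed for that prefix, so only the suffix needs prefill. The lookup can be sketched as a longest-prefix match over hashed token runs (the hashing scheme and class are illustrative, not LMCache's design):

```python
import hashlib

class PrefixKVStore:
    """Toy store mapping hashed token prefixes to cached KV state."""

    def __init__(self):
        self.store = {}

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def put(self, tokens, kv_state):
        self.store[self._key(tokens)] = kv_state

    def longest_prefix(self, tokens):
        """Return (cached KV for the longest stored prefix, remaining suffix)."""
        for end in range(len(tokens), 0, -1):
            hit = self.store.get(self._key(tokens[:end]))
            if hit is not None:
                return hit, tokens[end:]
        return None, tokens

store = PrefixKVStore()
store.put([1, 2, 3], "kv(1,2,3)")
hit, todo = store.longest_prefix([1, 2, 3, 4, 5])
print(hit, todo)  # only tokens 4 and 5 still need prefill
```

A production cache additionally evicts under memory pressure and moves KV state across GPU, CPU, and disk tiers; the sketch only shows the reuse logic.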
  • superduper-io/superduper

    Superduper: End-to-end framework for building custom AI applications and agents.

    Language: Python · ★ 5.2k
  • argmaxinc/WhisperKit

    On-device Speech Recognition for Apple Silicon

    Language: Swift · ★ 5k
  • AutoGPTQ/AutoGPTQ

    An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.

    Language: Python · ★ 4.9k
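
GPTQ stores weights as low-bit integers plus a per-group scale. The baseline it improves on, simple round-to-nearest quantization, already shows the storage/accuracy trade and fits in a few lines (4-bit symmetric, one scale per row; a sketch of the baseline, not AutoGPTQ's algorithm):

```python
def quantize_rtn(row, bits=4):
    """Symmetric round-to-nearest: map floats to ints in
    [-2^(b-1), 2^(b-1)-1] with one shared scale per row."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in row) / qmax
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in row]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

row = [0.12, -0.53, 0.77, -0.02]
q, s = quantize_rtn(row)
approx = dequantize(q, s)
err = max(abs(a - b) for a, b in zip(row, approx))
print(q, round(err, 3))  # reconstruction error stays within half a step
```

GPTQ's contribution over this baseline is choosing the rounding per weight so that the *layer output* error (measured against calibration data) is minimized, not just the per-weight error.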
  • NVIDIA-AI-IOT/torch2trt

    An easy to use PyTorch to TensorRT converter

    Language: Python · ★ 4.8k
  • Tencent/TNN

    TNN: a uniform deep learning inference framework for mobile, desktop, and server, developed by Tencent Youtu Lab and Guangying Lab. TNN is distinguished by its cross-platform capability, high performance, model compression, and code pruning. Building on ncnn and Rapidnet, TNN further strengthens support and performance optimization for mobile devices, while drawing on the extensibility and performance of existing open-source efforts. TNN has been deployed in multiple Tencent apps, such as Mobile QQ, Weishi, and Pitu. Contributions are welcome; work with us to make TNN a better framework.

    Language: C++ · ★ 4.6k
  • tencentmusic/cube-studio

    cube studio is an open-source, cloud-native, one-stop machine learning / deep learning / LLM platform: end-to-end MLOps pipelines, compute rental, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving with vGPU virtualization, edge computing, an annotation platform with automated labeling, SFT fine-tuning / reward modeling / reinforcement learning for large models such as DeepSeek, multi-node LLM inference via vLLM/Ollama/MindIE, private knowledge bases, an AI model marketplace, support for domestic Chinese CPUs/GPUs/NPUs and the Ascend ecosystem, RDMA support, and distributed frameworks including PyTorch/TensorFlow/MXNet/DeepSpeed/Paddle/ColossalAI/Horovod/Ray/Volcano.

    Language: Python · ★ 4.6k
  • openvinotoolkit/open_model_zoo

    Pre-trained Deep Learning models and demos (high quality and extremely fast)

    Language: Python · ★ 4.3k
  • typedb/typedb

    TypeDB: the power of programming, in your database

    Language: Rust · ★ 4k
  • OpenNMT/CTranslate2

    Fast inference engine for Transformer models

    Language: C++ · ★ 4k
  • OpenCSGs/csghub

    CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both open-source and on-premise/SaaS solutions, with features comparable to Hugging Face. Gain full control over the lifecycle of LLMs, datasets, and agents, with Python SDK compatibility with Hugging Face. Join us! ⭐️

    Language: Vue · ★ 4k
  • kvcache-ai/Mooncake

    Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

    Language: C++ · ★ 3.9k
  • gpustack/gpustack

    Simple, scalable AI model deployment on GPU clusters

    Language: Python · ★ 3.7k
  • PaddlePaddle/FastDeploy

    High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

    Language: Python · ★ 3.5k