inference

There are 1,186 repositories under the inference topic.

  • hpcaitech/ColossalAI

    Making large AI models cheaper, faster and more accessible

    Language: Python · 38.1k stars
  • microsoft/DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

    Language: Python · 33.2k stars
  • ggerganov/whisper.cpp

    Port of OpenAI's Whisper model in C/C++

    Language: C · 32.1k stars
  • google-ai-edge/mediapipe

    Cross-platform, customizable ML solutions for live and streaming media.

    Language: C++ · 25.8k stars
  • vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Language: Python · 20.2k stars
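One ingredient behind vLLM's throughput is continuous batching: finished sequences leave the batch immediately and queued requests join mid-flight, instead of the whole batch draining before new work is admitted. Below is a minimal pure-Python simulation of that scheduling idea only; the request names, token counts, and `max_batch` limit are made-up illustration values, not vLLM's API.

```python
# Toy continuous-batching scheduler: each loop iteration is one decode step
# applied to every running sequence; finished sequences free their slot
# immediately so waiting requests can join mid-flight.
from collections import deque

def continuous_batching(requests, max_batch=2):
    """requests: list of (name, tokens_to_generate). Returns finish order."""
    waiting = deque(requests)
    running, finished = [], []
    while waiting or running:
        while waiting and len(running) < max_batch:   # admit new work
            name, n = waiting.popleft()
            running.append([name, n])
        for seq in running:                            # one decode step for all
            seq[1] -= 1
        done = [s for s in running if s[1] == 0]
        for s in done:                                 # free slots immediately
            running.remove(s)
            finished.append(s[0])
    return finished

# "a" and "c" are short requests; with static batching they would wait for
# "b" to finish, but here "c" slips into the slot "a" vacates.
print(continuous_batching([("a", 1), ("b", 3), ("c", 1)]))  # ['a', 'c', 'b']
```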
  • Tencent/ncnn

    ncnn is a high-performance neural network inference framework optimized for the mobile platform

    Language: C++ · 19.4k stars
  • gvergnaud/ts-pattern

    🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.

    Language: TypeScript · 11k stars
  • aws/amazon-sagemaker-examples

    Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.

    Language: Jupyter Notebook · 9.6k stars
  • SYSTRAN/faster-whisper

    Faster Whisper transcription with CTranslate2

    Language: Python · 9.4k stars
  • NVIDIA/TensorRT

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

    Language: C++ · 9.3k stars
  • huggingface/text-generation-inference

    Large Language Model Text Generation Inference

    Language: Python · 8.1k stars
  • triton-inference-server/server

    The Triton Inference Server provides an optimized cloud and edge inferencing solution.

    Language: Python · 7.5k stars
  • dusty-nv/jetson-inference

    Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.

    Language: C++ · 7.4k stars
  • Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB

    💎 1MB lightweight face detection model

    Language: Python · 7.1k stars
  • gcanti/io-ts

    Runtime type system for IO decoding/encoding

    Language: TypeScript · 6.6k stars
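The idea behind runtime decoders like io-ts is that untrusted input (e.g. parsed JSON) is checked against a schema at runtime, and failures come back as values rather than exceptions. io-ts itself is TypeScript; the snippet below is a deliberately tiny Python analogue of the concept (the `decode_user` schema and the `("ok", …)` / `("error", …)` result shape are invented for illustration).

```python
# A decoder either returns ("ok", validated_value) or ("error", message),
# mirroring the Either-style results that runtime decoding libraries use.

def decode_user(data):
    if not isinstance(data, dict):
        return ("error", "expected an object")
    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str):
        return ("error", "name: expected string")
    if not isinstance(age, int) or isinstance(age, bool):
        return ("error", "age: expected integer")
    return ("ok", {"name": name, "age": age})

print(decode_user({"name": "Ada", "age": 36}))    # ('ok', {'name': 'Ada', 'age': 36})
print(decode_user({"name": "Ada", "age": "36"}))  # ('error', 'age: expected integer')
```

Returning errors as data (instead of raising) is what lets callers compose decoders and report every validation failure at an API boundary.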
  • openvinotoolkit/openvino

    OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

    Language: C++ · 6.1k stars
  • Trusted-AI/adversarial-robustness-toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

    Language: Python · 4.5k stars
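The "Evasion" attacks that ART red-teams are perturbations small enough to look harmless but chosen to flip a model's prediction. The classic example is FGSM: step each input feature against the sign of the loss gradient. Here is a pure-Python sketch of that idea on a toy linear classifier (this is not ART's API; the weights and inputs are made up, and for a linear score the gradient with respect to the input is just the weight vector):

```python
# FGSM-style evasion on a toy linear classifier.

def score(w, x, b):
    """Linear decision score: positive -> class 1, negative -> class 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def fgsm_perturb(w, x, eps):
    """Shift each feature by eps against the gradient sign to lower the score.
    For a linear model, d(score)/dx = w, so the sign of w is all we need."""
    sign = lambda v: (v > 0) - (v < 0)
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

w, b = [2.0, -1.0, 0.5], 0.1
x = [0.3, 0.2, 0.4]                   # clean input, classified as class 1
x_adv = fgsm_perturb(w, x, eps=0.3)   # each feature moved by at most 0.3

print(score(w, x, b) > 0)      # True: clean input is class 1
print(score(w, x_adv, b) > 0)  # False: the perturbation flips the label
```

Defenses in this space (adversarial training, input smoothing) are what the "Blue Teams" half of ART's description refers to.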
  • SuperDuperDB/superduperdb

    🔮 SuperDuperDB: Bring AI to your database! Build, deploy, and manage any AI application directly with your existing data infrastructure, without moving your data. Features include streaming inference, scalable model training, and vector search.

    Language: Python · 4.4k stars
  • NVIDIA-AI-IOT/torch2trt

    An easy-to-use PyTorch-to-TensorRT converter

    Language: Python · 4.4k stars
  • Tencent/TNN

    TNN: a uniform deep learning inference framework for mobile, desktop, and server, developed by Tencent Youtu Lab and Guangying Lab. TNN is distinguished by several outstanding features, including cross-platform capability, high performance, model compression, and code pruning. Based on ncnn and Rapidnet, TNN further strengthens support and performance optimization for mobile devices, while drawing on the extensibility and performance of existing open-source efforts. TNN has been deployed in multiple Tencent apps, such as Mobile QQ, Weishi, and Pitu. Contributions are welcome; collaborate with us to make TNN a better framework.

    Language: C++ · 4.3k stars
  • openvinotoolkit/open_model_zoo

    Pre-trained Deep Learning models and demos (high quality and extremely fast)

    Language: Python · 4k stars
  • AutoGPTQ/AutoGPTQ

    An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

    Language: Python · 3.9k stars
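Quantization shrinks LLM weights by storing them as low-bit integers plus a per-group scale. GPTQ itself is cleverer than plain rounding (it compensates for rounding error layer by layer), but the basic quantize/dequantize round trip below, a pure-Python sketch with made-up weight values, shows the storage/accuracy trade-off:

```python
# Toy symmetric 4-bit round-to-nearest quantization of a weight vector.

def quantize(weights, bits=4):
    qmax = 2 ** (bits - 1) - 1                 # 7 for symmetric int4
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

w = [0.12, -0.7, 0.33, 0.04]
q, s = quantize(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))

print(q)         # small integers, each storable in 4 bits
print(max_err)   # reconstruction error, bounded by about half a step (s/2)
```

Each weight now needs 4 bits instead of 32, at the cost of a bounded rounding error; GPTQ's contribution is distributing that error so accuracy survives even at 3-4 bits.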
  • vaticle/typedb

    TypeDB: the polymorphic database powered by types

    Language: Java · 3.7k stars
  • bytedance/lightseq

    LightSeq: A High Performance Library for Sequence Processing and Generation

    Language: C++ · 3.1k stars
  • xorbitsai/inference

    Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need: run inference with any open-source language, speech recognition, or multimodal model, whether in the cloud, on-premises, or even on your laptop.

    Language: Python · 2.9k stars
  • neuralmagic/deepsparse

    Sparsity-aware deep learning inference runtime for CPUs

    Language: Python · 2.9k stars
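"Sparsity-aware" means the runtime exploits the zeros that pruning leaves in a network: if a weight is zero, its multiply can be skipped entirely. The pure-Python sketch below shows the core idea with a CSR-like row format (it says nothing about DeepSparse's actual internals, which operate on vectorized CPU kernels; the matrix and vector are made up):

```python
# Sparse mat-vec: store only (column, value) pairs for nonzero weights,
# so the dot product touches exactly the nonzero entries.

def to_sparse(rows):
    return [[(j, v) for j, v in enumerate(row) if v != 0] for row in rows]

def sparse_matvec(sparse_rows, x):
    return [sum(v * x[j] for j, v in row) for row in sparse_rows]

dense = [
    [0, 2, 0, 0],
    [1, 0, 0, 3],
]
x = [1.0, 2.0, 3.0, 4.0]
sp = to_sparse(dense)

print(sparse_matvec(sp, x))       # [4.0, 13.0] -- same result as dense
print(sum(len(r) for r in sp))    # 3 multiplies instead of 8
```

At the 80-95% sparsity levels that pruned networks reach, skipping zeros like this is where most of the speedup comes from.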
  • OpenNMT/CTranslate2

    Fast inference engine for Transformer models

    Language: C++ · 2.9k stars
  • tencentmusic/cube-studio

    Cube Studio is an open-source, cloud-native, one-stop machine learning / deep learning AI platform. It supports SSO login; multi-tenant / multi-project groups; big-data platform integration; online notebook development; drag-and-drop pipeline orchestration; multi-node, multi-GPU distributed training; hyperparameter search; inference serving with vGPU; edge computing; serverless; a labeling platform with automated labeling; dataset management; large-model fine-tuning; vLLM large-model inference; LLMOps; private knowledge bases; an AI model app store; one-click model development / inference / fine-tuning; domestic Chinese CPU / GPU / NPU chips; RDMA; and distributed frameworks including pytorch / tf / mxnet / deepspeed / paddle / colossalai / horovod / spark / ray / volcano.

    Language: Jupyter Notebook · 2.7k stars
  • pgmpy/pgmpy

    Python Library for learning (Structure and Parameter), inference (Probabilistic and Causal), and simulations in Bayesian Networks.

    Language: Python · 2.6k stars
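Probabilistic inference in a Bayesian network means computing a posterior over hidden variables given evidence. Libraries like pgmpy automate this with algorithms such as variable elimination; on a network small enough, the same answer falls out of Bayes' rule by direct enumeration. A hand-rolled sketch on the textbook two-node network Rain → WetGrass (the probabilities are illustrative; this is not pgmpy's API):

```python
# P(Rain) and P(WetGrass | Rain) define the joint distribution; conditioning
# on WetGrass=True and renormalizing gives the posterior over Rain.

P_rain = {True: 0.2, False: 0.8}
P_wet_given_rain = {True:  {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}

def posterior_rain_given_wet():
    """P(Rain=True | WetGrass=True) via enumeration of the joint."""
    joint = {r: P_rain[r] * P_wet_given_rain[r][True] for r in (True, False)}
    z = sum(joint.values())          # P(WetGrass=True), the normalizer
    return joint[True] / z

print(round(posterior_rain_given_wet(), 4))  # 0.5294
```

Observing wet grass raises the probability of rain from the 0.2 prior to about 0.53; enumeration is exponential in the number of variables, which is why real libraries use variable elimination or sampling instead.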
  • argmaxinc/WhisperKit

    Swift native on-device speech recognition with Whisper for Apple Silicon

    Language: Swift · 2.5k stars
  • HuaizhengZhang/Awesome-System-for-Machine-Learning

    A curated list of research in machine learning systems (MLSys). Paper notes are also provided.

  • huggingface/optimum

    🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools

    Language: Python · 2.2k stars
  • zjhellofss/KuiperInfer

    Build a high-performance deep learning inference library from scratch, step by step, with support for models such as Llama 2, UNet, YOLOv5, and ResNet.

    Language: C++ · 2k stars
  • openvinotoolkit/openvino_notebooks

    📚 Jupyter notebook tutorials for OpenVINO™

    Language: Jupyter Notebook · 2k stars
  • tairov/llama2.mojo

    Inference Llama 2 in one file of pure 🔥

    Language: Mojo · 2k stars
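Single-file LLM inference ports in the llama2.c / llama2.mojo family all end each decode step the same way: turn the model's output logits into one sampled token via a temperature-scaled softmax. A pure-Python sketch of just that final step (the logits and temperature are made-up values; the transformer forward pass that produces real logits is omitted):

```python
import math
import random

def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs, rng):
    """Inverse-CDF sampling over the vocabulary."""
    r, acc = rng.random(), 0.0
    for token, p in enumerate(probs):
        acc += p
        if r < acc:
            return token
    return len(probs) - 1

logits = [2.0, 1.0, 0.1]                  # pretend model output for 3 tokens
probs = softmax(logits, temperature=0.8)  # T < 1 sharpens the distribution
print(probs)                              # highest logit gets highest probability
print(sample(probs, random.Random(0)))
```

Temperature is the knob exposed by most inference CLIs: T → 0 approaches greedy argmax decoding, while T > 1 flattens the distribution and makes output more random.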
  • tobegit3hub/tensorflow_template_application

    TensorFlow template application for deep learning

    Language: Python · 1.9k stars