inference
There are 1,340 repositories under the inference topic.
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
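A minimal sketch of MII's non-persistent pipeline, following the project's README; the model ID is a placeholder and a CUDA-capable GPU is assumed:

```python
# Hedged sketch: mii.pipeline is MII's documented non-persistent entry point.
# The model ID below is a placeholder; a CUDA GPU is assumed.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
response = pipe(["DeepSpeed is"], max_new_tokens=64)
print(response)
```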
XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
tensorflow_template_application
TensorFlow template application for deep learning
zml
Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild
ao
PyTorch native quantization and sparsity for training and inference
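A minimal sketch of weight-only int8 post-training quantization with torchao; the toy model and tensor sizes are illustrative, and the `quantize_` / `int8_weight_only` names follow torchao's documented API:

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# Toy stand-in for a real network (sizes are arbitrary).
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).eval()

quantize_(model, int8_weight_only())  # swaps linear weights to int8 in place

with torch.inference_mode():
    out = model(torch.randn(1, 1024))
```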
NNPACK
Acceleration package for neural networks on multi-core CPUs
transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
delta
DELTA is a deep learning based natural language and speech processing platform. LF AI & DATA Projects: https://lfaidata.foundation/projects/delta/
TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
huggingface.js
Utilities to use the Hugging Face Hub API
inference
A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
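A minimal sketch assuming the package's `get_model` entry point; the model alias and image path are placeholders:

```python
from inference import get_model

# "yolov8n-640" is an illustrative model alias; "image.jpg" is a placeholder.
model = get_model(model_id="yolov8n-640")
results = model.infer("image.jpg")
print(results)
```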
budgetml
Deploy an ML inference service on a budget in fewer than 10 lines of code.
agibot_x1_infer
The inference module for AgiBot X1.
BERT-NER
PyTorch named-entity recognition with BERT.
awesome-ml-demos-with-ios
Challenge projects for running machine learning model inference on iOS
CausalDiscoveryToolbox
Package for causal inference in graphs and in pairwise settings. Tools for graph structure recovery and dependency analysis are included.
multi-model-server
Multi Model Server is a tool for serving neural net models for inference
ort
Fast ML inference & training for ONNX models in Rust
neuropod
A uniform interface to run deep learning models from multiple frameworks
bolt
Bolt is a deep learning library with high performance and heterogeneous flexibility.
ims
📚 Introduction to Modern Statistics - A college-level open-source textbook with a modern approach highlighting multivariable relationships and simulation-based inference. For v1, see https://openintro-ims.netlify.app.
Adlik
Adlik: Toolkit for Accelerating Deep Learning Inference
cppflow
Run TensorFlow models in C++ without installation and without Bazel
AI-Engineering.academy
Mastering Applied AI, One Concept at a Time
pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
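A minimal sketch of the bind/serve flow PyTriton documents; the model name, tensor names, and doubling function are illustrative only:

```python
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import Tensor
from pytriton.triton import Triton

@batch
def infer_fn(data):
    # Placeholder compute: double the batch and return it under "result".
    return {"result": data * 2}

with Triton() as triton:
    triton.bind(
        model_name="double",
        infer_func=infer_fn,
        inputs=[Tensor(name="data", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="result", dtype=np.float32, shape=(-1,))],
    )
    triton.serve()  # blocks, exposing Triton's HTTP/gRPC endpoints
```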
GenossGPT
One API for all LLMs, private or public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4All, Hugging Face ...) 🌈🐂 Replace OpenAI GPT with any LLM in your app with one line.
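In practice the "one line" is the client's base URL. A hedged sketch, assuming a locally running Genoss server that exposes an OpenAI-compatible endpoint; the URL, path, and model name are assumptions to adjust for your deployment:

```python
from openai import OpenAI

# Assumption: an OpenAI-compatible gateway on localhost:8000; the URL and
# model name below are placeholders, not Genoss-confirmed values.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
resp = client.chat.completions.create(
    model="gpt4all-j",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```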
bark.cpp
Suno AI's Bark model in C/C++ for fast text-to-speech generation
awesome-emdl
Embedded and mobile deep learning research resources
pipeless
An open-source computer vision framework to build and deploy apps in minutes
yolort
yolort is a runtime stack for YOLOv5 on specialized accelerators such as TensorRT, LibTorch, ONNX Runtime, TVM, and NCNN.
onepanel
The open source, end-to-end computer vision platform. Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises.
InferLLM
A lightweight LLM inference framework
ai-reference-models
Intel® AI Reference Models: contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Intel® Data Center GPUs
model_server
A scalable inference server for models optimized with OpenVINO™
tf_trt_models
TensorFlow models accelerated with NVIDIA TensorRT
filetype.py
Small, dependency-free, fast Python package to infer binary file types by checking magic number signatures
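A minimal sketch using the package's `guess` helper; the file path is a placeholder:

```python
import filetype

kind = filetype.guess("sample.png")  # inspects only the leading magic bytes
if kind is None:
    print("Cannot infer file type")
else:
    print(f"extension: {kind.extension}, MIME: {kind.mime}")
```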