Pinned Repositories
backend
Common source, scripts and utilities for creating Triton backends.
client
Triton Python, C++, and Java client libraries, and gRPC-generated client examples for Go, Java, and Scala.
core
The core library and APIs implementing the Triton Inference Server.
model_analyzer
Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of models served by the Triton Inference Server.
model_navigator
Triton Model Navigator is an inference toolkit designed for optimizing and deploying deep learning models, with a focus on NVIDIA GPUs.
python_backend
Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python.
pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
tensorrtllm_backend
The Triton backend for TensorRT-LLM.
tutorials
This repository contains tutorials and examples for Triton Inference Server.
Triton Inference Server's Repositories
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
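Triton loads models from a model repository, a directory whose layout follows a fixed convention. A minimal sketch (the model name, version number, and model file below are illustrative):

```
model_repository/
└── my_model/
    ├── config.pbtxt      # model configuration: inputs, outputs, batching
    └── 1/                # numeric version directory
        └── model.onnx    # model file for the chosen backend
```

The server is pointed at this directory at startup via its --model-repository option.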
triton-inference-server/pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
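As a sketch of that Flask/FastAPI-like flavor, based on the examples published with PyTriton (the model name, tensor names, and doubling logic are illustrative):

```python
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

@batch
def infer_fn(input):
    # Inference callable: receives batched numpy arrays keyed by input name.
    return {"output": input * 2.0}

with Triton() as triton:
    triton.bind(
        model_name="Doubler",  # illustrative model name
        infer_func=infer_fn,
        inputs=[Tensor(name="input", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="output", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()  # blocks, serving the bound model over HTTP/gRPC
```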
triton-inference-server/tensorrtllm_backend
The Triton backend for TensorRT-LLM.
triton-inference-server/client
Triton Python, C++, and Java client libraries, and gRPC-generated client examples for Go, Java, and Scala.
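For example, the Python HTTP client from this package is used roughly as follows (server address, model name, and tensor names are placeholders):

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a named input tensor from a numpy array.
data = np.ones((1, 16), dtype=np.float32)
infer_input = httpclient.InferInput("INPUT0", data.shape, "FP32")
infer_input.set_data_from_numpy(data)

# Run inference and read back a named output.
result = client.infer(model_name="my_model", inputs=[infer_input])
print(result.as_numpy("OUTPUT0"))
```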
triton-inference-server/python_backend
Triton backend that enables pre-processing, post-processing, and other logic to be implemented in Python.
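A Python-backend model is a model.py file that defines a TritonPythonModel class; a minimal sketch, assuming tensors named INPUT0 and OUTPUT0 in the model configuration:

```python
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        # Called with a batch of requests; must return one response per request.
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            # Illustrative "post-processing": double the input values.
            out0 = pb_utils.Tensor("OUTPUT0", in0.as_numpy() * 2)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses
```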
triton-inference-server/tutorials
This repository contains tutorials and examples for Triton Inference Server.
triton-inference-server/model_analyzer
Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of models served by the Triton Inference Server.
triton-inference-server/backend
Common source, scripts and utilities for creating Triton backends.
triton-inference-server/model_navigator
Triton Model Navigator is an inference toolkit designed for optimizing and deploying deep learning models, with a focus on NVIDIA GPUs.
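A rough sketch of its Python API for a PyTorch model, following the repository's examples (treat the exact function names and arguments here as assumptions and consult the repository's documentation for the current API):

```python
import torch
import model_navigator as nav  # API names below are assumptions

model = torch.nn.Linear(16, 8).eval()
# An iterable of sample input batches, used for conversion and profiling.
dataloader = [torch.randn(2, 16) for _ in range(10)]

# optimize() converts the model across supported formats (e.g. ONNX,
# TensorRT) and profiles the results into a package.
package = nav.torch.optimize(model=model, dataloader=dataloader)
nav.package.save(package, "linear.nav")
```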
triton-inference-server/vllm_backend
The Triton backend for vLLM.
triton-inference-server/dali_backend
The Triton backend that allows running GPU-accelerated data pre-processing pipelines implemented in DALI's Python API.
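The model served by this backend is a serialized DALI pipeline; a minimal sketch using DALI's Python API (batch size, target size, and the DALI_INPUT_0 name are illustrative):

```python
import nvidia.dali.fn as fn
import nvidia.dali.types as types
from nvidia.dali import pipeline_def

@pipeline_def(batch_size=8, num_threads=4, device_id=0)
def preprocessing_pipeline():
    # Encoded images enter the pipeline from Triton requests.
    images = fn.external_source(device="cpu", name="DALI_INPUT_0")
    images = fn.decoders.image(images, device="mixed", output_type=types.RGB)
    return fn.resize(images, resize_x=224, resize_y=224)

# Serialize the pipeline; the resulting file goes into the model's
# version directory in the Triton model repository.
preprocessing_pipeline().serialize(filename="model.dali")
```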
triton-inference-server/onnxruntime_backend
The Triton backend for the ONNX Runtime.
triton-inference-server/pytorch_backend
The Triton backend for PyTorch TorchScript models.
triton-inference-server/core
The core library and APIs implementing the Triton Inference Server.
triton-inference-server/fil_backend
FIL (Forest Inference Library) backend for the Triton Inference Server.
triton-inference-server/common
Common source, scripts and utilities shared across all Triton repositories.
triton-inference-server/tensorrt_backend
The Triton backend for TensorRT.
triton-inference-server/tensorflow_backend
The Triton backend for TensorFlow.
triton-inference-server/triton_cli
Triton CLI is an open-source command-line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.
triton-inference-server/openvino_backend
OpenVINO backend for Triton.
triton-inference-server/developer_tools
triton-inference-server/stateful_backend
Triton backend that automatically manages model state tensors for the sequence batcher.
triton-inference-server/contrib
Community contributions to Triton that are not officially supported or maintained by the Triton project.
triton-inference-server/checksum_repository_agent
The Triton repository agent that verifies model checksums.
triton-inference-server/redis_cache
Implementation of a Redis cache for Triton Inference Server's TRITONCACHE API.
triton-inference-server/third_party
Third-party source packages that are modified for use in Triton.
triton-inference-server/identity_backend
Example Triton backend that demonstrates most of the Triton Backend API.
triton-inference-server/repeat_backend
An example Triton backend that demonstrates sending zero, one, or multiple responses for each request.
triton-inference-server/local_cache
Implementation of a local in-memory cache for Triton Inference Server's TRITONCACHE API.
triton-inference-server/square_backend
Simple Triton backend used for testing.