triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Python · BSD-3-Clause
Issues
[feature request] Real-time streaming inference load generation by `perf_analyzer`
#8059 opened by vadimkantorov - 0
OpenAI Frontend Batch Support
#8058 opened by Loc8888 - 1
RFE: Function calling in OpenAI Frontend
#8048 opened by thehumit - 3
Significant performance degradation when using OpenAI Frontend + streaming
#8045 opened by jolyons123 - 0
Performance Discrepancy Between NVIDIA Triton and Direct Faster-Whisper Inference
#8016 opened by YuBeomGon - 0
CUDA Race Condition in TensorRT GEMM Kernel when Triton Inference Server Loads a TensorRT Model
#8057 opened by neezeeyee - 2
InferenceServerException: [408] an exception occurred in the client while decoding the response: Parse error at offset 0: Invalid value.
#8051 opened by TopAgrume - 5
Segmentation fault crash due to a race condition in request cancellation (with fix proposal)
#8034 opened by lunwang-ttd - 0
Bazel support and tagged releases for individual repos
#8049 opened by arpit15 - 4
[Question] How can I limit the length of the input context and the number of tokens to generate?
#8029 opened by ArtemBiliksin - 3
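How to cap generation depends on the backend in use. For the vLLM backend, per-request sampling options such as max_tokens go in the optional `sampling_parameters` JSON input, while the input context length is typically capped by an engine setting (e.g. vLLM's max_model_len) rather than per request. A minimal client sketch, assuming the vllm_backend example model's tensor names (`text_input`, `sampling_parameters`, `text_output`) and a model named `vllm_model`:

```python
# Sketch only: tensor names follow the vllm_backend example model and
# may differ for other backends or custom configs.
import json
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")

text = np.array(["What is Triton?"], dtype=object)
# max_tokens caps the number of generated tokens for this request.
params = np.array([json.dumps({"max_tokens": 128})], dtype=object)

inputs = [
    grpcclient.InferInput("text_input", text.shape, "BYTES"),
    grpcclient.InferInput("sampling_parameters", params.shape, "BYTES"),
]
inputs[0].set_data_from_numpy(text)
inputs[1].set_data_from_numpy(params)

result = client.infer(model_name="vllm_model", inputs=inputs)
print(result.as_numpy("text_output"))
```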
`build.py` sets Docker build args for secrets even when the build-secret flag is not present
#7992 opened by BenjaminBraunDev - 0
How to Send FP16 Input Tensors Using gRPC in C# for NVIDIA Triton Inference Server?
#8044 opened by Madihaa-Shaikh - 0
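The question targets C#, but the mechanics are language-independent: declare the input datatype as "FP16" and ship the raw little-endian IEEE 754 half-precision bytes. A Python sketch of the wire-level idea, with hypothetical model and tensor names:

```python
# Sketch only: "my_fp16_model", "INPUT__0", and "OUTPUT__0" are placeholders.
# A C# client would do the same thing, packing Half/ushort values into the
# raw input contents and declaring the datatype as "FP16".
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient("localhost:8001")

data = np.random.rand(1, 3, 224, 224).astype(np.float16)  # half precision
inp = grpcclient.InferInput("INPUT__0", data.shape, "FP16")
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_fp16_model", inputs=[inp])
print(result.as_numpy("OUTPUT__0").dtype)  # float16
```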
Method 'forward' is not defined error!
#7968 opened by MHmi1 - 2
`k8s-onprem` Chart doesn't work with OpenShift's default security posture
#8004 opened by jharmison-redhat - 1
Python Backend on Windows
#8012 opened by mhbassel - 0
Triton LLM OpenAI LangGraph tool call
#8033 opened by GGN1994 - 0
Python backend without GIL
#8032 opened by zeruniverse - 0
Request Cancellation
#8030 opened by MichalPogodski - 1
First value replicated over entire input array
#8025 opened by FCollaPi - 4
Got runtime error `0 active drivers ([]). There should only be one.` when using PipelineModule through Ray and DeepSpeed
#8007 opened by consciousgaze - 0
Infinite pending status for 3 days after launching server
#8028 opened by nbowon - 1
Memory leak
#8026 opened by aTunass - 0
"output tensor shape does not match size of output" when using python backend and providing a custom environment
#8019 opened by Isuxiz - 0
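This error typically means the array returned by model.py disagrees with the dims or datatype declared for the output in config.pbtxt; custom execution environments make it easy to pick up a different numpy and silently change dtypes. A minimal Python-backend sketch with hypothetical tensor names:

```python
# model.py sketch: the numpy array passed to pb_utils.Tensor must match the
# dims and datatype declared for "OUTPUT0" in config.pbtxt, or Triton reports
# a shape/size mismatch at response time.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            # Same shape as the input; cast explicitly so the dtype is stable
            # regardless of which numpy the custom environment provides.
            out = in0.as_numpy().astype(np.float32) * 2.0
            responses.append(
                pb_utils.InferenceResponse(
                    output_tensors=[pb_utils.Tensor("OUTPUT0", out)]
                )
            )
        return responses
```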
Inconsistent HF token requirements for cached gated models: Triton vs vLLM deployments
#8020 opened by haka-qylis - 4
Performance issue - High queue times in perf_analyzer
#7986 opened by asaff1 - 1
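High queue times usually point at the model configuration rather than at perf_analyzer itself: too few model instances, or dynamic batching holding requests while waiting to fill a batch. A config.pbtxt sketch of the two knobs commonly tuned first (values are illustrative, not recommendations):

```
# Sketch only: run more instances per GPU and cap how long the scheduler
# may hold a request waiting for a fuller batch.
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
dynamic_batching {
  max_queue_delay_microseconds: 100
}
```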
Unable to load model from S3 bucket
#8008 opened by jmlaubach - 1
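For s3:// model repositories, Triton reads the standard AWS credential environment variables; a sketch with a hypothetical bucket name:

```sh
# Sketch only: credentials and bucket are placeholders.
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1
tritonserver --model-repository=s3://my-bucket/model_repository
```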
Got CMake error "CMAKE_CUDA_ARCHITECTURES must be non-empty if set" when building without Docker
#8003 opened by simonzgx - 1
ONNX Model IR Version 10 Support
#8001 opened by RohanAdwankar - 2
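Models exported with a newer onnx package declare IR version 10, which older ONNX Runtime builds can reject at load time. A workaround some users report, sketched below, is to lower the declared IR version and re-save, assuming the graph uses no IR-10-only features; re-validate outputs afterwards:

```python
# Sketch of a reported workaround, not a guaranteed fix: lowering ir_version
# only changes the declared container version, so verify the model still
# checks out and produces identical outputs.
import onnx

model = onnx.load("model.onnx")
print("IR version:", model.ir_version)

model.ir_version = 9  # assumption: no IR-10-only features in the graph
onnx.checker.check_model(model)
onnx.save(model, "model_ir9.onnx")
```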
Can't build r25.01 (r24.12 builds okay) on `ubuntu-22.04` (unclear build errors). Also can't build `r24.12` on `ubuntu-24.04` (C++ errors)
#7997 opened by vadimkantorov - 0
The systems look identical, but errors occur on some machines and the cause is unknown
#7996 opened by coder-2014 - 0
Batching
#7994 opened by riyajatar37003 - 0
ERROR: failed to solve: nvcr.io/nvidia/tritonserver:24.08-py3: failed to authorize: failed to fetch oauth token: unexpected status from GET request to https://nvcr.io/proxy_auth?scope=repository%3Anvidia%2Ftritonserver%3Apull: 401
#7988 opened by monajalal - 1
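A 401 from nvcr.io usually means the pull is unauthenticated or cached credentials are stale. NGC expects the literal username $oauthtoken with an NGC API key as the password; a sketch:

```sh
# Sketch: the username is the literal string $oauthtoken (quoted so the
# shell does not expand it), not your NGC account name.
docker login nvcr.io --username '$oauthtoken'
# Password prompt: paste an NGC API key generated at ngc.nvidia.com
docker pull nvcr.io/nvidia/tritonserver:24.08-py3
```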
Unable to get response from BLS async call
#7982 opened by riyajatar37003 - 0
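In the Python backend, a BLS request issued with async_exec() returns an awaitable, so it must be awaited from a coroutine execute(); calling it from a synchronous execute() never yields a response. A minimal sketch with hypothetical model and tensor names:

```python
# model.py sketch: "downstream_model", "INPUT0", and "OUTPUT0" are
# placeholders. execute() is declared async so async_exec() can be awaited.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    async def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            bls_request = pb_utils.InferenceRequest(
                model_name="downstream_model",
                requested_output_names=["OUTPUT0"],
                inputs=[in0],
            )
            bls_response = await bls_request.async_exec()
            if bls_response.has_error():
                raise pb_utils.TritonModelException(
                    bls_response.error().message()
                )
            out = pb_utils.get_output_tensor_by_name(bls_response, "OUTPUT0")
            responses.append(
                pb_utils.InferenceResponse(
                    output_tensors=[pb_utils.Tensor("OUTPUT0", out.as_numpy())]
                )
            )
        return responses
```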
[Question] triton-client numpy 2 support
#7979 opened by john-pixforce - 0
PyTorch backend: Model is run in no_grad mode even with INFERENCE_MODE=false
#7974 opened by hakanardo - 0
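For reference, the libtorch backend reads this switch from config.pbtxt; per the backend docs it selects between torch InferenceMode and NoGrad execution, which would explain gradients staying disabled either way (an assumption worth confirming against the backend source):

```
# Sketch: per the pytorch_backend docs this toggles InferenceMode vs NoGrad
# execution, so gradients appear disabled in both settings.
parameters: {
  key: "INFERENCE_MODE"
  value: { string_value: "false" }
}
```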
TypeError: object of type 'int' has no len()
#7967 opened by ProgramerSalar