The NVIDIA Triton Inference Server provides a robust and configurable solution for deploying and managing AI models. The Triton Model Navigator automates the process of moving a model from its source framework to the optimal format and configuration for deployment on Triton Inference Server. The tool supports exporting the model from source to all available formats and applying Triton Inference Server backend optimizations. Finally, it uses the Triton Model Analyzer to find the Triton model configuration that best matches the provided constraints and optimizes performance.
The Python Export API helps export a model from its source framework to all supported formats. This stage is dedicated to ensuring the model is inference ready at training time by executing conversion, correctness, and performance tests that help identify model-related issues. Artifacts produced by Triton Model Navigator are stored in a .nav package that contains checkpoints and all information necessary for further processing by the Model Navigator CLI. A minimal sketch of how this export stage might be invoked is shown below.
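
The sketch below illustrates the general shape of the export call; the `nav.torch.export` function name, its keyword arguments, and the toy model and dataloader are assumptions for illustration and may differ between Model Navigator releases, so check the version-specific API documentation.

```python
# Illustrative sketch only: the exact function name and arguments are
# assumptions and may vary across Model Navigator versions.
import torch
import model_navigator as nav

# Hypothetical PyTorch model and a small sample dataloader used to drive
# the export, correctness, and performance tests.
model = torch.nn.Linear(10, 5).eval()
dataloader = [torch.randn(1, 10) for _ in range(10)]

# Export the model to the supported formats and produce a .nav package
# with checkpoints and metadata for later processing by the CLI.
package = nav.torch.export(
    model=model,
    model_name="linear_model",
    dataloader=dataloader,
)
```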
The optimizer part uses the generated .nav package, runs all possible conversions to the available formats, and applies additional Triton Inference Server backend optimizations. Finally, it internally uses the Triton Model Analyzer to find the Triton model configuration that best matches the provided constraints and optimizes performance.
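
As a rough illustration, this optimization stage might be triggered from the command line as shown below; the command name, the package path, and the constraint flag are assumptions and should be verified against the Model Navigator CLI documentation for the installed version.

```shell
# Illustrative invocation only: flags and defaults may differ per release.
# Takes the .nav package produced by the Python Export API, runs conversions
# and backend optimizations, then profiles and analyzes configurations.
model-navigator optimize linear_model.nav --max-latency-ms 100
```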