Triton server orchestration for production deployment
kpedro88 commented
The Triton server(s) could be organized in several different ways for a realistic production deployment.
A. One server per model
- Requires some central map of IP:model name
- Does this imply one model per GPU?
B. Single server for all models (and all GPUs)
- Load-balancing already works well
- Need to ensure serving multiple models can be done efficiently
C. Some hybrid of A and B
D. Other?
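As a rough illustration of the "central map" in option A, and of how a hybrid (option C) might fall back to a shared server, here is a minimal Python sketch. The `ServerRegistry` class, the model names, and the addresses are all hypothetical and only meant to make the options concrete; nothing here is an existing Triton or CMSSW interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ServerRegistry:
    """Hypothetical lookup table from model name to Triton server address."""

    # Option A: models that have their own dedicated server
    dedicated: Dict[str, str] = field(default_factory=dict)
    # Option B: one shared server that hosts every other model
    shared: Optional[str] = None

    def resolve(self, model_name: str) -> str:
        """Return the address a client should contact for this model."""
        if model_name in self.dedicated:
            return self.dedicated[model_name]
        if self.shared is not None:
            return self.shared
        raise KeyError(f"no server registered for model {model_name!r}")


# Option C (hybrid): one heavy model gets its own server, the rest share one.
# Addresses and model names are placeholders.
registry = ServerRegistry(
    dedicated={"heavy_model": "10.0.0.1:8001"},
    shared="10.0.0.100:8001",
)
```

With only `dedicated` filled this behaves like option A, and with only `shared` set it behaves like option B, so the same lookup could cover all three cases.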
In addition, it's likely that each Tier1/Tier2 site would eventually have its own GPU servers (to reduce latency). The IP addresses of each site's server(s) could be tracked in e.g. site-local-config.xml or another appropriate part of the production infrastructure.
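To make the site-local idea more concrete, here is a small Python sketch of how a client might read server addresses from such a file. The `<triton-servers>` and `<server>` elements, their attributes, and the site and host names are invented for illustration; they are not part of the existing site-local-config.xml schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical fragment: the <triton-servers> block is an assumption,
# not an element that site-local-config.xml currently defines.
EXAMPLE_CONFIG = """\
<site-local-config>
  <site name="T2_EXAMPLE">
    <triton-servers>
      <server address="triton01.example.org" port="8001"/>
      <server address="triton02.example.org" port="8001"/>
    </triton-servers>
  </site>
</site-local-config>
"""


def read_triton_servers(xml_text: str) -> List[Tuple[str, int]]:
    """Return (address, port) pairs for every server listed for the site."""
    root = ET.fromstring(xml_text)
    servers = []
    for node in root.findall("./site/triton-servers/server"):
        servers.append((node.get("address"), int(node.get("port"))))
    return servers


servers = read_triton_servers(EXAMPLE_CONFIG)
```

A job running at the site could then pick one of the returned addresses, or hand the whole list to whatever load-balancing layer is chosen.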
Triton 2.X supports HTTPS/SSL, which could potentially be used for client-server authentication in production to maintain security.
attn: @violatingcp @holzman @mapsacosta