triton-inference-server/server

Stricter model versioning

Closed this issue · 5 comments

Is your feature request related to a problem? Please describe.
The existing model versioning scheme (named/numbered folders) is useful for getting started quickly and for prototyping server usage. However, in a complex production environment, with many models provided by multiple contributors and many servers hosting those models, a stricter approach would be useful.

Describe the solution you'd like
One proposal: enforce model versioning by comparing checksums of all relevant files.

Ideally, this should include the .pbtxt config files, which currently aren't part of the versioned folders in the model repository. Depending on how the model was updated, the config files may change as well.
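As a sketch of what "all relevant files" could mean, the function below hashes config.pbtxt together with every file in one version folder, assuming the standard Triton layout (`<model>/config.pbtxt` next to numbered version folders). `model_checksum` is a hypothetical helper, not part of Triton. Each file contributes its path relative to the model directory and its contents, both length-prefixed, so renaming or moving a file also changes the digest.

```python
import hashlib
from pathlib import Path


def model_checksum(model_dir: Path, version: str) -> str:
    """SHA-256 over config.pbtxt plus every file in <model_dir>/<version>/.

    Hypothetical helper; assumes the standard Triton repository layout.
    """
    paths = [model_dir / "config.pbtxt"]
    paths += [p for p in (model_dir / version).rglob("*") if p.is_file()]
    # Sort by relative path so the digest does not depend on filesystem order.
    paths.sort(key=lambda p: p.relative_to(model_dir).as_posix())
    h = hashlib.sha256()
    for p in paths:
        rel = p.relative_to(model_dir).as_posix().encode()
        data = p.read_bytes()
        # Length-prefix each field so (path, contents) boundaries are unambiguous.
        for field in (rel, data):
            h.update(len(field).to_bytes(8, "big"))
            h.update(field)
    return h.hexdigest()
```

Editing config.pbtxt alone (for example, changing `max_batch_size`) yields a different digest, which is the property asked for above; the flip side is that deployment-only edits would also count as a new version.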

To preserve compatibility and ease of use for users who don't need such an involved scheme, it may be necessary to support this (or an equivalent) alongside the current, simpler method.

Describe alternatives you've considered
Manually requiring all contributors to follow the same versioning scheme, and all server admins to obtain models from the same repository. However, this is error-prone.

Additional context
This is particularly needed for scientific computing, where the model version must be known exactly, with no room for error; otherwise, invalid results could be produced.

Can you give some examples of how these checksums would be used?

Rather than just specifying a model version, the client would send the full checksum as part of the model request, and the server would check its model files to make sure they match that checksum. This ensures there are no discrepancies between what the local CPU process requests and what the remote server actually serves.
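A minimal sketch of that server-side check, assuming a hypothetical request type that carries the checksum in place of a version number and a hypothetical registry of loaded models (`InferRequest`, `LoadedModels` and `digest_files` are illustrative names, not Triton APIs). The digest is computed once when a model is loaded, over length-prefixed (relative path, contents) pairs, and a request is accepted only if its checksum matches.

```python
import hashlib
from dataclasses import dataclass


def digest_files(files: dict[str, bytes]) -> str:
    """SHA-256 over length-prefixed (relative path, contents) pairs, sorted by path."""
    h = hashlib.sha256()
    for rel in sorted(files):
        for field in (rel.encode(), files[rel]):
            h.update(len(field).to_bytes(8, "big"))
            h.update(field)
    return h.hexdigest()


@dataclass
class InferRequest:
    """Hypothetical request: the checksum replaces the usual version number."""
    model_name: str
    model_checksum: str


class LoadedModels:
    """Hypothetical server-side registry of digests computed at load time."""

    def __init__(self) -> None:
        self._digests: dict[str, str] = {}

    def load(self, name: str, files: dict[str, bytes]) -> None:
        self._digests[name] = digest_files(files)

    def accept(self, req: InferRequest) -> bool:
        # Reject unknown models and any checksum that differs from what is loaded.
        return self._digests.get(req.model_name) == req.model_checksum
```

Computing the digest at load time keeps the per-request cost to a string comparison, at the price of trusting that files are not modified after loading.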

The V2 API treats model version as a string rather than an integer, which could make this easier.
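For illustration, the v2 HTTP inference path takes the version as a path segment (`/v2/models/<name>/versions/<version>/infer`), so a hex digest could occupy that slot on the client side. The helper `infer_url` below is hypothetical; whether a server accepts a non-numeric version there is a separate matter, since Triton's repository layout today uses numeric version folders.

```python
from urllib.parse import quote


def infer_url(base: str, model: str, version: str) -> str:
    """Build a v2 inference URL; `version` may be any string, e.g. a SHA-256 digest."""
    return (
        f"{base.rstrip('/')}/v2/models/{quote(model, safe='')}"
        f"/versions/{quote(version, safe='')}/infer"
    )
```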

Coming back to this: the "missing" server-side feature is checking whether the model files match the checksum. This check could be optional, following the proposed implementation in which the checksum is used as the directory name.
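One reading of "the checksum used as the directory name": each version folder is named after the digest of its own contents, and a repository scan flags any folder whose name and contents disagree. The sketch assumes that hypothetical layout and hashes only the folder's contents, so config.pbtxt would have to live inside the folder to be covered. `dir_digest` and `mismatched_versions` are illustrative names, not Triton functions.

```python
import hashlib
from pathlib import Path


def dir_digest(version_dir: Path) -> str:
    """SHA-256 over every file under version_dir (relative path + contents)."""
    h = hashlib.sha256()
    files = sorted(
        (p for p in version_dir.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(version_dir).as_posix(),
    )
    for p in files:
        for field in (p.relative_to(version_dir).as_posix().encode(), p.read_bytes()):
            # Length-prefix each field so boundaries are unambiguous.
            h.update(len(field).to_bytes(8, "big"))
            h.update(field)
    return h.hexdigest()


def mismatched_versions(model_dir: Path) -> list[str]:
    """Return version folders whose name is not the digest of their contents."""
    return [
        d.name
        for d in sorted(model_dir.iterdir())
        if d.is_dir() and dir_digest(d) != d.name
    ]
```

Because the folder name is self-validating, the check needs no external manifest; making it optional would amount to skipping this scan.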

On a broader, but related topic: the usage of config.pbtxt seems overloaded right now.

  1. The config.pbtxt file isn't versioned along with the other model files, but in principle the inputs and outputs could change from one model version to the next. Perhaps that kind of change has to be treated as an entirely separate model rather than a new version?
  2. The config.pbtxt file also holds a lot of metadata related to server deployment (number of instances, which devices to use, dynamic batching, etc.). This information is not necessarily tied to the model itself, so it might be clearer to specify it separately. Better still would be the ability to set these properties dynamically, via gRPC messages, on an already-running server.
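To make the separation in point 2 concrete, the sketch below splits a model configuration, represented as a dict whose keys follow Triton's model-config field names (`input`, `output`, `instance_group`, `dynamic_batching`, ...), into model-defining and deployment parts, and hashes only the former. Which fields count as "deployment" is an assumption (`DEPLOYMENT_KEYS`), as are the helper names. With this split, changing the instance count leaves the digest unchanged, while changing an output shape (point 1) does not.

```python
import hashlib
import json

# Assumed partition: which config fields are "deployment" is a judgment call.
DEPLOYMENT_KEYS = {"instance_group", "dynamic_batching", "optimization", "version_policy"}


def split_config(config: dict) -> tuple[dict, dict]:
    """Return (model-defining fields, deployment fields)."""
    model = {k: v for k, v in config.items() if k not in DEPLOYMENT_KEYS}
    deploy = {k: v for k, v in config.items() if k in DEPLOYMENT_KEYS}
    return model, deploy


def interface_digest(config: dict) -> str:
    """Hash only the model-defining part, serialized canonically as JSON."""
    model, _ = split_config(config)
    canonical = json.dumps(model, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```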

Indeed, knowing the state of the server is important. Perhaps the "final" configuration could be written somewhere on request (or at shutdown)? It merits more thought, but keeping the deployment metadata in a separate file (rather than in the same config.pbtxt) would already be useful.