Stricter model versioning

Question

Closed this issue 3 years ago · 5 comments

Is your feature request related to a problem? Please describe.
The existing model versioning (using named/numbered folders) is useful to get started quickly and try out prototypes of server usage. However, in a complex production environment, where there might be many models provided by multiple contributors, and many servers hosting those models, it would be useful to have a stricter approach.

Describe the solution you'd like
One proposal: enforce model versioning by comparing checksums of all relevant files.

Ideally, this should include the .pbtxt config files, which currently aren't part of the versioned folders in the model repository. The config files could potentially change, depending on how the model was updated.

To maintain some compatibility and ease of use for users who don't need such an involved scheme, it might be necessary to support this (or an equivalent) as well as the current, simpler method.
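A minimal sketch of what such a checksum could look like, assuming the usual repository layout of one `config.pbtxt` per model plus one folder per version. The function name `model_checksum`, the choice of SHA-256, and the way paths and sizes are mixed into the digest are illustrative assumptions, not an existing server feature.

```python
import hashlib
from pathlib import Path


def model_checksum(model_dir: Path, version: str) -> str:
    """Hypothetical helper: digest of config.pbtxt plus all files of one version."""
    version_dir = model_dir / version
    files = [p for p in version_dir.rglob("*") if p.is_file()]
    config = model_dir / "config.pbtxt"
    if config.is_file():
        files.append(config)
    # Sort by relative path so the result does not depend on directory listing order.
    files.sort(key=lambda p: p.relative_to(model_dir).as_posix())

    digest = hashlib.sha256()
    for path in files:
        rel = path.relative_to(model_dir).as_posix().encode("utf-8")
        # Length-prefix the path and the content so file boundaries are unambiguous.
        digest.update(len(rel).to_bytes(8, "big") + rel)
        digest.update(path.stat().st_size.to_bytes(8, "big"))
        with path.open("rb") as handle:
            # Read in 1 MiB chunks; model files can be large.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Because the relative path is hashed along with the content, renaming a file or editing `config.pbtxt` changes the checksum even when the model weights are untouched.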

Describe alternatives you've considered
Manually requiring all contributors to follow the same versioning scheme and all server admins to obtain models from the same repository... but this is prone to errors.

Additional context
This is particularly needed for scientific computing, where the model version must be absolutely known, with no room for error (or invalid results could be produced).

Answer 1 · 2020-09-24T17:29:29.000Z

Can you give some examples of how these checksums would be used?

Answer 2 · 2020-10-08T20:36:56.000Z

Rather than just specifying a model version, the entire checksum would be sent as part of the model request, and the server would have to check its model files to make sure they match the checksum. This ensures there are no discrepancies or inconsistencies between what the local CPU process requests and what is available on the remote server.

The V2 API treats model version as a string rather than an integer, which could make this easier.
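As a rough illustration of the check described above, the sketch below assumes the client places a hex checksum in the string-valued version field and the server keeps a table of checksums for the files it actually loaded. `InferRequest`, `ChecksumMismatch`, and `check_request` are hypothetical names for this example and are not part of the V2 API.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class InferRequest:
    """Hypothetical request: model_version carries a checksum, not a number."""
    model_name: str
    model_version: str


class ChecksumMismatch(Exception):
    """Raised when the server's files differ from what the client asked for."""


def check_request(request: InferRequest, served: Mapping[str, str]) -> None:
    """Reject the request unless the served checksum equals the requested one.

    `served` maps model name -> checksum of the files the server has loaded.
    """
    actual = served.get(request.model_name)
    if actual is None:
        raise KeyError(f"model {request.model_name!r} is not loaded")
    if actual != request.model_version:
        raise ChecksumMismatch(
            f"{request.model_name}: requested {request.model_version}, server has {actual}"
        )
```

Failing loudly on a mismatch, rather than falling back to whatever version is available, is the point of the proposal: the client never receives results from a model it did not ask for.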

Answer 3 · 2020-11-16T15:41:37.000Z

Coming back to this: the "missing" feature on the server side is checking whether the model files match the checksum. This could be optional, following the proposed implementation with the checksum used as the directory name.
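A sketch of that optional server-side check, assuming each version folder is named after the checksum of its own contents. This layout is only the proposal from this thread, not something the server is known to support, and `version_dir_checksum` and `find_mismatched_versions` are hypothetical helpers. Paths are hashed relative to the version folder so the folder name itself does not feed into the digest.

```python
import hashlib
from pathlib import Path
from typing import List


def version_dir_checksum(version_dir: Path) -> str:
    """SHA-256 over the files in one version folder (paths relative to that folder)."""
    files = sorted(
        (p for p in version_dir.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(version_dir).as_posix(),
    )
    digest = hashlib.sha256()
    for path in files:
        rel = path.relative_to(version_dir).as_posix().encode("utf-8")
        data = path.read_bytes()  # fine for a sketch; stream large files in practice
        digest.update(len(rel).to_bytes(8, "big") + rel)
        digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()


def find_mismatched_versions(model_dir: Path) -> List[str]:
    """Return version folder names that differ from the checksum of their contents."""
    return sorted(
        d.name
        for d in model_dir.iterdir()
        if d.is_dir() and d.name != version_dir_checksum(d)
    )
```

A server could run such a scan at load time behind an opt-in flag and refuse to serve any folder that is reported, leaving the existing numbered-folder behaviour untouched for everyone else.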

On a broader, but related topic: the usage of config.pbtxt seems overloaded right now.

1. The config.pbtxt file isn't versioned along with the other model files, but in principle, the inputs and outputs could change from one model version to the next. Maybe that kind of change just has to be treated as an entirely separate model rather than a new version?
2. The config.pbtxt file also handles a lot of metadata related to server deployment (number of instances, which devices to use, dynamic batching, etc.). This kind of information is not necessarily related to the actual model itself. It might be clearer to specify it separately. Even better would be the ability to set these properties dynamically with gRPC messages to the already-running server.
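To make the second point concrete, here is a small sketch that splits an already-parsed configuration into a model part and a deployment part. It assumes the configuration is available as a plain dictionary (parsing config.pbtxt is out of scope here), and the choice of which keys count as deployment settings (`instance_group`, `dynamic_batching`) is an assumption for illustration, as is the function name `split_config`.

```python
from typing import Any, Dict, Tuple

# Assumed split: these keys describe how the model is deployed, not what it is.
DEPLOYMENT_KEYS = frozenset({"instance_group", "dynamic_batching"})


def split_config(config: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (model_settings, deployment_settings) from one combined config."""
    model = {k: v for k, v in config.items() if k not in DEPLOYMENT_KEYS}
    deployment = {k: v for k, v in config.items() if k in DEPLOYMENT_KEYS}
    return model, deployment
```

With such a split, only the model part would need to be versioned and checksummed, while the deployment part could differ from server to server without changing the model's identity.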

Answer 4 · 2020-11-16T18:31:30.000Z

For 1, yes: the model versions are intended to be different versions of the same model, so they share a configuration file. If models vary in inputs or outputs, they will need to be separate models in the repo. We have had other requests to allow the API to change the configuration. The danger here is that you now have transient configuration information that is not stored persistently anywhere. For example, if the server restarts, how do you bring it back to the same state? Do you have some external agent that replays all the configuration changes?

Answer 5 · 2020-11-16T18:47:20.000Z

Indeed, knowing the state of the server is important. Perhaps the "final" configuration could be written somewhere upon request (or upon shutdown)? It does merit some more thought. But keeping the metadata in a separate file (rather than the same config.pbtxt) would already be useful.
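One possible shape for this, sketched under the assumption that dynamically set deployment properties live in a small JSON file next to the model repository. The class name `PersistedSettings` and the file format are hypothetical. Every change is written out immediately, so a restarted server can reload the last known state instead of relying on an external agent to replay changes.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict


class PersistedSettings:
    """Hypothetical store for deployment settings that survives a restart."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Reload the previous state if a file from an earlier run exists.
        self.values: Dict[str, Any] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def update(self, changes: Dict[str, Any]) -> None:
        """Apply changes in memory, then persist the full state atomically."""
        self.values.update(changes)
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps(self.values, indent=2, sort_keys=True))
        # os.replace swaps the file in one step, so readers never see a partial write.
        os.replace(tmp, self.path)
```

Writing on every change rather than only at shutdown also covers the case where the server crashes and never gets the chance to dump its final state.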