Support for serving multiple models with a single instance
alexkillen opened this issue · 3 comments
This would provide the ability to serve multiple models, and multiple versions of each model, with a single serving instance.
Details can be seen here: https://www.tensorflow.org/tfx/serving/serving_config#model_server_configuration
In practice, this could be achieved by providing a --model-config option that could be used instead of the --model argument, like below:
nvidia-docker run nmtwizard/opennmt-tf \
--storage_config storages.json \
--model_storage s3_model: \
--model-config /path/to/models.config \
--gpuid 1 \
serve --host 0.0.0.0 --port 5000
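For reference, the models.config referenced above would be a TensorFlow Serving model server configuration in protobuf text format; a minimal sketch might look like the following (the model names and base paths are just placeholders):

model_config_list {
  config {
    name: "ende"
    base_path: "/root/models/ende"
    model_platform: "tensorflow"
  }
  config {
    name: "enfr"
    base_path: "/root/models/enfr"
    model_platform: "tensorflow"
  }
}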
Then, when calling 'tensorflow_model_server' in the opennmt-tf docker image entrypoint.py, the --model_config_file argument could be used instead of --model_name and --model_base_path.
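Roughly speaking, the switch inside entrypoint.py would amount to something like the following (port, model name, and paths are illustrative):

# current behaviour: one model per instance
tensorflow_model_server \
  --port=9000 \
  --model_name=my_model \
  --model_base_path=/root/models/my_model

# proposed behaviour: read all models from a config file
tensorflow_model_server \
  --port=9000 \
  --model_config_file=/root/models/models.config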
Thanks for the request!
I agree it would be nice to serve multiple models with a single instance.
However, this project is not designed for TensorFlow Serving specifically, which is merely an implementation detail to serve OpenNMT-tf models. So we should come up with a more general design and API to support specifying multiple models. Let us think about that.
Thanks for the prompt reply; figuring out how to handle this in a more general way makes sense.
Just on the topic of TensorFlow Serving, I've played around with it a little and it seems quite straightforward using the --model_config_file and --model_config_file_poll_wait_seconds arguments. One note, however, is that the latter is currently only available in the "nightly" tags of the tensorflow/serving docker images. With those arguments it is simply a case of running the container and then editing the config file, which tensorflow_model_server will poll, whenever you want to add or edit a model.
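For example, something along these lines (image tag, mounts, ports, and poll interval are illustrative):

docker run -p 8500:8500 -p 8501:8501 \
  -v /path/to/models:/models \
  tensorflow/serving:nightly \
  --model_config_file=/models/models.config \
  --model_config_file_poll_wait_seconds=60

Any model later added under /models and declared in models.config is then picked up on the next poll of the config file.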
There is a noticeable overhead of using the same instance to serve multiple models, as you would expect, but it's not huge, and I'd imagine tweaking the batch parameters would improve it further.
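For context, the batch parameters mentioned here are the ones TensorFlow Serving takes via its --enable_batching and --batching_parameters_file flags; a sketch of such a file, with purely illustrative values:

max_batch_size { value: 64 }
batch_timeout_micros { value: 1000 }
num_batch_threads { value: 4 }
max_enqueued_batches { value: 100 }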
Looking forward to seeing how you handle this; your REST API is a lot nicer to use than the one exposed by TensorFlow Serving.
I just updated the title so it's not specific to TensorFlow Serving, which it seems is no longer used for serving OpenNMT-tf models anyway.