aws-neuron/aws-neuron-sdk

Multiple models on torchserve

brunonishimoto opened this issue · 5 comments

Is it possible to deploy multiple models (from multiple mar files) using torchserve inside the inf2 machine?

I routed this to the right team and we are working on getting you an answer.

Sure, thanks very much @chafik-c

You should be able to deploy multiple single-core models. However, note that only one NeuronCore can be allocated to a single worker process for one model, and a core cannot be shared between multiple processes. The maximum number of models you can run simultaneously is therefore limited by the number of NeuronCores on the instance. If you configure multiple worker processes per model, note that each worker process will consume its own NeuronCore.
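To illustrate that constraint, here is a hypothetical TorchServe `config.properties` sketch for serving two single-core models on an instance with two NeuronCores (model names, paths, and worker counts are assumptions, not from the thread; the keys themselves are standard TorchServe configuration keys):

```
# Hypothetical config for two single-core models on a 2-NeuronCore inf2 instance.
# Each worker consumes one NeuronCore, so the total worker count across all
# models must not exceed the number of cores on the instance.
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
model_store=/home/ubuntu/model_store
load_models=model_a.mar,model_b.mar
default_workers_per_model=1
```

With `default_workers_per_model=1`, the two models together consume two cores; raising the worker count for either model would require a larger instance with more NeuronCores.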

This document talks about how we can use torch serve.
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/tutorials/inference/tutorial-torchserve-neuronx.html

If you want fine-grained control over which NeuronCore the model is loaded onto, this documentation should help:
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/programming-guide/inference/core-placement.html?highlight=placement
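One placement mechanism described in the Neuron documentation is the `NEURON_RT_VISIBLE_CORES` runtime environment variable, which restricts a process to a given set of cores. A minimal sketch (the helper function name and the worker-to-core mapping are illustrative assumptions; the variable must be set before the Neuron runtime initializes, i.e. before the model is loaded):

```python
import os

def pin_worker_to_core(core_id: int) -> None:
    """Sketch: restrict this worker process to a single NeuronCore.

    Assumption: NEURON_RT_VISIBLE_CORES is read by the Neuron runtime at
    initialization, so it must be set before loading the compiled model.
    """
    os.environ["NEURON_RT_VISIBLE_CORES"] = str(core_id)

# e.g. worker 0 is pinned to core 0, worker 1 to core 1, and so on
pin_worker_to_core(0)
```

Because each process sees only its assigned core, two worker processes pinned this way will not contend for the same NeuronCore.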

@jeffhataws I see, thanks for your response!

@brunonishimoto thanks for reaching out. Let us know what else we can help you with.