aws/sagemaker-huggingface-inference-toolkit

Serverless inference using the SageMaker toolkit

arnaudstiegler opened this issue · 5 comments

Hey!
I was looking at how you've built this inference toolkit to figure out how to combine the Multi-Model-Service package with serverless inference. I've seen that you wrote your own start_model_server and your own service handler, and I'd be very interested to hear whether any of those changes are related to using serverless inference endpoints. Thank you!

SageMaker platform features like Serverless Inference are not directly built into the toolkit. For multi-model it should be the same, except that we make sure that if an inference.py is provided, it is used.
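
For illustration, a minimal sketch of such an inference.py (the handler names follow the toolkit's override convention; the text-classification task and request shape are just assumptions):

```python
# inference.py -- placed under code/ in the model archive so the toolkit picks it up.
from transformers import pipeline


def model_fn(model_dir):
    # Load the model and tokenizer that were packaged into the model archive.
    return pipeline("text-classification", model=model_dir, tokenizer=model_dir)


def predict_fn(data, model):
    # `data` is the deserialized request payload; return something JSON-serializable.
    inputs = data.pop("inputs", data)
    return model(inputs)
```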

Thanks for the answer; let me clarify my question: I have a custom container that uses the SageMaker inference toolkit, and it works well for provisioned deployment. But it fails when I try to deploy it for serverless inference because of some errors in the default SageMaker MMS web server (this one, for instance).
So I was curious whether you made any specific changes to the Hugging Face toolkit to have it work OOTB with serverless.
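
For context, the endpoint is deployed roughly like this (a minimal sketch with the SageMaker Python SDK; the image URI, S3 path, endpoint name, and memory/concurrency values are placeholders, not recommendations):

```python
import sagemaker
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig

role = sagemaker.get_execution_role()

# Custom container image plus a packaged model archive.
model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/my-custom-image:latest",
    model_data="s3://my-bucket/model.tar.gz",
    role=role,
)

# Deploying with a ServerlessInferenceConfig instead of an instance type.
predictor = model.deploy(
    endpoint_name="my-serverless-endpoint",
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=5,
    ),
)
```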

According to the documentation, it is currently not possible to use custom registries/containers for Serverless Inference: https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html

They do mention that private registries are not supported, but they also specifically say that custom containers are supported (in the Container Support section). And the serverless endpoint does start, so I think it's really just a code issue with the sagemaker-inference default web server.

Oh I actually found an answer there.
Thanks for your answers!