aws/sagemaker-huggingface-inference-toolkit

Serverless inference using the SageMaker toolkit

arnaudstiegler opened this issue · 5 comments

Hey!
I was looking at how you've built this inference toolkit to figure out how to combine the Multi-Model-Service package with serverless inference. I've seen that you wrote your own start_model_server and your own service handler, and I'd be very interested to hear whether any of those changes are related to using serverless inference endpoints. Thank you!

SageMaker platform features like Serverless Inference are not directly built into the toolkit. For multi-model it should be the same, except that we make sure that if an inference.py is provided, it is used.
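
For illustration, a minimal sketch of such an inference.py (the handler names follow the toolkit's override convention; the text-classification task and request shape are just assumptions):

```python
# inference.py -- placed under code/ in the model archive so the toolkit picks it up.
from transformers import pipeline


def model_fn(model_dir):
    # Load the model and tokenizer that were packaged into the model archive.
    return pipeline("text-classification", model=model_dir, tokenizer=model_dir)


def predict_fn(data, model):
    # `data` is the deserialized request payload; return something JSON-serializable.
    inputs = data.pop("inputs", data)
    return model(inputs)
```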

Thanks for the answer; let me clarify my question: I have a custom container that uses the SageMaker inference toolkit, and it works well for provisioned deployment. But it fails when I try to deploy it for serverless inference because of some errors in the default SageMaker MMS web server (this one, for instance).
So I was curious whether you made any specific changes to the Hugging Face toolkit to have it work OOTB with serverless.
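
For context, the endpoint is deployed roughly like this (a minimal sketch with the SageMaker Python SDK; the image URI, S3 path, endpoint name, and memory/concurrency values are placeholders, not recommendations):

```python
import sagemaker
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig

role = sagemaker.get_execution_role()

# Custom container image plus a packaged model archive.
model = Model(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/my-custom-image:latest",
    model_data="s3://my-bucket/model.tar.gz",
    role=role,
)

# Deploying with a ServerlessInferenceConfig instead of an instance type.
predictor = model.deploy(
    endpoint_name="my-serverless-endpoint",
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=5,
    ),
)
```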

According to the documentation, it is currently not possible to use custom registries/containers for Serverless Inference: https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html

They do mention that private registries are not supported, but they also specifically say that custom containers are supported (in the Container Support section). And the serverless endpoint does start, so I think it's really just a code issue with the sagemaker-inference default web server.

Oh I actually found an answer there.
Thanks for your answers!