aws/sagemaker-huggingface-inference-toolkit

How to enable batch inference on an AWS-deployed serverless model from the Hub?

jmparejaz opened this issue · 1 comment

I am using serverless inference from SageMaker with a Hugging Face model from the Hub, following this example:
https://github.com/huggingface/notebooks/blob/main/sagemaker/19_serverless_inference/sagemaker-notebook.ipynb

using the LLM image URI:

from sagemaker.huggingface import get_huggingface_llm_image_uri

# image URI for the Hugging Face LLM container
image_container = get_huggingface_llm_image_uri("huggingface", version="0.9.3")
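For context, this is roughly how the serverless deployment in that notebook is set up. This is a sketch only; the memory size, concurrency, model ID, and task below are placeholder assumptions, not my exact configuration:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

role = sagemaker.get_execution_role()

# Hub model configuration (placeholder model ID and task)
hub = {
    "HF_MODEL_ID": "gpt2",
    "HF_TASK": "text-generation",
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    image_uri=image_container,  # LLM container URI from above
)

# serverless endpoint configuration (values are illustrative)
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=6144,
    max_concurrency=1,
)

predictor = huggingface_model.deploy(
    serverless_inference_config=serverless_config,
)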

I was expecting the deployed endpoint to behave like the Pipeline class from transformers for this task (text generation); however, the input does not work with a list.
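To illustrate the difference, here is a rough sketch, assuming a placeholder model and prompts:

from transformers import pipeline

# locally, a transformers pipeline accepts a list of prompts in one call
generator = pipeline("text-generation", model="gpt2")
outputs = generator(["Hello, my name is", "The weather today is"])

# against the deployed endpoint, a single-prompt payload works
predictor.predict({"inputs": "Hello, my name is", "parameters": {"max_new_tokens": 20}})

# but passing a list as "inputs" does not behave like the local pipeline
predictor.predict({"inputs": ["Hello, my name is", "The weather today is"]})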

Is there any way to do batch inference with the SageMaker SDK?

Hello @jmparejaz,

The input schema for the LLM container should be the same: {"inputs": "text", "parameters": {}}. What issue are you seeing? The only difference is that the LLM container supports additional/different parameters, see here: https://huggingface.co/blog/sagemaker-huggingface-llm#4-run-inference-and-chat-with-our-model
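If you need to process many prompts, one simple approach is to send one request per prompt using that schema. This is just a sketch of client-side batching, not an official batch API; the prompts and generation parameters are illustrative:

prompts = [
    "What is serverless inference?",
    "Explain batch processing in one sentence.",
]

results = []
for prompt in prompts:
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    }
    # one endpoint invocation per prompt
    results.append(predictor.predict(payload))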