InternalServerException while deploying HuggingFace model on SageMaker
KartikKannapur opened this issue · 6 comments
Background
We are working to deploy the AllenAI Cosmo XL model - https://huggingface.co/allenai/cosmo-xl for inference on SageMaker.
Our Approach
We are using a SageMaker notebook.
Instance: ml.p3.2xlarge
Kernel: conda_pytorch_p39
Code is directly from the HuggingFace website here: https://huggingface.co/allenai/cosmo-xl
We selected Deploy -> Amazon SageMaker, chose Task=Conversational and Configuration=AWS, and copied the generated code into our notebook.
Error
The model is created and deployed successfully. However, when we call predictor.predict for inference, we run into the following error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": "Could not load model /.sagemaker/mms/models/allenai__cosmo-xl with any of the following classes: (\u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM\u0027\u003e, \u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForCausalLM\u0027\u003e, \u003cclass \u0027transformers.models.t5.modeling_t5.T5ForConditionalGeneration\u0027\u003e)."
}
Relevant error messages from CloudWatch
2023-03-15T19:53:32,781 [INFO ] W-allenai__cosmo-xl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Prediction error
2023-03-15T19:53:32,783 [INFO ] W-allenai__cosmo-xl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.8/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 219, in handle
2023-03-15T19:53:32,784 [INFO ] W-allenai__cosmo-xl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.8/site-packages/transformers/pipelines/__init__.py", line 549, in pipeline
2023-03-15T19:53:32,785 [INFO ] W-allenai__cosmo-xl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise ValueError(f"Could not load model {model} with any of the following classes: {class_tuple}."
2023-03-15T19:53:32,785 [INFO ] W-allenai__cosmo-xl-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - ValueError: Could not load model /.sagemaker/mms/models/allenai__cosmo-xl with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSeq2SeqLM'>, <class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.t5.modeling_t5.T5ForConditionalGeneration'>)
Hello @KartikKannapur,
Thank you for opening the issue. The snippet is currently not up to date regarding the versions. Could you try with transformers version 4.26?
Hey @philschmid
Thank you for looking into this.
I updated the transformers version to 4.26 while the SageMaker SDK version was 2.132.0, and received the following error message:
ValueError: Unsupported huggingface version: 4.26.0. You may need to upgrade your SDK version (pip install -U sagemaker) for newer huggingface versions. Supported huggingface version(s): 4.6.1, 4.10.2, 4.11.0, 4.12.3, 4.17.0, 4.6, 4.10, 4.11, 4.12, 4.17.
So, I updated the SageMaker SDK to version 2.140.1, but when I ran predictor.predict for inference, I hit the same error as before.
Any suggestions?
Could you share the code you used to deploy? I will then try to reproduce it.
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'allenai/cosmo-xl',
    'HF_TASK': 'conversational'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.26.0',
    pytorch_version='1.13.1',
    py_version='py39',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,      # number of instances
    instance_type='ml.m5.xlarge'   # ec2 instance type
)

predictor.predict({
    'inputs': {
        "past_user_inputs": ["Which movie is the best ?"],
        "generated_responses": ["It's Die Hard for sure."],
        "text": "Can you explain why ?"
    }
})
Thanks @philschmid
Appreciate your support on this.
Ah, I see. The task conversational is not correct here. Not sure why they added it manually. Can you switch the task to text2text-generation? Then it should work.
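For reference, a minimal sketch of the corrected configuration. The HF_MODEL_ID/HF_TASK keys are the same ones used in the snippet above; the single-string payload shape for text2text-generation, and the " <turn> " separator in particular, are assumptions based on my reading of the model card, not a verified format:

```python
# Corrected Hub configuration: cosmo-xl is a T5ForConditionalGeneration
# model, so the 'text2text-generation' task matches the classes the
# inference toolkit tries to load (unlike 'conversational').
hub = {
    'HF_MODEL_ID': 'allenai/cosmo-xl',
    'HF_TASK': 'text2text-generation'
}

# The text2text-generation pipeline expects a plain string rather than the
# past_user_inputs / generated_responses payload, so the request becomes
# something like this (the " <turn> " separator between dialogue turns is
# an assumption taken from the model card):
payload = {
    'inputs': "Which movie is the best ? <turn> It's Die Hard for sure. <turn> Can you explain why ?"
}
```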
Also, from reading the model card (https://huggingface.co/allenai/cosmo-xl#how-to-use), it might make sense to create a custom inference.py to get the "conversational" flow into the generation method.
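A rough sketch of what such an inference.py could look like. The model_fn/predict_fn hook names are the override points recognized by the SageMaker Hugging Face inference toolkit; the prompt format (situation and role instruction joined with " <sep> ", dialogue turns joined with " <turn> ") mirrors my reading of the set_input helper on the model card and should be double-checked against it. The generation parameters are illustrative defaults, not the model's recommended values:

```python
# inference.py -- sketch of a custom handler for allenai/cosmo-xl.

def build_prompt(situation, instruction, dialogue):
    """Flatten the conversational inputs into a single seq2seq prompt.

    Separator tokens (" <sep> ", " <turn> ") are assumptions based on the
    model card's set_input helper.
    """
    text = " <turn> ".join(dialogue)
    if instruction:
        text = f"{instruction} <sep> {text}"
    if situation:
        text = f"{situation} <sep> {text}"
    return text

def model_fn(model_dir):
    # Imported lazily so the module can be loaded without transformers installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    model.to("cuda" if torch.cuda.is_available() else "cpu")
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    prompt = build_prompt(
        data.get("situation", ""),
        data.get("instruction", ""),
        data["dialogue"],  # list of alternating dialogue turns
    )
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    outputs = model.generate(
        inputs["input_ids"],
        max_new_tokens=128,
        do_sample=True,
        top_p=0.95,
    )
    return {"response": tokenizer.decode(outputs[0], skip_special_tokens=True)}
```

The script would be packaged in a code/ directory alongside the model artifact (or passed via the SDK's entry_point/source_dir parameters) so the toolkit picks up these overrides instead of the default pipeline.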
Understood. Thanks @philschmid, appreciate it. I'm going to go the custom inference.py route.