[Llama2 Inferentia]: runtime error when invoking endpoint through boto3
Describe the bug
When a Lambda function uses boto3 to query the Llama 2 7B f Neuron model deployed on an ml.inf2.xlarge instance, the InvokeEndpoint operation fails with the following error:
```json
{
  "errorMessage": "An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message \"{\n \"code\": 400,\n \"type\": \"BadRequestException\",\n \"message\": \"Parameter model_name is required.\"\n}\n\". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/testllamaneuron in account XXXXXXX for more information.",
  "errorType": "ModelError",
  "requestId": "2f2a7aa4-9eeb-42f5-9a14-6285894581bb",
  "stackTrace": [
    " File \"/var/task/lambda.py\", line 19, in handler\n response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,\n",
    " File \"/var/runtime/botocore/client.py\", line 530, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
    " File \"/var/runtime/botocore/client.py\", line 960, in _make_api_call\n raise error_class(parsed_response, operation_name)\n"
  ]
}
```
The model configuration is as follows:
- image: 763104351884.dkr.ecr.us-east-2.amazonaws.com/djl-inference:0.24.0-neuronx-sdk2.14.1
- env variables:
  - modelId: meta-textgenerationneuron-llama-2-7b-f
  - modelVersion: 1.0.0
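The deployment path itself isn't shown in this report; assuming the model was deployed through the SageMaker JumpStart SDK, a minimal sketch consistent with the configuration above might look like this (the model id/version and instance type come from the report, the rest is an assumption):

```python
# Minimal deployment sketch, assuming the SageMaker JumpStart SDK was used.
# model_id/model_version match the env variables above; the instance type and
# endpoint name match the bug description. The IAM role is resolved from the
# execution environment.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(
    model_id="meta-textgenerationneuron-llama-2-7b-f",
    model_version="1.0.0",
)
predictor = model.deploy(
    accept_eula=True,                 # required by the Llama 2 license
    instance_type="ml.inf2.xlarge",
    endpoint_name="testllamaneuron",
)
```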
To reproduce
- Deploy the model to an endpoint
- Create a Lambda function to query the endpoint with the following code:
```python
import boto3
import json

def handler(event, context):
    runtime = boto3.client('runtime.sagemaker')
    ENDPOINT_NAME = 'testllamaneuron'
    # Llama 2 chat payload: a list of dialogs, each a list of role/content turns
    dic = {
        "inputs": [
            [
                {"role": "system", "content": "You are a chat bot who writes songs"},
                {"role": "user", "content": "Write a rap song about Amazon Web Services"}
            ]
        ],
        "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}
    }
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='application/json',
        Body=json.dumps(dic),
        CustomAttributes="accept_eula=true"
    )
    result = json.loads(response['Body'].read().decode())
    print(result)
    return {
        "statusCode": 200,
        "body": json.dumps(result)
    }
```
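For comparison, the same request can be sent through the SageMaker Python SDK's Predictor instead of raw boto3, which helps determine whether the 400 is specific to the Lambda/boto3 path. This snippet is not part of the original repro; the serializer/deserializer choices are assumptions, and `custom_attributes` in `predict()` requires a recent sagemaker SDK version:

```python
# Sketch: invoke the same endpoint via the SageMaker Python SDK to check
# whether the BadRequestException also occurs outside Lambda/boto3.
import sagemaker
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = Predictor(
    endpoint_name="testllamaneuron",
    sagemaker_session=sagemaker.Session(),
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)
result = predictor.predict(
    {
        "inputs": [[
            {"role": "user", "content": "Write a rap song about Amazon Web Services"}
        ]],
        "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6},
    },
    custom_attributes="accept_eula=true",
)
print(result)
```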
Logs
Lambda Function logs:
```
[ERROR] ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "BadRequestException",
  "message": "Parameter model_name is required."
}
```
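The error message also points at the endpoint's CloudWatch log group for the container-side details. A short sketch for pulling those logs programmatically (the log group name is taken from the error above; the rest is standard boto3):

```python
# Fetch the endpoint container logs referenced in the error message.
import boto3

logs = boto3.client("logs", region_name="us-east-2")
events = logs.filter_log_events(
    logGroupName="/aws/sagemaker/Endpoints/testllamaneuron"
)
for event in events["events"]:
    print(event["message"])
```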