aws/amazon-sagemaker-examples

[Bug Report] Status: Failed when host Llama2-70B on Amazon SageMaker using TRT-LLM LMI container

NarenZen opened this issue · 0 comments

Link to the notebook
Notebook

Describe the bug

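The endpoint for Llama2-70B hosted on Amazon SageMaker with the TRT-LLM LMI container stays in status Creating for several minutes and then ends in status Failed.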

**To reproduce**
import time

# sm_client and endpoint_name are defined in earlier cells of the notebook.
resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
status = resp["EndpointStatus"]
print("Status: " + status)

# Poll once a minute until the endpoint leaves the Creating state.
while status == "Creating":
    time.sleep(60)
    resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = resp["EndpointStatus"]
    print("Status: " + status)

print("Arn: " + resp["EndpointArn"])
print("Status: " + status)

Running the code above produces the following output:

Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Creating
Status: Failed
Arn: arn:aws:sagemaker:us-west-2:884893011661:endpoint/llama2-djl-2023-11-30-05-48-40-806-endpoint1
Status: Failed
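
The output above shows only the final status, not why the endpoint failed. When an endpoint is in the Failed state, the `describe_endpoint` response normally carries a `FailureReason` field. The following is a minimal sketch of the same polling loop that also prints that field. The helper name `wait_for_endpoint` and its `poll_seconds` and `sleep` parameters are hypothetical, introduced here for illustration; it is not code from the notebook.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=60, sleep=time.sleep):
    """Poll until the endpoint leaves Creating; return the last description.

    Hypothetical helper: sm_client is assumed to be a boto3 SageMaker client.
    """
    while True:
        resp = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        print("Status: " + status)
        if status != "Creating":
            break
        sleep(poll_seconds)

    if status == "Failed":
        # FailureReason is only expected in the response for failed endpoints,
        # so fall back to a placeholder if it is missing.
        print("FailureReason: " + resp.get("FailureReason", "<not reported>"))
    return resp
```

It could be called as `wait_for_endpoint(sm_client, endpoint_name)` with the same `sm_client` and `endpoint_name` as in the snippet under "To reproduce".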

Logs
If applicable, add logs to help explain your problem.
You may also attach an .ipynb file to this issue if it includes relevant logs or output.
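
No logs are attached to this report. As a rough sketch of how they could be collected: SageMaker endpoints usually write container logs to the CloudWatch log group `/aws/sagemaker/Endpoints/<endpoint name>`. The helper below reads the most recent events from that group. The function name `tail_endpoint_logs` and its parameters are hypothetical, and `logs_client` is assumed to be a boto3 CloudWatch Logs client, for example `boto3.client("logs")`.

```python
def tail_endpoint_logs(logs_client, endpoint_name, max_streams=5, events_per_stream=50):
    """Return recent log lines for an endpoint as a list of strings.

    Hypothetical helper: assumes the conventional SageMaker log group name.
    """
    log_group = "/aws/sagemaker/Endpoints/" + endpoint_name

    # Most recently active streams first.
    streams = logs_client.describe_log_streams(
        logGroupName=log_group,
        orderBy="LastEventTime",
        descending=True,
        limit=max_streams,
    )["logStreams"]

    lines = []
    for stream in streams:
        name = stream["logStreamName"]
        # startFromHead=False returns the latest events in the stream.
        events = logs_client.get_log_events(
            logGroupName=log_group,
            logStreamName=name,
            limit=events_per_stream,
            startFromHead=False,
        )["events"]
        lines.extend(name + ": " + event["message"].rstrip() for event in events)
    return lines
```

If the container never started, the log group may not exist, in which case the first call is expected to raise an error instead of returning an empty list.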