aws/sagemaker-huggingface-inference-toolkit

How can I deploy a model from AWS S3, without downloading the model from Hugging Face, via the TGI image on SageMaker?


Concise Description:

How can I deploy a model from AWS S3, without downloading the model from Hugging Face, via the TGI image on SageMaker?

DLC image/dockerfile:

763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi0.9.3-gpu-py39-cu118-ubuntu20.04

Current behavior:

HF_MODEL_ID is required, and I have set the S3 path for model_data, but the container still downloads the model files from the Hugging Face Hub whenever I deploy the SageMaker endpoint on AWS.

import json
from sagemaker.huggingface import HuggingFaceModel

# sagemaker config
instance_type = "ml.g5.12xlarge"
number_of_gpu = 4
health_check_timeout = 300

# Define Model and Endpoint configuration parameters
config = {
    'HF_MODEL_ID': 'OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5',
    'SM_NUM_GPUS': json.dumps(number_of_gpu),  # Number of GPUs used per replica
    'MAX_INPUT_LENGTH': json.dumps(2000),
    'MAX_TOTAL_TOKENS': json.dumps(2048),
}

# create HuggingFaceModel with the image uri
# (role is the IAM execution role and llm_image is the TGI image URI, defined earlier)
llm_model = HuggingFaceModel(
    model_data="s3://S3_PATH/oasst-sft-4-pythia-12b-epoch-3.5.tar.gz",
    role=role,
    image_uri=llm_image,
    env=config,
)

llm = llm_model.deploy(
    endpoint_name="oasst-sft-4-pythia-12b-epoch-35-12x",
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,  # 5 minutes to be able to load the model
)

Expected behavior:

I can use the model files stored on AWS S3 without downloading anything from the Hugging Face Hub.

You can take a look at this example: https://www.philschmid.de/sagemaker-llm-vpc
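For anyone landing here, the pattern in that post (as I read it) is to package the model weights yourself, upload them to S3 as model_data, and point HF_MODEL_ID at /opt/ml/model, the directory where SageMaker extracts model_data inside the container, so TGI loads the local files instead of contacting the Hub. A minimal sketch adapted to the snippet above; the S3 path is a placeholder, and role / llm_image are assumed to be defined as before:

# Sketch of the S3-only setup from the linked post, adapted to the code above.
# Assumption: the tar.gz on S3 contains the full model directory
# (config.json, tokenizer files, weight shards).
import json
from sagemaker.huggingface import HuggingFaceModel

config = {
    'HF_MODEL_ID': '/opt/ml/model',   # load from the local path SageMaker extracts model_data to,
                                      # instead of pulling from the Hugging Face Hub
    'SM_NUM_GPUS': json.dumps(4),
    'MAX_INPUT_LENGTH': json.dumps(2000),
    'MAX_TOTAL_TOKENS': json.dumps(2048),
}

llm_model = HuggingFaceModel(
    model_data="s3://S3_PATH/oasst-sft-4-pythia-12b-epoch-3.5.tar.gz",  # your own S3 artifact
    role=role,
    image_uri=llm_image,
    env=config,
)

llm = llm_model.deploy(
    endpoint_name="oasst-sft-4-pythia-12b-epoch-35-12x",
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=300,
)

With HF_MODEL_ID set to the local model directory, the endpoint no longer needs outbound access to the Hub, which is what makes the VPC-only deployment in the post work.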


OK, thanks