aws/sagemaker-huggingface-inference-toolkit

SageMaker deployment errors

jonrossclaytor opened this issue · 2 comments

Background

We are attempting to deploy SageMaker Endpoints using the code provided under Deploy - Amazon SageMaker from huggingface.co for these two models:

https://huggingface.co/Salesforce/codegen25-7b-multi
https://huggingface.co/openchat/opencoderplus

Error

Both endpoints consistently fail to deploy; each fails its health check. Error logs are available on request, as it does not appear I can attach them here.

@philschmid is there any guidance you can provide on these errors?

Currently, all models with sharded checkpoints such as these are failing to deploy, as this library is filtering out files that don't match a predefined allowlist, and the sharded format isn't included in that list.
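To illustrate the failure mode, here is a minimal sketch of that kind of filtering. The pattern list below is hypothetical, not the toolkit's actual allowlist, but the effect is the same: shard files and the shard index never match, so they are never downloaded and the model cannot load.

```python
import re

# Hypothetical allowlist for illustration only -- the toolkit's real
# list of accepted filenames differs, but the failure mode is the same.
ALLOWED_PATTERNS = [
    r"pytorch_model\.bin",
    r"config\.json",
    r"tokenizer\.json",
    r"tokenizer_config\.json",
]

def is_allowed(filename: str) -> bool:
    """Return True if the filename exactly matches an allowlisted pattern."""
    return any(re.fullmatch(p, filename) for p in ALLOWED_PATTERNS)

repo_files = [
    "config.json",
    "pytorch_model-00001-of-00002.bin",  # shard 1 of a sharded checkpoint
    "pytorch_model-00002-of-00002.bin",  # shard 2
    "pytorch_model.bin.index.json",      # shard index
    "tokenizer.json",
]

downloaded = [f for f in repo_files if is_allowed(f)]
print(downloaded)  # the shards and their index are dropped
```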

I've made a PR that fixes this issue in #93, but until it gets merged you may be able to work around it by building a custom Docker image from my fork, like so:

```dockerfile
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04
RUN pip install --no-cache-dir \
    git+https://github.com/JimAllanson/sagemaker-huggingface-inference-toolkit@sharded-checkpoint-support
```
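Assuming you push the resulting image to your own ECR repository, the build-and-push steps look roughly like this (the account ID `123456789012`, the region, and the repository name are placeholders for your own values):

```shell
# Build the custom image from the Dockerfile above
docker build -t huggingface-inference-sharded .

# Authenticate Docker against your ECR registry
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Tag and push the image to your repository
docker tag huggingface-inference-sharded:latest \
  123456789012.dkr.ecr.us-east-1.amazonaws.com/huggingface-inference-sharded:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/huggingface-inference-sharded:latest
```

You can then pass the pushed image URI via the `image_uri` parameter when constructing `HuggingFaceModel` in the SageMaker Python SDK, so the endpoint runs the patched container instead of the stock one.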