Stability-AI/StableLM

Can't load model on AWS SageMaker

DavidHaintz opened this issue · 3 comments

Hi,

when executing the model on AWS SageMaker, I get the following error:

PredictionException: Could not load model /.sagemaker/mms/models/stabilityai__stablelm-tuned-alpha-7b with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gpt_neox.modeling_gpt_neox.GPTNeoXForCausalLM'>)

In the notebook, AutoModelForCausalLM is used as well.
Maybe the transformers version used here (4.26) doesn't support StableLM.

Does anyone know which version of transformers is needed?
Does anyone have experience with running StableLM on AWS SageMaker?

Code for recreating the issue:

import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel

# IAM role for the endpoint (here: the notebook's execution role)
role = sagemaker.get_execution_role()

hub = {
  'HF_MODEL_ID': 'stabilityai/stablelm-tuned-alpha-7b',
  'HF_TASK': 'text-generation'
}

huggingface_model = HuggingFaceModel(
   env=hub,
   role=role,
   transformers_version="4.26",
   pytorch_version="1.13",
   py_version='py39',
)

predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.g4dn.8xlarge"
)

prompt = f"""<|SYSTEM|># StableLM Tuned (Alpha version)
  - StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
  - StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
  - StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
  - StableLM will refuse to participate in anything that could harm a human.

<|USER|>Can you write a song about a pirate at sea?
<|ASSISTANT|>"""

result = predictor.predict({"inputs": prompt})
predictor.delete_endpoint()
print(result)

I have the same issue

I was able to deploy the model successfully. Here is the configuration that I used:

huggingface_model = HuggingFaceModel(
    model_data=s3_location,
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39"
)

This was on a ml.g5.4xlarge instance type.

The key is that the model needs the usage code snippet from its model card: https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b

That is, you'll need custom inference code for your model. You can follow steps similar to this blog post, which shows how to include custom inference code in your model: https://www.philschmid.de/deploy-flan-ul2-sagemaker

Steps:

  1. Download the model files from Hugging Face.
  2. Create code/inference.py with the custom inference code (a rough sketch follows these steps).
  3. Create a model.tar.gz that contains the original model files and code/inference.py.
  4. Upload the model to an S3 bucket.
  5. Create an endpoint using the model from the S3 bucket.
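
For reference, a minimal, untested sketch of what code/inference.py could look like. The StopOnTokens criterion and the stop token ids follow the model card's usage snippet; model_fn/predict_fn are the standard SageMaker Hugging Face inference toolkit overrides; dtype, device handling, and generation defaults are assumptions, not the exact files that were deployed:

# code/inference.py -- sketch only
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

class StopOnTokens(StoppingCriteria):
    # Stop generation on StableLM's special end tokens (ids from the model card).
    def __call__(self, input_ids, scores, **kwargs) -> bool:
        stop_ids = [50278, 50279, 50277, 1, 0]
        return input_ids[0][-1] in stop_ids

def model_fn(model_dir):
    # model_dir is the unpacked model.tar.gz inside the container
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.float16)
    model.to("cuda")
    model.eval()
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    model, tokenizer = model_and_tokenizer
    prompt = data.pop("inputs", data)
    parameters = data.pop("parameters", {})

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        tokens = model.generate(
            **inputs,
            max_new_tokens=parameters.get("max_new_tokens", 256),
            temperature=parameters.get("temperature", 0.7),
            do_sample=True,
            stopping_criteria=StoppingCriteriaList([StopOnTokens()]),
        )
    return [{"generated_text": tokenizer.decode(tokens[0], skip_special_tokens=True)}]

And a rough sketch of steps 3 and 4 (local paths and the bucket name are placeholders):

import tarfile
from sagemaker.s3 import S3Uploader

# model.tar.gz layout: model files at the archive root, inference code under code/
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("stablelm-tuned-alpha-7b", arcname=".")  # files downloaded from the Hub
    tar.add("code", arcname="code")                  # contains inference.py

s3_location = S3Uploader.upload(
    "model.tar.gz", "s3://<your-bucket>/stablelm-tuned-alpha-7b"
)

With a custom predict_fn like this, requests should be sent as a dict, e.g. predictor.predict({"inputs": prompt, "parameters": {"max_new_tokens": 512}}). A requirements.txt next to inference.py is only needed if you want to pin or add packages beyond what the container image already ships.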

@sajal2692 Can you please share your inference.py and requirements.txt files then?