NVlabs/VILA

Deployment to SageMaker and/or HuggingFace Inference Endpoints Fails With Error

averypfeiffer opened this issue · 5 comments

When attempting to deploy the model to SageMaker manually via a deployment script, or automatically via the Hugging Face Inference Endpoints UI, I receive the same error:

"ValueError: The checkpoint you are trying to load has model type llava_llama but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date."

Unfortunately, we do not have SageMaker experts on our team. Could you check with the AWS team for more details? Or share a script that reproduces the error locally?

Absolutely! I don't believe it's a SageMaker issue; it appears to be a lack of support for the custom llava_llama config in the Transformers library.
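To illustrate the diagnosis, here is a dependency-free sketch of how the lookup fails, assuming Transformers resolves the model_type string from a checkpoint against a mapping of registered architectures (which is how AutoConfig dispatches). The registry contents and function name below are illustrative stand-ins, not the real Transformers internals:

```python
# Minimal stand-in for Transformers' AutoConfig dispatch: the model_type
# read from a checkpoint's config.json is looked up in a mapping of
# registered architecture names. Entries here are illustrative only.
KNOWN_MODEL_TYPES = {"llama": "LlamaConfig", "bert": "BertConfig"}

def lookup_config(model_type: str) -> str:
    # An unregistered model_type produces the ValueError reported above.
    if model_type not in KNOWN_MODEL_TYPES:
        raise ValueError(
            f"The checkpoint you are trying to load has model type {model_type} "
            "but Transformers does not recognize this architecture."
        )
    return KNOWN_MODEL_TYPES[model_type]

print(lookup_config("llama"))      # a registered type resolves fine
try:
    lookup_config("llava_llama")   # VILA's custom type is not registered
except ValueError as err:
    print(err)
```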

Here is a simple script that immediately reproduces the issue when loading the model via the Hugging Face Transformers library:

from PIL import Image
from transformers import pipeline

# Fails here: the checkpoint's llava_llama model type is not recognized
vqa_pipeline = pipeline(
    "visual-question-answering", model="Efficient-Large-Model/VILA1.5-40b"
)


# load an example image
image = Image.open("./test_images/einsidtoJYc-Scene-6-01.jpg")

# example text input
text = "What is happening in this image?"

result = vqa_pipeline(image, text, top_k=1)

print(f"Question: {text}")
print(f"Answer: {result[0]['answer']}")

I think the problem is that we haven't tested with the VQA pipeline yet. Could you check with our official inference implementation?

An even simpler example:

from transformers import AutoConfig

model_id = "Efficient-Large-Model/VILA1.5-40b"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)  # error raised here
print(config)
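For context, the string the error complains about comes from the model_type field in the checkpoint's config.json, which AutoConfig reads to pick a config class. A sketch of the relevant fragment (only this field is shown; the rest of the file is omitted):

```json
{
  "model_type": "llava_llama"
}
```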

I copied what I needed from run_vila.py and it worked.
If you do

from VILA.llava.model import *

it should fix the llava_llama issue.
It still complains about missing weights (even with use_safetensors=False) if you try the AWQ versions, though.
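The wildcard import presumably works because Transformers lets packages extend its model_type registry at import time, via AutoConfig.register(model_type, config_class), so importing VILA's model package registers llava_llama as a side effect. A dependency-free sketch of that mechanism (the registry, function names, and the stand-in for the import are illustrative assumptions, not the real VILA or Transformers internals):

```python
# Stand-in for Transformers' model_type -> config-class registry.
CONFIG_REGISTRY = {"llama": "LlamaConfig"}

def resolve(model_type: str) -> str:
    # Mirrors AutoConfig's dispatch: unknown types raise the reported error.
    try:
        return CONFIG_REGISTRY[model_type]
    except KeyError:
        raise ValueError(
            f"The checkpoint you are trying to load has model type {model_type} "
            "but Transformers does not recognize this architecture."
        )

def import_vila_model_package() -> None:
    # Stand-in for `from VILA.llava.model import *`: the assumption is that
    # the package calls AutoConfig.register("llava_llama", LlavaLlamaConfig)
    # at import time, which amounts to extending the registry.
    CONFIG_REGISTRY["llava_llama"] = "LlavaLlamaConfig"

# Before the import, resolution fails with the reported ValueError.
try:
    resolve("llava_llama")
except ValueError as err:
    print("before import:", err)

import_vila_model_package()
print("after import:", resolve("llava_llama"))
```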