NVlabs/VILA

Deployment to SageMaker and/or HuggingFace Inference Endpoints Fails With Error

averypfeiffer opened this issue · 5 comments

When attempting to deploy the model to SageMaker manually via a deployment script, or automatically via the Hugging Face Inference Endpoints UI, I receive the same error:

"ValueError: The checkpoint you are trying to load has model type llava_llama but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date."

Unfortunately, we do not have SageMaker experts on our team. Could you check with the AWS team for more details? Or share a script that reproduces the error locally?

Absolutely! I don't believe it's a SageMaker issue; it appears to be a lack of support for the custom llava_llama config in the Transformers library.
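To illustrate the diagnosis, here is a dependency-free sketch of how the lookup fails, assuming Transformers resolves the model_type string from a checkpoint against a mapping of registered architectures (which is how AutoConfig dispatches). The registry contents and function name below are illustrative stand-ins, not the real Transformers internals:

```python
# Minimal stand-in for Transformers' AutoConfig dispatch: the model_type
# read from a checkpoint's config.json is looked up in a mapping of
# registered architecture names. Entries here are illustrative only.
KNOWN_MODEL_TYPES = {"llama": "LlamaConfig", "bert": "BertConfig"}

def lookup_config(model_type: str) -> str:
    # An unregistered model_type produces the ValueError reported above.
    if model_type not in KNOWN_MODEL_TYPES:
        raise ValueError(
            f"The checkpoint you are trying to load has model type {model_type} "
            "but Transformers does not recognize this architecture."
        )
    return KNOWN_MODEL_TYPES[model_type]

print(lookup_config("llama"))      # a registered type resolves fine
try:
    lookup_config("llava_llama")   # VILA's custom type is not registered
except ValueError as err:
    print(err)
```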

Here is a simple script that immediately reproduces the issue when loading the model via the Hugging Face Transformers library:

from PIL import Image
from transformers import pipeline

# Fails here: the checkpoint's llava_llama model type is not recognized
vqa_pipeline = pipeline(
    "visual-question-answering", model="Efficient-Large-Model/VILA1.5-40b"
)


# load an example image
image = Image.open("./test_images/einsidtoJYc-Scene-6-01.jpg")

# example text input
text = "What is happening in this image?"

result = vqa_pipeline(image, text, top_k=1)

print(f"Question: {text}")
print(f"Answer: {result[0]['answer']}")

I think the problem is that we haven't tested with the VQA pipeline yet. Could you check with our official inference implementation?

An even simpler example:

from transformers import AutoConfig

model_id = "Efficient-Large-Model/VILA1.5-40b"
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)  # error raised here
print(config)
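For context, the string the error complains about comes from the model_type field in the checkpoint's config.json, which AutoConfig reads to pick a config class. A sketch of the relevant fragment (only this field is shown; the rest of the file is omitted):

```json
{
  "model_type": "llava_llama"
}
```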

I copied what I needed from run_vila.py and it worked.
If you do

from VILA.llava.model import *

it should fix the llava_llama issue.
It still complains about missing weights (even with use_safetensors=False) if you try the AWQ versions, though.
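The wildcard import presumably works because Transformers lets packages extend its model_type registry at import time, via AutoConfig.register(model_type, config_class), so importing VILA's model package registers llava_llama as a side effect. A dependency-free sketch of that mechanism (the registry, function names, and the stand-in for the import are illustrative assumptions, not the real VILA or Transformers internals):

```python
# Stand-in for Transformers' model_type -> config-class registry.
CONFIG_REGISTRY = {"llama": "LlamaConfig"}

def resolve(model_type: str) -> str:
    # Mirrors AutoConfig's dispatch: unknown types raise the reported error.
    try:
        return CONFIG_REGISTRY[model_type]
    except KeyError:
        raise ValueError(
            f"The checkpoint you are trying to load has model type {model_type} "
            "but Transformers does not recognize this architecture."
        )

def import_vila_model_package() -> None:
    # Stand-in for `from VILA.llava.model import *`: the assumption is that
    # the package calls AutoConfig.register("llava_llama", LlavaLlamaConfig)
    # at import time, which amounts to extending the registry.
    CONFIG_REGISTRY["llava_llama"] = "LlavaLlamaConfig"

# Before the import, resolution fails with the reported ValueError.
try:
    resolve("llava_llama")
except ValueError as err:
    print("before import:", err)

import_vila_model_package()
print("after import:", resolve("llava_llama"))
```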