facebookresearch/chameleon

Unknown Image Format Error with Multimodal Input Inference

iz2late opened this issue · 1 comment

I'm using the example code for multimodal input inference, but I'm encountering an "Unknown image format" error regardless of the image format I provide. I've tried PNG, JPG, and JPEG formats without success.

Has anyone else experienced this issue, or does anyone have suggestions on how to resolve it?

from chameleon.inference.chameleon import ChameleonInferenceModel


def main():
    model = ChameleonInferenceModel(
        "./data/models/7b/",
        "./data/tokenizer/text_tokenizer.json",
        "./data/tokenizer/vqgan.yaml",
        "./data/tokenizer/vqgan.ckpt",
    )

    tokens = model.generate(
        prompt_ui=[
            {"type": "image", "value": "test_image.jpeg"},
            {"type": "text", "value": "What do you see?"},
            {"type": "sentinel", "value": "<END-OF-TURN>"},
        ]
    )
    print(model.decode_text(tokens)[0])


if __name__ == "__main__":
    main()

Oh, I found that the image value should start with "file:"!
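
For anyone hitting the same error, here is a minimal sketch of the fix applied to the prompt above. The helper image_prompt_entry is hypothetical (it is not part of the chameleon package); it only adds the "file:" prefix to a local path when it is missing. It assumes the image is a local file, and this thread does not cover whether other kinds of values are accepted.

```python
def image_prompt_entry(path: str) -> dict:
    """Build an image entry for prompt_ui, adding the "file:" prefix if missing.

    Hypothetical helper for illustration; not part of the chameleon package.
    """
    value = path if path.startswith("file:") else f"file:{path}"
    return {"type": "image", "value": value}


# Same prompt as in the original post, with the image value prefixed.
prompt_ui = [
    image_prompt_entry("test_image.jpeg"),
    {"type": "text", "value": "What do you see?"},
    {"type": "sentinel", "value": "<END-OF-TURN>"},
]
```

The resulting list can then be passed as model.generate(prompt_ui=prompt_ui), as in the original snippet.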