Unknown Image Format Error with Multimodal Input Inference
iz2late opened this issue · 1 comments
iz2late commented
I'm using the example code for multimodal input inference, but I'm encountering an "Unknown image format" error regardless of the image format I provide. I've tried PNG, JPG, and JPEG formats without success.
Has anyone else experienced this issue, or does anyone have suggestions on how to resolve it?
from chameleon.inference.chameleon import ChameleonInferenceModel


def main():
    model = ChameleonInferenceModel(
        "./data/models/7b/",
        "./data/tokenizer/text_tokenizer.json",
        "./data/tokenizer/vqgan.yaml",
        "./data/tokenizer/vqgan.ckpt",
    )
    tokens = model.generate(
        prompt_ui=[
            {"type": "image", "value": "test_image.jpeg"},
            {"type": "text", "value": "What do you see?"},
            {"type": "sentinel", "value": "<END-OF-TURN>"},
        ]
    )
    print(model.decode_text(tokens)[0])


if __name__ == "__main__":
    main()
iz2late commented
Oh, I found that the image value should start with "file:"!
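For anyone hitting the same error, here is a minimal sketch of that fix applied to the prompt from the original post. The helper names `with_file_prefix` and `build_prompt` are hypothetical and not part of the chameleon package; the only behavior taken from this thread is that the image value needs to start with "file:".

```python
def with_file_prefix(path):
    """Hypothetical helper: prepend "file:" unless the value already has it."""
    if path.startswith("file:"):
        return path
    return "file:" + path


def build_prompt(image_path, question):
    """Build the prompt_ui list from the original post, with the prefixed image value."""
    return [
        {"type": "image", "value": with_file_prefix(image_path)},
        {"type": "text", "value": question},
        {"type": "sentinel", "value": "<END-OF-TURN>"},
    ]


prompt_ui = build_prompt("test_image.jpeg", "What do you see?")
print(prompt_ui[0]["value"])  # prints: file:test_image.jpeg
```

The resulting list can then be passed as `prompt_ui=` to `model.generate(...)` in place of the one in the original snippet.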