vikhyat/moondream

Example of object detection in docs


It would be great to have a simple example of object detection in the documentation so I can reproduce it. I tried to use the detect method, but I'm likely using the wrong model version.

from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "vikhyatk/moondream2"
revision = "2024-07-23"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision
)
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)

I load input_image with PIL. Then...

enc_image = model.encode_image(input_image)
generated_boxes = model.detect(enc_image, question, tokenizer)

returns:

AttributeError: 'Moondream' object has no attribute 'detect'

The detect method isn't available in the latest version on Hugging Face yet, but I will share an example of how to use it from the client library shortly.
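In the meantime, a minimal sketch of guarding the call, so that a revision without detect fails with a clearer message than the raw AttributeError. The helper name call_detect and the two stand-in classes are hypothetical and used only for illustration; they are not part of moondream or transformers, and the argument order simply mirrors the call in the report above.

```python
def call_detect(model, *args, **kwargs):
    """Call model.detect if the loaded object provides it; otherwise raise a clear error."""
    detect = getattr(model, "detect", None)
    if not callable(detect):
        raise NotImplementedError(
            f"{type(model).__name__} has no 'detect' method in this revision"
        )
    return detect(*args, **kwargs)


# Stand-ins for demonstration only; these are NOT the real Moondream classes.
class WithoutDetect:
    pass


class WithDetect:
    def detect(self, enc_image, question, tokenizer):
        return []  # placeholder result, not real detection output
```

With the model loaded as in the snippet above, call_detect(model, enc_image, question, tokenizer) would raise NotImplementedError on a revision that lacks the method, instead of failing inside user code.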

Hi! Any updates on this? I was wondering whether moondream supports open-set detection.