[Bug] Owlv2 Zero-shot object detection
nisyad-ms opened this issue · 0 comments
There seems to be a bug in the processor.post_process_object_detection() step of the zero-shot object detection pipeline.
Observation: the bounding boxes are still shifted after the post_process_object_detection step.
Expected: bounding boxes should align as shown in the example image
To reproduce using the official example from the zero-shot pipeline documentation:
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection
from PIL import Image, ImageDraw
import requests
import torch
checkpoint = "google/owlv2-base-patch16-ensemble"
model = AutoModelForZeroShotObjectDetection.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint)
url = "https://unsplash.com/photos/oj0zeY2Ltk4/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MTR8fHBpY25pY3xlbnwwfHx8fDE2Nzc0OTE1NDk&force=true&w=640"
im = Image.open(requests.get(url, stream=True).raw)
text_queries = ["hat", "book", "sunglasses", "camera"]
inputs = processor(text=text_queries, images=im, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
target_sizes = torch.tensor([im.size[::-1]])
results = processor.post_process_object_detection(outputs, threshold=0.1, target_sizes=target_sizes)[0]
draw = ImageDraw.Draw(im)
scores = results["scores"].tolist()
labels = results["labels"].tolist()
boxes = results["boxes"].tolist()
for box, score, label in zip(boxes, scores, labels):
    xmin, ymin, xmax, ymax = box
    draw.rectangle((xmin, ymin, xmax, ymax), outline="red", width=1)
    draw.text((xmin, ymin), f"{text_queries[label]}: {round(score,2)}", fill="white")
im
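One possible explanation, not confirmed in this report, is that the OWLv2 image processor pads the image to a square (at the bottom and right) before resizing. If so, the predicted boxes are normalized to that padded square rather than to the original image. Under that assumption, a minimal sketch of a workaround is to pass the padded square size as target_sizes. The helper name padded_square_target_size is hypothetical and is not part of transformers.

```python
def padded_square_target_size(image_size):
    """Return (height, width) of the square canvas the image would be padded to.

    Assumption: padding is added only at the bottom and right, so box
    coordinates scaled to this square still line up with the original image.
    """
    width, height = image_size  # PIL's Image.size is (width, height)
    side = max(width, height)
    return (side, side)


# Example with a 640x427 landscape image
print(padded_square_target_size((640, 427)))  # (640, 640)
```

In the reproduction snippet this would replace the target_sizes line with target_sizes = torch.tensor([padded_square_target_size(im.size)]). Boxes that fall in the padded area may extend past the bottom or right edge of the original image and can be clipped before drawing.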