
Inconsistent Bounding Box Results with OWLv2 Image-Guided Detection


Hey, I have been trying to label my data with OWLv2. I tried two different scripts for the same task: one without post-processing, which gives good results, and one with post-processing, because I want correct annotations on my original image, which is (704, 576). The preprocessing step automatically resizes the image to (960, 960). I have tried different values such as threshold=0.98 and nms_threshold=1.0, without success.

I just want correct annotations at the original image size, (704, 576).
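
For reference, the automatic resize can be checked directly (a minimal sketch; Image.new is just a stand-in for the actual frame):

from PIL import Image
from transformers import Owlv2Processor

processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")

# stand-in frame with the same size as the original image
image = Image.new("RGB", (704, 576))
pixel_values = processor(images=image, return_tensors="pt").pixel_values
print(pixel_values.shape)  # expected: torch.Size([1, 3, 960, 960])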

The code without post-processing (giving a perfect bounding box on the 960×960 preprocessed image):

from PIL import Image
import numpy as np
import torch

# model, processor, image_path, query_embedding, sigmoid and
# get_preprocessed_image are defined earlier in the notebook
target_image = Image.open(image_path)

target_pixel_values = processor(images=target_image, return_tensors="pt").pixel_values
unnormalized_target_image = get_preprocessed_image(target_pixel_values)

with torch.no_grad():
    feature_map = model.image_embedder(target_pixel_values)[0]

b, h, w, d = feature_map.shape
target_boxes = model.box_predictor(
    feature_map.reshape(b, h * w, d), feature_map=feature_map
)

target_class_predictions = model.class_predictor(
    feature_map.reshape(b, h * w, d),
    torch.tensor(query_embedding[None, None, ...]),  # [batch, queries, d]
)[0]

# keep only the single highest-scoring box for the query
target_boxes = np.array(target_boxes[0].detach())
target_logits = np.array(target_class_predictions[0].detach())
top_ind = np.argmax(target_logits[:, 0], axis=0)
score = sigmoid(target_logits[top_ind, 0])
top_boxes = target_boxes[top_ind]

Correct result (on the 960×960 preprocessed image):
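
Since the boxes are correct on the 960×960 input, one option is to map them back to the original frame by hand. A minimal sketch, assuming the processor pads the frame to a square of side max(width, height) before resizing, so a single scale factor applies to both axes; top_boxes comes from the script above and is treated as a normalized (cx, cy, w, h) box:

# undo the pad-to-square preprocessing: both axes scale by the padded side
orig_w, orig_h = 704, 576
side = max(orig_w, orig_h)  # 704

cx, cy, w, h = top_boxes  # normalized (cx, cy, w, h) from the box predictor
x0 = (cx - w / 2) * side
y0 = (cy - h / 2) * side
x1 = (cx + w / 2) * side
y1 = (cy + h / 2) * side
print([round(float(v), 2) for v in (x0, y0, x1, y1)])  # corners on the 704×576 frame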


The code with post-processing (the result is inaccurate at both image sizes):

from PIL import Image
import torch

from transformers import Owlv2Processor, Owlv2ForObjectDetection

processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")

source_image = Image.open('./source_image.jpg')
target_image = Image.open('./all_images/2024-08-19-114707271_1000032$1$0$21_1-00000.jpg')

# threshold and nms_threshold belong to post-processing, not the processor
inputs = processor(images=target_image, query_images=source_image, return_tensors="pt")

with torch.no_grad():
    outputs = model.image_guided_detection(**inputs)

target_sizes = torch.Tensor([target_image.size[::-1]])  # (height, width)
results = processor.post_process_image_guided_detection(
    outputs=outputs, target_sizes=target_sizes, threshold=0.98, nms_threshold=1.0
)

boxes, scores = results[0]["boxes"], results[0]["scores"]
for box, score in zip(boxes, scores):
    box = [round(i, 2) for i in box.tolist()]
    print(f"Detected similar object with confidence {round(score.item(), 3)} at location {box}")

Wrong result (the boxes do not line up at either image size).
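
If the pad-to-square assumption above is right, one workaround sketch (not a confirmed fix) is to hand post_process_image_guided_detection a square target size of max(width, height), so both axes get the same scale factor, matching the manual conversion above:

side = max(target_image.size)  # 704 for a 704×576 frame
results = processor.post_process_image_guided_detection(
    outputs=outputs,
    target_sizes=torch.tensor([[side, side]]),  # square size instead of (height, width)
    threshold=0.98,
    nms_threshold=1.0,
)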