Visual-Behavior/detr-tensorflow

We expect only one image here for now ...


Does the dataset only support batch size 1? Is there any plan to fix this?

def retrieve_outputs(augmented_images, augmented_bbox):

    outputs_dict = {}
    image_shape = None

    # We expect only one image here for now
    image = augmented_images[0].astype(np.float32)
    augmented_bbox = augmented_bbox[0]

    bbox, t_class = imgaug_bbox_to_xcyc_wh(augmented_bbox, image.shape[0], image.shape[1])

    bbox = np.array(bbox)
    t_class = np.array(t_class)

    return image, bbox, t_class

This comment is somewhat misleading. Internally, we work with sequential image data of shape (batch, sequence_size, h, w, 3). The comment only means that the sequence length is expected to be one, which does not make sense here because this repository is not meant to handle sequences.

So the dataset does support batch sizes greater than 1.
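
To illustrate the explanation above, here is a minimal sketch of how indexing with [0] can drop a length-1 sequence axis per sample while still allowing batches larger than one. This is not the repository's code: the helper name take_single_frame, the 8x8 image size, and the use of np.stack for batching are all assumptions made for the example.

```python
import numpy as np


def take_single_frame(augmented_images):
    """Hypothetical stand-in for the indexing step in retrieve_outputs."""
    # Assumed input shape: (sequence_size, h, w, 3) with sequence_size == 1.
    # Indexing with [0] removes the sequence axis, not the batch axis.
    return augmented_images[0].astype(np.float32)


# Four samples, each a length-1 sequence of 8x8 RGB images (made-up sizes).
samples = [np.zeros((1, 8, 8, 3), dtype=np.uint8) for _ in range(4)]

# Each sample is processed on its own, then the results are stacked.
frames = [take_single_frame(s) for s in samples]
batch = np.stack(frames)

print(batch.shape)  # (4, 8, 8, 3)
```

In this sketch the batch dimension is only created after the per-sample step, which is why the "one image" comment says nothing about the batch size.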

Thank you for the explanation. The code is very well written. How much accuracy can you get using train_coco.py?

I have tested your pre-trained model and it reaches over 42 mAP. But according to the statement, the default model was not trained using the provided code.