NII-Satoh-Lab/MOT_FCG

question about stage 0

AhmedAtefAhmedAly opened this issue · 1 comments

Hello,
I have a question about extracting the data.

In your paper, you mentioned you can use a given object detector (yolox) and a feature extractor (SBS).

How do you use SBS to extract the features of the objects in the video? And what kind of input is the model expecting for the features?

Many thanks

Basically, we detect all the objects of interest in the scene and crop the image at each detected bounding box, so that each crop contains one detected person. We then feed these crops into SBS, which is already trained on other data, to extract the appearance features.
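
To make the crop-then-embed steps concrete, here is a minimal sketch in Python. It is not the repository's code: the function names `crop_detections`, `extract_features` and `mean_color_embed` are hypothetical, the boxes are assumed to be `(x1, y1, x2, y2)` in pixel coordinates, and the embedding function is a trivial stand-in where the pretrained SBS model would go. A real re-ID model would also likely need each crop resized and normalized to whatever its configuration expects.

```python
import numpy as np


def crop_detections(frame, boxes):
    """Cut one patch per box out of an (H, W, C) frame.

    Boxes are assumed to be (x1, y1, x2, y2) in pixels. Returns the crops
    and the indices of the boxes that were kept, so that features can be
    matched back to detections.
    """
    height, width = frame.shape[:2]
    crops, kept = [], []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # Clamp to the frame, since a detector may predict boxes that
        # extend past the image border.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(width, int(x2)), min(height, int(y2))
        if x2 > x1 and y2 > y1:  # skip boxes with no visible area
            crops.append(frame[y1:y2, x1:x2])
            kept.append(i)
    return crops, kept


def extract_features(crops, embed):
    """Apply `embed` to every crop and stack the results into (N, D)."""
    return np.stack([embed(crop) for crop in crops])


def mean_color_embed(crop):
    # Placeholder for the pretrained appearance model (assumption):
    # the mean value of each color channel.
    return crop.reshape(-1, crop.shape[-1]).mean(axis=0)


frame = np.zeros((480, 640, 3), dtype=np.uint8)      # dummy video frame
boxes = [(10, 20, 110, 220), (600, 400, 700, 500)]   # dummy detections
crops, kept = crop_detections(frame, boxes)
features = extract_features(crops, mean_color_embed)
```

Swapping `mean_color_embed` for a function that runs the pretrained model on a crop would give one appearance vector per detection, in the order given by `kept`.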