hkchengrex/XMem

Initialization during inference

MaxEAB opened this issue · 6 comments

Hi, is it possible to initialize tracking using a mask and frame (or a sequence of frames) saved in advance? This is for single-object tracking where the object to be tracked is known beforehand.

Yes. See https://github.com/hkchengrex/XMem/blob/main/docs/INFERENCE.md#on-custom-data

Feel free to comment if you run into problems.

Hi, thanks for the reply. I meant a scenario in which the initial reference frame is known and saved in advance, but the tracking happens live, initialized from the saved reference images.

Basically, the user will not actively initiate the process.

I am not sure if I understand. Where do the images to be segmented come from? What do you mean by live?

If the user is not initiating... who is?

Suppose we are interested in tracking a specific type of object, say a certain make/model of a car, which is known in advance. And we want the same car to be identified (semantically segmented) by a street camera, without anyone actively initializing the first frame.

Is it possible to save a few reference images and masks beforehand so that the network can in turn use those to initialize the inference, instead of the user choosing one? (This means the network being able to handle rapid change of scene/background between the reference frame and the current frame.)

I see. Thank you for the explanation.
In that case, I would still treat the input as a "video", except that the first frame is quite different from the second. I would also reset the sensory memory at the second frame because of this difference. For multiple objects, we can save all of them to the working memory (and force them to remain in the working memory at all times). Some inference logic would have to be rewritten.
Whether it would work well depends largely on how different the reference and the query are.
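
A rough sketch of the control flow described above, for illustration only. It does not use XMem's actual classes: `ToyTracker`, `MemoryEntry`, `add_reference`, `reset_sensory`, and `run_live` are hypothetical names, and the "segmentation" step is a placeholder that simply carries the last mask forward. The point is the ordering: saved references go in first and are pinned in working memory, the sensory memory is reset before the first live frame, and only unpinned entries are evicted.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class MemoryEntry:
    frame: Any
    mask: Any
    pinned: bool = False  # pinned entries are never evicted


@dataclass
class ToyTracker:
    """Hypothetical stand-in for a memory-based tracker (not XMem's API)."""
    max_working: int = 4
    working: List[MemoryEntry] = field(default_factory=list)
    sensory: Optional[Any] = None

    def add_reference(self, frame: Any, mask: Any) -> None:
        # Saved reference frames are stored as pinned working-memory entries.
        self.working.append(MemoryEntry(frame, mask, pinned=True))
        self.sensory = frame

    def reset_sensory(self) -> None:
        # Drop short-term state that was built from the reference scene.
        self.sensory = None

    def step(self, frame: Any) -> Any:
        mask = self._segment(frame)
        self.working.append(MemoryEntry(frame, mask))
        self._evict()
        self.sensory = frame
        return mask

    def _segment(self, frame: Any) -> Any:
        # Placeholder: a real model would read from memory to predict a mask.
        # Here the most recent mask is simply carried forward.
        return self.working[-1].mask

    def _evict(self) -> None:
        # Remove the oldest unpinned entries until within capacity.
        while len(self.working) > self.max_working:
            idx = next(
                (i for i, e in enumerate(self.working) if not e.pinned), None
            )
            if idx is None:
                break  # everything is pinned; nothing can be evicted
            del self.working[idx]


def run_live(
    tracker: ToyTracker,
    references: List[Tuple[Any, Any]],
    live_frames: List[Any],
) -> List[Any]:
    # 1. Treat the saved references as the start of the "video".
    for frame, mask in references:
        tracker.add_reference(frame, mask)
    # 2. Reset sensory memory, since the live scene differs from the reference.
    tracker.reset_sensory()
    # 3. Process live frames; pinned references stay in working memory.
    return [tracker.step(f) for f in live_frames]


tracker = ToyTracker(max_working=3)
masks = run_live(tracker, [("ref0", "mask0")], ["f1", "f2", "f3", "f4"])
```

In a real setup, the pinning and the sensory reset would have to be implemented inside the model's own memory handling, which is the part of the inference logic that would need rewriting.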