xl-sr/THOR

About VOT setup protocol

MARMOTatZJU opened this issue · 2 comments

I've noticed that in THOR/benchmark/vot.py, Line 28, you pass the frame number into the initialization function as follows:

state = tracker.setup(im, target_pos, target_sz, f)

Then, in THOR/trackers/THOR_modules/wrapper.py, Lines 56-63, you check whether f is 0 ("not f" in the code). This tells the tracker whether the given frame is the first frame of a video (f = 0) or a re-initialization frame after a drift (f != 0), and it decides whether the long-term module (together with its list of LT templates) is cleared and reinitialized, or merely updated while keeping the templates collected before the drift.

        if not f or self._cfg.vanilla:
            self.lt_module = LT_Module(K=self._cfg.K_lt, template_keys=self.template_keys,
                                       lb=self._cfg.lb, lb_type=self._cfg.lb_type,
                                       verbose=self._cfg.verbose, viz=self._cfg.viz)
            self.lt_module.fill(temp)
        else:
            # reinitialize long term only at the beginning of the episode
            self.lt_module.update(temp, div_scale=0)

In my opinion, the latter creates a data leak, since according to the VOT protocol trackers should not have such information (whether f = 0 or not) during the (re)initialization phase.

I'd like to hear your explanation. Thanks in advance.

xl-sr commented

hey, thanks for your question.

The VOT protocol states: "Whenever a tracker predicts a bounding box with zero overlap with the ground truth, a failure is detected and the tracker is re-initialized five frames after the failure."
This means that for a single-template tracker, the template is simply swapped out for a completely new one.
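To make the reset-based protocol concrete, here is a minimal sketch of such an evaluation loop. It is not the official toolkit code: the `init(frame, box)` / `track(frame)` tracker interface, the `(x, y, w, h)` box format and the `run_reset_based` and `iou` helpers are hypothetical, and details such as burn-in frames are left out. Note that the loop never hands the frame index to `init`.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), an assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def run_reset_based(tracker, frames: Sequence, gt: Sequence[Box], skip: int = 5):
    """Hypothetical reset-based loop; returns (failure frames, init frames)."""
    failures: List[int] = []
    inits: List[int] = []
    f = 0
    initialized = False
    while f < len(frames):
        if not initialized:
            # The tracker only sees the image and the ground-truth box, not f.
            tracker.init(frames[f], gt[f])
            inits.append(f)
            initialized = True
            f += 1
            continue
        pred = tracker.track(frames[f])
        if iou(pred, gt[f]) == 0.0:
            # Zero overlap counts as a failure.
            failures.append(f)
            initialized = False
            f += skip  # re-initialize `skip` frames after the failure
        else:
            f += 1
    return failures, inits
```

From the tracker's side, the call at frame 0 and a call after a failure are indistinguishable, which is exactly the point under discussion.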

Completely resetting the LT module after a drift obviously defeats the goal of accumulating templates over the long term. Therefore, the tracker needs to know whether it is at the beginning or in the middle of the sequence: a full reset only makes sense at the beginning. If anything, this makes the task harder, since instead of starting over from a completely new (ground-truth) template, the LT module may still hold bad templates that it picked up before drifting.
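A toy illustration of the difference between a full reset and an update at re-initialization. `ToyLongTermMemory` and `reinitialize` are hypothetical names, and the FIFO eviction via `deque(maxlen=K)` is a stand-in chosen for brevity, not THOR's actual template-replacement rule. The `first_frame` flag plays the role of `not f` in the wrapper.py snippet.

```python
from collections import deque
from typing import Deque


class ToyLongTermMemory:
    """Hypothetical long-term template memory (not THOR's LT_Module).

    Holds at most K templates; when full, appending evicts the oldest one.
    """

    def __init__(self, K: int) -> None:
        self.K = K
        self.templates: Deque[object] = deque(maxlen=K)

    def reset(self, template: object) -> None:
        # Full reset: discard everything accumulated so far.
        self.templates.clear()
        self.templates.append(template)

    def update(self, template: object) -> None:
        # Keep the accumulated templates and add the new one.
        self.templates.append(template)


def reinitialize(memory: ToyLongTermMemory, template: object, first_frame: bool) -> None:
    # Roughly mirrors the branch in wrapper.py: reset only at the start of a sequence.
    if first_frame:
        memory.reset(template)
    else:
        memory.update(template)
```

With K = 3, after collecting t0, t1, t2 and re-initializing mid-sequence with a new ground-truth template g1, `update` keeps t1, t2 and g1, while `reset` leaves only g1.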

However, f is not officially provided to the tracker's init function (cf. the interface in the official toolkit). So although this is not stated explicitly in the text, f is not accessible to trackers, and every initialization should look the same from the tracker's point of view.
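One way to enforce this on the harness side is to expose only an f-free init, as in the hypothetical adapter below. `NoFrameIndexAdapter` is an assumption for illustration, the conversion from an `(x, y, w, h)` region to a center position and size is an assumed convention, and only the initialization path is shown. Because the adapter always passes 0 as the frame index, the wrapped `setup()` would take the reset branch on every (re)initialization.

```python
from typing import Any, Optional, Tuple


class NoFrameIndexAdapter:
    """Hypothetical adapter exposing an f-free init(image, region)."""

    def __init__(self, tracker: Any) -> None:
        self._tracker = tracker
        self.state: Optional[Any] = None

    def init(self, image: Any, region: Tuple[float, float, float, float]) -> None:
        x, y, w, h = region
        # Assumed convention: box center as position, (width, height) as size.
        target_pos = (x + w / 2.0, y + h / 2.0)
        target_sz = (w, h)
        # Always pass 0: the tracker cannot tell a first init from a re-init.
        self.state = self._tracker.setup(image, target_pos, target_sz, 0)
```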