facebookresearch/co-tracker

Occluded points at t = `query_frame`

ernestchu opened this issue · 4 comments

Hi, thanks for your wonderful work. However, there seems to be a fundamental issue in your method.

The problem is that you do not overwrite the estimated tracks $\hat{P}_0$ at the query_frame with the start location $P$, nor do you set $\hat{v}_0$ to 1 throughout the refinement.

This may result in, at the query_frame,

  • occlusions (e.g. at image borders and in complex fur texture on the bear) and
  • drifted query position (on the bottom-left corner, a point even translates over one grid point)

output_image

The expected behavior is that if we query specific points on the query_frame, these points must be visible, by definition, and should be precisely at the coordinates we give. I doubt that direct overwriting at each refinement stage is the right way to fix it, but the problem needs to be addressed to make this work useful for downstream tasks (especially extremely low-level vision ones).

Sorry for jumping in to such a detailed question. But I am dealing with some downstream applications that require a very strict (pixel-level) problem definition of point tracking. Here's the full video result.

dense_pred_track.mp4

Hi @ernestchu, thank you for the question!

This is indeed a problem that I was planning to fix. It can be addressed with a simple override of the predicted coordinates for the queried frame. The issue arises because, during training, the model predicts coordinates and visibilities for all frames, including the queried one. This happens after we sample features at the correct queried coordinates, which we use for initialization.
Inference is currently performed in exactly the same way as training, which is not appropriate in this case. However, this only affects the queried frame; performance on all the other frames is unaffected.
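
For anyone who needs a workaround in the meantime, the override described above could look roughly like the sketch below. It is not the repository's code: the function name `override_query_frame` is hypothetical, and the array layout (tracks as `(T, N, 2)`, visibility as `(T, N)`, queries as `(N, 3)` rows of `(query_frame, x, y)`) is an assumption made for illustration. It uses NumPy so that it runs standalone; the same indexing pattern should carry over to tensors with a leading batch dimension.

```python
import numpy as np

def override_query_frame(tracks, visibility, queries):
    """Pin each track to its query location on its own query frame.

    Assumed (hypothetical) layout, not taken from the repository:
      tracks:     (T, N, 2) float array of predicted (x, y) per frame
      visibility: (T, N) bool array of predicted visibility
      queries:    (N, 3) float array with rows (query_frame, x, y)
    """
    tracks = tracks.copy()          # leave the caller's arrays untouched
    visibility = visibility.copy()
    frames = queries[:, 0].astype(int)       # query frame index per point
    points = np.arange(queries.shape[0])     # point indices 0..N-1
    tracks[frames, points] = queries[:, 1:]  # exact queried coordinates
    visibility[frames, points] = True        # queried points are visible by definition
    return tracks, visibility

# Usage with made-up predictions: 4 frames, 3 query points.
T, N = 4, 3
rng = np.random.default_rng(0)
tracks = rng.uniform(0, 100, size=(T, N, 2))
visibility = rng.random((T, N)) > 0.5
queries = np.array([[0, 10.0, 20.0],
                    [2, 30.0, 40.0],
                    [0, 50.0, 60.0]])
fixed_tracks, fixed_vis = override_query_frame(tracks, visibility, queries)
```

Only the entry at each point's query frame is touched, so predictions on every other frame stay exactly as the model produced them.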

So you are saying that the predictions on all frames actually correspond to the correct query locations, and it is only the prediction error on the first frame that makes it seem like the model queried the wrong points.

Yes, this is correct 🙂

Cool. Looking forward to it!