FPS measurement

Hi, thanks for the amazing work!
I wanted to ask how you compute the FPS in the semi-online setup and how it depends on the stride S and the clip_size T used.
Take the T=5, S=1 scenario (the one reported in the main results table): the model takes 5 frames as input at a time, 4 of which overlap from one window to the next (is this correct?). This means each step effectively yields predictions for just 1 new frame, since the other 4 belong to the overlap used to compute the matching.
With this in mind, how do you compute the FPS? I guess it is not computed by counting only the 1 effective new frame per step, since the FPS would then be directly proportional to the stride for a fixed clip_size T.
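
For reference, the window arithmetic described above can be sketched as follows. This is only an illustration under stated assumptions: windows are assumed to start at frames 0, S, 2S, ..., with one extra window added at the end when trailing frames would otherwise be left uncovered. `clip_windows` is a hypothetical helper, not a function from the IFC code base.

```python
def clip_windows(num_frames, clip_size, stride):
    """Return (start, end) frame index pairs (end exclusive) for sliding windows."""
    if num_frames <= clip_size:
        # Short video: a single window covers everything.
        return [(0, num_frames)]
    starts = list(range(0, num_frames - clip_size + 1, stride))
    # Assumption: add a final window so trailing frames are still covered.
    if starts[-1] + clip_size < num_frames:
        starts.append(num_frames - clip_size)
    return [(s, s + clip_size) for s in starts]


windows = clip_windows(num_frames=10, clip_size=5, stride=1)
# Frames shared by two consecutive windows (T - S when S < T).
overlap = windows[0][1] - windows[1][0]
```

With T=5 and S=1, consecutive windows share 4 frames, so under these assumptions a video of N frames needs roughly N - T + 1 forward passes.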

Thanks a lot for your clarifications!!

Hi @acaelles97 ,

Thank you for your interest in our work.
Identity matching is a core part of tracking and can take a significant amount of computation time.
Therefore, we measure FPS with the cost of matching identities included, as follows:
(total number of frames of the test set) / (total execution time to finalize results for all videos in the test set)
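
A minimal sketch of this kind of end-to-end timing, computing frames per second as total frames divided by total wall-clock time. The names `measure_fps` and `run_video` are hypothetical and not part of the IFC code; `run_video` stands in for whatever processes one video from input frames to final results (inference, matching, interpolation).

```python
import time


def measure_fps(videos, run_video):
    """videos: list of frame sequences; run_video: callable processing one video end to end."""
    total_frames = 0
    start = time.perf_counter()
    for frames in videos:
        # Everything needed to finalize results is timed, including matching.
        run_video(frames)
        total_frames += len(frames)
    elapsed = time.perf_counter() - start
    return total_frames / elapsed
```

Because every original frame is counted exactly once, overlapping windows add to the elapsed time but not to the frame count. On a GPU, the device would also need to be synchronized before reading the timer.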

Hope this answers your question.

Okay thanks a lot for your answer!
I have two final questions. Do you compute the volumetric soft-IoU matching score with the masks interpolated to the original video size, or at the model output resolution (which I assume is lower by a factor of 4 or 8)? I also guess that the time for this interpolation to the target resolution is included in the FPS computation, right?

Have you tried adding more terms to the matching score, for instance a classification score that penalizes matching two instances with different classes?

Thanks a lot for your help!

Yes, both matching and interpolation time are included.
The resolution of the clip outputs used for matching is /8; you can check the specific details of the matching outputs in projects/IFC/ifc/structures/clip_output.py

IFC/projects/IFC/ifc/ifc.py, line 198 in fb2ee45:

interim_size = (math.ceil(image_size[0] / 8), math.ceil(image_size[1] / 8))
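
To illustrate what matching at /8 resolution involves, here is a small sketch. The first function mirrors the line quoted above. The `soft_iou` function is an assumption: it uses one common soft-IoU definition over mask probabilities and is not necessarily the exact score implemented in clip_output.py.

```python
import math


def interim_size(image_size, factor=8):
    """Downscaled (height, width), rounded up, as in the quoted line."""
    return (math.ceil(image_size[0] / factor), math.ceil(image_size[1] / factor))


def soft_iou(probs_a, probs_b, eps=1e-6):
    """Soft IoU between two flattened mask-probability volumes with values in [0, 1].

    Assumed definition: sum(a * b) / sum(a + b - a * b), taken over all
    pixels of all overlapping frames.
    """
    inter = sum(a * b for a, b in zip(probs_a, probs_b))
    union = sum(a + b - a * b for a, b in zip(probs_a, probs_b))
    return inter / (union + eps)


h, w = interim_size((720, 1280))
```

For a 720x1280 frame this gives a 90x160 grid, so the score is evaluated over 64 times fewer pixels per frame than at full resolution.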

We did consider what you pointed out, but did not run experiments on it. I believe utilizing the classification score can further improve the performance if well-tuned.

Thanks for the good questions :)

Thanks a lot for your fast reply! That's all I wanted to ask :)