Questions about downstream tasks

Question

Yipinggggg opened this issue 9 months ago · 1 comment

Hi, great work! There is one thing I don't understand, though.

The backbone you used for training is a TimeSformer, which takes a sequence of frames as input, but as far as I can tell the input for all the downstream tasks is a single frame. Maybe I haven't fully understood the code, but what does the time dimension do in the downstream tasks?

Thank you very much!

Answer 1 · 2024-04-05T15:30:59.000Z

Hi @Yipinggggg
Thanks for your interest!
All of our downstream tasks take video sequences as the model input, so that the model can capture temporal information.
Could you tell me which part of the code is confusing?
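
To make "video sequences as input" concrete, here is a minimal sketch of how a downstream data loader might pick a fixed number of frames from a video to form one clip. It is not taken from this repository: the function names `sample_clip_indices` and `clip_shape`, the stride-based sampling, and the 3x224x224 frame size are all assumptions made for illustration.

```python
def sample_clip_indices(num_video_frames, clip_len, stride=1, start=0):
    """Return `clip_len` frame indices, beginning at `start` and spaced by `stride`.

    Hypothetical helper: indices that would run past the end of the video
    are clamped to the last frame, so the clip always has `clip_len` entries.
    """
    if num_video_frames <= 0 or clip_len <= 0 or stride <= 0 or start < 0:
        raise ValueError("num_video_frames, clip_len, stride must be > 0 and start >= 0")
    last = num_video_frames - 1
    return [min(start + i * stride, last) for i in range(clip_len)]


def clip_shape(clip_len, channels=3, height=224, width=224):
    """Shape of one clip in an assumed (T, C, H, W) layout; T is the time dimension."""
    return (clip_len, channels, height, width)


if __name__ == "__main__":
    indices = sample_clip_indices(num_video_frames=10, clip_len=4, stride=3)
    print(indices)                   # [0, 3, 6, 9]
    print(clip_shape(len(indices)))  # (4, 3, 224, 224)
```

Under these assumptions the time dimension is simply the clip length T. Even a clip of length 1 keeps that axis, so a video backbone would still see a sequence, just a very short one.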