Questions about downstream tasks
Yipinggggg opened this issue · 1 comment
Yipinggggg commented
Hi, great work! I have a question about something I don't understand.
The backbone you used for training is a TimeSformer, which takes a sequence of frames as input. However, in all the downstream tasks the input is a single frame. Maybe I haven't fully understood the code, but what role does the time dimension play in the downstream tasks?
Thank you very much!
Kyfafyd commented
Hi @Yipinggggg
Thanks for your interest!
All of our downstream tasks take video sequences as input, so that the model can capture temporal information.
Could you tell me which part of the code is confusing?
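To illustrate what the time axis does, here is a minimal NumPy sketch of the temporal half of TimeSformer-style divided space-time attention. It is not the repository's code: `temporal_attention` is a hypothetical helper, and it assumes a single head, identity query/key/value projections, and no class token. Each patch attends only to the patch at the same spatial position in the other frames. With a clip of T frames the attention weights form a T×T matrix per patch; with a single frame (T = 1) the softmax has one entry, so in this simplified sketch temporal attention returns its input unchanged (with learned projections it would still mix no information across frames).

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_attention(tokens):
    """Hypothetical single-head temporal attention (divided-attention style).

    tokens: array of shape (T, N, D) = (frames, patches per frame, channels).
    Assumption: query/key/value projections are identity, for brevity.
    Returns the attended tokens (T, N, D) and the weights (N, T, T).
    """
    T, N, D = tokens.shape
    x = tokens.transpose(1, 0, 2)                    # (N, T, D): group by spatial position
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(D)   # (N, T, T): frame-to-frame scores
    weights = softmax(scores, axis=-1)               # each row sums to 1 over frames
    out = weights @ x                                # (N, T, D): mix across frames only
    return out.transpose(1, 0, 2), weights


rng = np.random.default_rng(0)

# A clip of 8 frames, 196 patches per frame, 64 channels: attention spans 8 frames.
clip = rng.standard_normal((8, 196, 64))
out, w = temporal_attention(clip)
print(w.shape)  # (196, 8, 8)

# A single frame: each patch can only attend to itself, so the output equals the input.
single = rng.standard_normal((1, 196, 64))
out1, w1 = temporal_attention(single)
print(np.allclose(out1, single))  # True
```

This is why feeding multi-frame clips matters: the temporal attention only carries information when T > 1.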