med-air/Endo-FM

Questions about downstream tasks

Yipinggggg opened this issue · 1 comment

Hi, great work! I have a question about something I don't understand.

The backbone you used for training is a TimeSformer, which takes a sequence of frames as input, but all of the downstream tasks appear to take a single frame as input. Maybe I haven't fully understood the code, but what role does the time dimension play in the downstream tasks?

Thank you very much!

Hi @Yipinggggg
Thanks for your interest!
All of our downstream tasks take video sequences, not single frames, as the model input, so the time dimension is used to capture temporal information.
May I ask which part of the code is confusing?
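
To make the shapes concrete, here is a minimal sketch of how a clip of several frames, rather than one frame, typically reaches a TimeSformer-style backbone. It is not taken from the Endo-FM code: the helper names (`sample_clip_indices`, `clip_shape`, `num_space_time_tokens`) and the example values (8 frames, stride 4, 224x224 input, 16x16 patches) are assumptions chosen for illustration, and the `(B, C, T, H, W)` layout is the usual convention for video transformers, which may differ from what a given data loader emits.

```python
from typing import List, Tuple


def sample_clip_indices(total_frames: int, num_frames: int,
                        stride: int, start: int = 0) -> List[int]:
    """Pick `num_frames` frame indices spaced `stride` apart.

    Hypothetical helper: indices past the end of the video are clamped
    to the last frame, so short videos repeat their final frame.
    """
    if total_frames <= 0 or num_frames <= 0 or stride <= 0:
        raise ValueError("total_frames, num_frames and stride must be positive")
    last = total_frames - 1
    return [min(start + i * stride, last) for i in range(num_frames)]


def clip_shape(batch_size: int, num_frames: int, height: int, width: int,
               channels: int = 3) -> Tuple[int, int, int, int, int]:
    """Shape of a batched clip in the assumed (B, C, T, H, W) layout."""
    return (batch_size, channels, num_frames, height, width)


def num_space_time_tokens(num_frames: int, height: int, width: int,
                          patch_size: int) -> int:
    """Patch tokens when every frame is cut into non-overlapping square patches.

    The class token, if the model uses one, is not counted here.
    """
    patches_per_frame = (height // patch_size) * (width // patch_size)
    return num_frames * patches_per_frame


# A clip of 8 frames sampled every 4th frame from a 100-frame video.
indices = sample_clip_indices(total_frames=100, num_frames=8, stride=4)
shape = clip_shape(batch_size=2, num_frames=len(indices), height=224, width=224)
tokens = num_space_time_tokens(num_frames=8, height=224, width=224, patch_size=16)
```

Under these assumptions the time dimension `T` is 8 rather than 1, so temporal attention has several frames to attend across; with `num_frames=1` the same token count would collapse to the per-frame patch count.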