Long context video module only
MH-Python commented
Great work and research.
My question is simply whether it is possible to use only the visual/video part (already pretrained on a video dataset such as Kinetics) and fine-tune it on a long-video dataset, e.g. to classify 1-minute or 2-minute videos.
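To make the question concrete, here is roughly the baseline I have in mind. It is a minimal sketch under my own assumptions, not code from this repo. The long video is split into short clips that match the pretrained encoder's input length. Each clip is encoded separately, and the clip features are mean-pooled into one video-level vector for a classifier. `encode_clip` is a hypothetical stand-in for the pretrained visual module, and the 16-frame / stride-2 clip size is an assumed value.

```python
from typing import Callable, List, Sequence

def clip_windows(num_frames: int, clip_len: int = 16, stride: int = 2) -> List[List[int]]:
    """Split a long video into non-overlapping clips of `clip_len` sampled frames.

    Each clip covers `clip_len * stride` raw frames and keeps every
    `stride`-th frame (assumed clip size, chosen to mimic a short-clip encoder).
    Trailing frames that do not fill a whole clip are dropped.
    """
    span = clip_len * stride  # raw frames covered by one clip
    return [
        list(range(start, start + span, stride))
        for start in range(0, num_frames - span + 1, span)
    ]

def mean_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-clip feature vectors into a single video-level vector."""
    if not features:
        raise ValueError("video is shorter than one clip")
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]

def video_embedding(
    num_frames: int,
    encode_clip: Callable[[List[int]], List[float]],  # hypothetical pretrained encoder
    clip_len: int = 16,
    stride: int = 2,
) -> List[float]:
    """Encode every clip independently, then mean-pool the clip features."""
    windows = clip_windows(num_frames, clip_len, stride)
    return mean_pool([encode_clip(w) for w in windows])

if __name__ == "__main__":
    # A 2-minute video at 30 fps has 3600 frames.
    windows = clip_windows(3600)
    print(len(windows), len(windows[0]))  # 112 clips of 16 frames each
```

Mean pooling ignores temporal order across clips, which is exactly why I'm wondering whether the visual module alone is enough or whether the long-context part is needed for 1-2 minute videos.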