A question about 3D ViT

Question


Closed this issue 3 months ago · 0 comments

Hi, I am using ViT as a feature extractor for videos. I'm now using a 3D ViT, and the code runs well, but I'm new to this field and I don't understand how this model handles the time between frames (delta t). Does anyone know how this works? Thanks!
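Many 3D ViTs use a tubelet (spatio-temporal patch) embedding. Consecutive frames are grouped into tubelets, each tubelet is flattened and linearly projected, and each token gets a positional embedding indexed by its slot in the (time, height, width) grid. The sketch below is a minimal NumPy illustration of that general idea. It is not the exact code of any particular library. The function names (`tubelet_tokens`, `temporal_slot`) and sizes are hypothetical, and random matrices stand in for learned weights. Note that no timestamps are passed in anywhere: under this design the model only sees frame order, so the real interval delta t is fixed implicitly by how the clip was sampled.

```python
import numpy as np

def tubelet_tokens(video, frame_patch_size, patch_size):
    # video: (frames, height, width, channels); note that no timestamps are passed in
    F, H, W, C = video.shape
    if F % frame_patch_size or H % patch_size or W % patch_size:
        raise ValueError("video dims must be divisible by the patch sizes")
    f, h, w = F // frame_patch_size, H // patch_size, W // patch_size
    # Split time, height and width into (grid index, offset inside the tubelet)
    x = video.reshape(f, frame_patch_size, h, patch_size, w, patch_size, C)
    # Move grid axes (f, h, w) first, then flatten each tubelet to one vector
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    tokens = x.reshape(f * h * w, -1)
    return tokens, (f, h, w)

def temporal_slot(grid):
    # Temporal position of each token is just its tubelet index along time,
    # matching the time-major token order produced by tubelet_tokens
    f, h, w = grid
    return np.repeat(np.arange(f), h * w)

rng = np.random.default_rng(0)
video = rng.standard_normal((8, 32, 32, 3))  # 8 frames of 32x32 RGB
tokens, grid = tubelet_tokens(video, frame_patch_size=2, patch_size=16)

dim = 64
# Random stand-ins (assumption) for the learned patch projection and
# the learned positional embedding (one vector per token slot)
proj = rng.standard_normal((tokens.shape[1], dim)) * 0.02
pos_embedding = rng.standard_normal((tokens.shape[0], dim)) * 0.02
x = tokens @ proj + pos_embedding

print(tokens.shape, x.shape)  # (16, 1536) (16, 64)
print(temporal_slot(grid))    # [0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3]
```

Under this assumption, the same 8 frames produce the same tokens whether they were sampled at 30 fps or 5 fps. A given temporal slot can therefore cover very different amounts of real time. Sampling clips at a fixed frame rate for both training and inference is usually the simplest way to keep delta t consistent.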