A question about 3D ViT
Closed this issue · 0 comments
JesseZZZZZ commented
Hi, I am using ViT as a feature extractor for videos. I'm currently using the 3D ViT and the code runs fine, but I'm new to this field and I don't understand how the model handles the time between frames (delta t). Does anyone know how this works? Thanks!
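To make the question concrete, here is a minimal sketch of how I understand tubelet-style patch embedding in a 3D ViT. It is not the library's actual code: the function names and the clip sizes are hypothetical, and it assumes the model groups consecutive frames into fixed-size temporal patches. If that reading is right, frames are assigned to tokens purely by their index in the clip, so delta t would only be implicit in whatever frame rate the clip was sampled at.

```python
def tubelet_token_grid(frames, height, width, frame_patch, image_patch):
    """Return (time_tokens, h_tokens, w_tokens) for a tubelet-style 3D patch embedding.

    Hypothetical helper: assumes each token covers `frame_patch` consecutive
    frames and an `image_patch` x `image_patch` pixel window.
    """
    if frames % frame_patch or height % image_patch or width % image_patch:
        raise ValueError("dimensions must be divisible by the patch sizes")
    return frames // frame_patch, height // image_patch, width // image_patch


def frame_to_time_token(frame_index, frame_patch):
    # Only the frame's position in the clip decides which temporal token it
    # joins; no timestamp or delta t is consulted anywhere.
    return frame_index // frame_patch


# Hypothetical clip: 16 frames of 128x128 pixels, tubelets of 2 frames x 16x16 pixels.
grid = tubelet_token_grid(16, 128, 128, frame_patch=2, image_patch=16)
num_tokens = grid[0] * grid[1] * grid[2]
print(grid, num_tokens)
```

Under these assumptions, two clips sampled at different frame rates would produce the same token layout, which is why I am unsure where the time gap is accounted for.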