In "divided_space_time", why is the class token not used during temporal attention?

Question

In "divided_space_time", why is the class token not used during temporal attention?

Backdrop9019 opened this issue 2 years ago · 1 comment

Thank you for your great paper.
I have a question that came up while reading your paper and code.
In the paper, the formula for divided_space_time attention includes the class token during temporal attention. In the actual code, however, the cls_token is excluded from temporal attention and included only in spatial attention.

This duplicates the issue linked below, which did not receive a satisfactory answer:
https://github.com/facebookresearch/TimeSformer/issues/74
Is there a reason for this difference? Would it be possible to obtain experimental results with the cls_token included during temporal attention?
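
For readers following along, here is a minimal NumPy sketch of the token routing as I understand it from the code. It is not the repository's implementation: `toy_attention` and `divided_block` are hypothetical names, the attention is a parameter-free stand-in, and layer norms, linear projections, dropout, and the MLP are left out. It only shows which tokens take part in each attention step.

```python
import numpy as np

def toy_attention(x):
    """Parameter-free single-head self-attention over axis 1 (stand-in only)."""
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def divided_block(x, B, T, H, W):
    """x: (B, 1 + H*W*T, D), class token first, patches assumed ordered (h, w, t)."""
    D = x.shape[-1]
    cls, patches = x[:, :1], x[:, 1:]

    # Temporal attention: patch tokens only, the class token is left out.
    xt = patches.reshape(B * H * W, T, D)
    temporal_len = xt.shape[1]
    patches = patches + toy_attention(xt).reshape(B, H * W * T, D)

    # Spatial attention: one copy of the class token is prepended per frame.
    xs = patches.reshape(B, H * W, T, D).transpose(0, 2, 1, 3)
    xs = xs.reshape(B * T, H * W, D)
    cls_rep = np.repeat(cls, T, axis=1).reshape(B * T, 1, D)
    xs = np.concatenate([cls_rep, xs], axis=1)
    spatial_len = xs.shape[1]
    res = toy_attention(xs)

    # The per-frame class-token outputs are averaged over the T frames.
    cls_out = cls + res[:, 0].reshape(B, T, D).mean(axis=1, keepdims=True)
    res_patches = res[:, 1:].reshape(B, T, H * W, D).transpose(0, 2, 1, 3)
    res_patches = res_patches.reshape(B, H * W * T, D)
    out = np.concatenate([cls_out, patches + res_patches], axis=1)
    return out, temporal_len, spatial_len

B, T, H, W, D = 2, 3, 2, 2, 4
x = np.random.default_rng(0).normal(size=(B, 1 + H * W * T, D))
out, temporal_len, spatial_len = divided_block(x, B, T, H, W)
print(out.shape, temporal_len, spatial_len)
```

In this sketch the temporal sequences have length T (no class token), while the spatial sequences have length H*W + 1 (class token included), which is the asymmetry I am asking about.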

Answer 1 · 2023-04-17T11:04:37.000Z

Hi! I have the same question.
Have you done any experiments on this problem?