Temporal relationships?
blackight opened this issue · 1 comment
The released code uses a temporal Transformer, but the temporal attention treats every frame equally. It seems no trick like a TimeEmbedding is applied to distinguish frames. Does this mean the network cannot capture the temporal relationships between different frames?
Hi, thank you for your suggestion. We did not include positional encoding in our experiments so that a model trained on 16 or 32 frames can be easily extended to other temporal lengths. That said, adding a temporal encoding may work better when the temporal length is fixed.
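To make the trade-off concrete, here is a minimal sketch (not the released code; plain NumPy, single head, no learned projections, even `dim` assumed) showing why temporal attention without a frame embedding is permutation-equivariant over frames, and how a standard sinusoidal TimeEmbedding breaks that symmetry:

```python
import math
import numpy as np

def time_embedding(num_frames, dim):
    """Standard sinusoidal embedding over the frame index (assumes even dim)."""
    pos = np.arange(num_frames)[:, None]
    div = np.exp(np.arange(0, dim, 2) * (-math.log(10000.0) / dim))
    emb = np.zeros((num_frames, dim))
    emb[:, 0::2] = np.sin(pos * div)
    emb[:, 1::2] = np.cos(pos * div)
    return emb

def temporal_attention(x, use_time_embedding=False):
    """Single-head self-attention over the frame axis.

    x: (frames, dim). For illustration Q = K = V = x (no learned weights).
    """
    if use_time_embedding:
        x = x + time_embedding(*x.shape)  # tie each frame to its index
    scores = x @ x.T / math.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))

# Without the embedding, reversing the frame order just reverses the output:
# the network has no notion of which frame comes first.
plain, plain_rev = temporal_attention(x), temporal_attention(x[::-1])
print(np.allclose(plain[::-1], plain_rev))   # permutation-equivariant

# With the embedding, frame order changes the result.
emb, emb_rev = temporal_attention(x, True), temporal_attention(x[::-1], True)
print(np.allclose(emb[::-1], emb_rev))       # order now matters
```

This also illustrates the maintainers' point: the sinusoidal table is built for a specific `num_frames`, so a fixed-length model benefits from it, while leaving it out keeps the attention agnostic to the number of frames.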