/3d-attention-video-understanding

Uses a 3D Nearby Self-Attention Transformer to exploit the spatiotemporal structure of video for representation learning.
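The repository's code is not shown here, so the following is only a minimal sketch of what "3D nearby self-attention" could look like. It assumes "nearby" means that each position in a (time, height, width) token grid attends only to positions within a small local extent along every axis. The function name `nearby_attention_3d`, the `extent` parameter, and the single-head projection matrices are hypothetical and chosen for illustration. It builds a dense N×N mask for clarity, which would be too costly for real video sizes.

```python
import numpy as np


def nearby_attention_3d(x, wq, wk, wv, extent=(1, 1, 1)):
    """Single-head self-attention restricted to a local 3D neighbourhood.

    x:      (T, H, W, C) grid of video tokens.
    wq/wk/wv: (C, D) projection matrices (hypothetical, single head).
    extent: maximum offset (dt, dh, dw) a query may attend to on each axis.
    Returns the attended features (T, H, W, D) and the (N, N) weight matrix.
    """
    T, H, W, _ = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]

    # Integer (t, h, w) coordinate of every token, flattened to (N, 3).
    coords = np.stack(
        np.meshgrid(np.arange(T), np.arange(H), np.arange(W), indexing="ij"),
        axis=-1,
    ).reshape(-1, 3)

    # mask[i, j] is True when token j lies within `extent` of token i
    # along all three axes (so every token can always see itself).
    diff = np.abs(coords[:, None, :] - coords[None, :, :])
    mask = np.all(diff <= np.asarray(extent), axis=-1)

    qf, kf, vf = q.reshape(-1, d), k.reshape(-1, d), v.reshape(-1, v.shape[-1])
    scores = qf @ kf.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)  # block far-away positions

    # Numerically stable softmax over the allowed neighbours only.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    out = weights @ vf
    return out.reshape(T, H, W, -1), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.normal(scale=0.1, size=(4, 3, 3, 8))  # 4 frames of 3x3 tokens
    wq, wk, wv = (rng.normal(scale=0.1, size=(8, 8)) for _ in range(3))
    out, weights = nearby_attention_3d(clip, wq, wk, wv)
    print(out.shape)
```

With `extent=(1, 1, 1)`, a token in the interior of the grid attends to a 3×3×3 neighbourhood, while tokens at the borders see fewer neighbours. A practical implementation would gather only the local window per query rather than masking a full attention matrix.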

Primary Language: Python
