Spatio-Temporal Action Detection with Occlusion

Link: Spatio-Temporal Action Detection with Occlusion.

Overview

STADO consists of one module and three task branches:

(1) Mask-Guided Attention Module

Produces a spatial attention mask that modulates the feature maps generated by the backbone, so that the network focuses on non-occluded patterns.
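A minimal sketch of the modulation step, assuming the module predicts one logit per spatial location, squashes it with a sigmoid, and multiplies the result into every backbone channel. The source does not give the exact formulation (for example, whether a residual term is added), so `mask_guided_attention`, `stable_sigmoid`, and the nested-list tensor layout are hypothetical.

```python
import math


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mask_guided_attention(features, mask_logits):
    """Modulate a C x H x W feature map by an H x W spatial mask.

    features: list of C channels, each an H x W list of lists.
    mask_logits: H x W list of lists of raw (pre-sigmoid) mask scores.
    The mask is squashed into [0, 1] and broadcast over all channels, so
    locations with low mask values (assumed occluded) are suppressed.
    """
    mask = [[stable_sigmoid(v) for v in row] for row in mask_logits]
    return [
        [[f * m for f, m in zip(f_row, m_row)] for f_row, m_row in zip(channel, mask)]
        for channel in features
    ]


# Usage: one channel, 2 x 2 map; a logit of 0 halves the feature,
# a large positive logit keeps it, a large negative logit suppresses it.
out = mask_guided_attention([[[1.0, 2.0], [3.0, 4.0]]], [[0.0, 100.0], [-100.0, 0.0]])
```

Multiplying rather than masking out hard keeps the operation differentiable, which is one reason a soft sigmoid mask is a common choice for this kind of module.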

(2) Multi-Task Branches

(2.1) Center Branch for center localization and action recognition.
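One common way to read both centers and classes off a single branch, borrowed from CenterNet-style detectors and assumed here rather than confirmed by the source, is to predict one heatmap per action class and keep its local maxima. The hypothetical `extract_centers` below applies a score threshold, a 3 x 3 local-maximum test, and top-k selection; the class of each peak is the index of the heatmap it came from.

```python
def extract_centers(heatmaps, threshold=0.3, top_k=10):
    """Return up to top_k (score, class_id, y, x) peaks from K x H x W heatmaps.

    A cell is kept if its score passes the threshold and is the maximum of
    its 3 x 3 neighbourhood (the same effect as 3 x 3 max-pool NMS).
    """
    detections = []
    for class_id, hm in enumerate(heatmaps):
        rows, cols = len(hm), len(hm[0])
        for y in range(rows):
            for x in range(cols):
                score = hm[y][x]
                if score < threshold:
                    continue
                # Neighbourhood clipped at the map border; includes the cell itself.
                window = [
                    hm[yy][xx]
                    for yy in range(max(0, y - 1), min(rows, y + 2))
                    for xx in range(max(0, x - 1), min(cols, x + 2))
                ]
                if score >= max(window):
                    detections.append((score, class_id, y, x))
    detections.sort(reverse=True)  # highest score first
    return detections[:top_k]


heatmaps = [
    [[0.1, 0.2, 0.1], [0.2, 0.9, 0.2], [0.1, 0.2, 0.1]],  # class 0
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.6]],  # class 1
]
peaks = extract_centers(heatmaps)
```

Here `peaks` holds one center for class 0 at (y, x) = (1, 1) and one for class 1 at (2, 2).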

(2.2) Movement Branch for estimating the movement of center points between adjacent frames, linking them into moving-point trajectories.
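The sketch below assumes the branch outputs, for each pair of adjacent frames, a dense map of (dx, dy) displacements; a trajectory is then formed by starting at a detected center and repeatedly sampling the displacement at the current point. The map layout, nearest-cell sampling, and the name `link_trajectory` are illustrative assumptions, not the paper's stated design.

```python
def link_trajectory(center, movement_maps):
    """Follow adjacent-frame displacement maps from a starting (x, y) center.

    movement_maps[t][y][x] is an assumed (dx, dy) displacement of a point at
    grid cell (y, x) from frame t to frame t + 1.
    Returns one (x, y) point per frame, i.e. len(movement_maps) + 1 points.
    """
    x, y = center
    trajectory = [(x, y)]
    for disp in movement_maps:
        rows, cols = len(disp), len(disp[0])
        # Sample the nearest grid cell, clamped to the map bounds.
        gx = min(max(int(round(x)), 0), cols - 1)
        gy = min(max(int(round(y)), 0), rows - 1)
        dx, dy = disp[gy][gx]
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory


right = [[(1.0, 0.0)] * 3 for _ in range(3)]  # every cell moves one step right
down = [[(0.0, 1.0)] * 3 for _ in range(3)]   # every cell moves one step down
track = link_trajectory((0.0, 0.0), [right, down])
```

Because each step reads the displacement where the point currently is, the trajectory can follow an actor whose motion varies across the frame.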

(2.3) Box Branch for spatial extent detection, which directly regresses the bounding box size at the estimated center point in each frame.
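Assuming the branch outputs a per-frame map of (width, height) values, a box for each frame can be decoded by reading the size at that frame's center point and centering the box on it. The hypothetical `boxes_from_trajectory` accepts any list of per-frame centers, such as a trajectory linked by the Movement Branch; the map layout and nearest-cell lookup are assumptions.

```python
def boxes_from_trajectory(trajectory, size_maps):
    """Decode one (x1, y1, x2, y2) box per frame from regressed sizes.

    trajectory: list of (x, y) center points, one per frame.
    size_maps[t][y][x]: assumed (width, height) regressed for frame t.
    """
    boxes = []
    for (x, y), sizes in zip(trajectory, size_maps):
        rows, cols = len(sizes), len(sizes[0])
        # Read the size at the nearest grid cell to the center point.
        gx = min(max(int(round(x)), 0), cols - 1)
        gy = min(max(int(round(y)), 0), rows - 1)
        w, h = sizes[gy][gx]
        # Center the box on the point.
        boxes.append((x - w / 2, y - h / 2, x + w / 2, y + h / 2))
    return boxes


sizes = [[(2.0, 4.0)] * 3 for _ in range(3)]  # same 2 x 4 box size everywhere
boxes = boxes_from_trajectory([(1.0, 1.0), (2.0, 1.0)], [sizes, sizes])
```

Since the size is read directly at the center, this sketch needs no anchor boxes; each frame's box simply moves with its trajectory point.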