Implementation for the paper "PDAN: Pyramid Dilated Attention Network for Action Detection".
The code has been tested with Python 3.7 and PyTorch 1.2 on a single Tesla V100 GPU. The overall code framework is adapted from the Super-Event repository.
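As a quick sanity check before running anything, the snippet below (a minimal sketch, not part of this repository) prints the interpreter, PyTorch, and GPU information so you can confirm your setup matches the tested environment:

```python
# Verify the environment roughly matches the tested one
# (Python 3.7, PyTorch 1.2, one CUDA-capable GPU).
import sys
import torch

print(f"Python  : {sys.version.split()[0]}")   # expected ~3.7
print(f"PyTorch : {torch.__version__}")        # expected ~1.2
print(f"CUDA OK : {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device  : {torch.cuda.get_device_name(0)}")  # e.g. Tesla V100
```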
The evaluation code and a pre-trained PDAN model for the Toyota Smarthome Untrimmed (TSU) dataset are provided in this repository. With the pretrained model, the frame-level mAP (f-mAP) should exceed 32.7% under the CS (cross-subject) protocol.
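For orientation, here is a minimal inference sketch. The checkpoint filename, feature file, tensor layout, and loading convention below are placeholders and assumptions for illustration, not this repository's actual script interface; see the repository's test script for the real entry point:

```python
# Minimal inference sketch (assumptions: the checkpoint stores a full
# serialized nn.Module, and features are saved per video as a (T, C)
# tensor of snippet-level I3D features; adapt names to the real scripts).
import torch

model = torch.load('PDAN_TSU_RGB.pt', map_location='cpu')  # placeholder checkpoint name
model.eval()

feats = torch.load('video_features.pt')        # assumed (T, C) I3D features
with torch.no_grad():
    logits = model(feats.t().unsqueeze(0))     # assumed (B, C, T) input layout
    probs = torch.sigmoid(logits)              # per-timestep multi-label scores
```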
For training and testing on Charades, please download the dataset from this link and follow this repository to extract the snippet-level I3D features. The RGB-pretrained PDAN model can be downloaded via this link. If the I3D features are extracted correctly, the pretrained RGB model should achieve ~23.8% per-frame mAP on Charades. Note that this is the mAP reported in the paper, computed over all timesteps, not the weighted mAP. With the original Charades localization setting (25 sampled timesteps, without weighting), the mAP should exceed 24%. The sketch after this paragraph makes the distinction between the two variants concrete.
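The sketch below computes an unweighted per-frame mAP with scikit-learn; the array shapes and the 25-timestep sampling helper are assumptions for illustration, not the exact evaluation code used in the paper:

```python
# Sketch of the two per-frame mAP variants: all-timestep mAP (as reported
# in the paper) vs. the original Charades setting of 25 uniformly sampled
# timesteps per video. `scores` and `labels` are assumed to be
# (num_frames, num_classes) arrays gathered over the whole test set.
import numpy as np
from sklearn.metrics import average_precision_score

def per_frame_map(scores, labels):
    """Unweighted mean over classes of frame-level average precision."""
    aps = [average_precision_score(labels[:, c], scores[:, c])
           for c in range(labels.shape[1]) if labels[:, c].any()]
    return float(np.mean(aps))

def sample_25(video_scores):
    """Keep 25 uniformly sampled timesteps from one video's (T, C) scores."""
    t = video_scores.shape[0]
    idx = np.linspace(0, t - 1, 25).round().astype(int)  # indices repeat if T < 25
    return video_scores[idx]
```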
If you find this work useful for your research, please cite our paper:
@inproceedings{dai2021pdan,
  title={{PDAN}: Pyramid Dilated Attention Network for Action Detection},
  author={Dai, Rui and Das, Srijan and Minciullo, Luca and Garattoni, Lorenzo and Francesca, Gianpiero and Bremond, Francois},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={2970--2979},
  year={2021}
}
Contact: rui.dai@inria.fr