This repository contains the code for Temporal Action Localization with Enhanced Instant Discriminability paper. This code is extended upon the code of TriDet.
-
Please ensure that you have installed PyTorch and CUDA. (This code requires PyTorch version >= 1.11. We use version=1.11.0 in our experiments)
-
Install the required packages by running the following command:
pip install -r requirements.txt
- Install NMS
cd ./libs/utils
python setup.py install --user
cd ../..
The pre-extracted features can be downloaded from this Link (Password: fqsy). They are extracted with window size 16 and stride 8.
Note: Due to the large number of videos, it takes about a week to use 12 V100 GPUs to extract the features. To reduce extraction time, we used the VideoMAEv2 with half-precision weights to extract some features (obtained float16 features). Please convert the features to float32 when using:
feats = np.load(feature_path).astype(np.float32)
The pre-extracted I3D features for Charades dataset can be downloaded from this Link (Password: ixuq). They are extracted with window size 16 and stride 4.
Note: We provide the json file for Charades and multithumos in the data
folder.
If you find this work helpful, please consider citing our paper
@inproceedings{shi2023tridet,
title={TriDet: Temporal Action Detection with Relative Boundary Modeling},
author={Shi, Dingfeng and Zhong, Yujie and Cao, Qiong and Ma, Lin and Li, Jia and Tao, Dacheng},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={18857--18866},
year={2023}
}
@article{shi2023temporal,
title={Temporal Action Localization with Enhanced Instant Discriminability},
author={Shi, Dingfeng and Cao, Qiong and Zhong, Yujie and An, Shan and Cheng, Jian and Zhu, Haogang and Tao, Dacheng},
journal={arXiv preprint arXiv:2309.05590},
year={2023}
}