This repo holds the code for the paper "TVNet: Temporal Voting Network for Action Localization".
Temporal action localization is a vital task in video understanding. In this paper, we propose a Temporal Voting Network (TVNet) for action localization in untrimmed videos. It incorporates a novel Voting Evidence Module that locates temporal boundaries more accurately by accumulating temporal contextual evidence to predict frame-level probabilities of start and end action boundaries.
- Python == 2.7
- TensorFlow == 1.9.0
- CUDA == 10.1.105
- GCC >= 5.4
Note that the PEM code from BMN is implemented in PyTorch 1.1.0 or 1.3.0.
Our experiments are based on the ActivityNet 1.3 and THUMOS14 datasets.
You can download the THUMOS14 features from Google Drive.
Place them in a folder named thumos_features inside ./data.
You also need to download the features for PEM (from BMN) from Google Drive. Please put them in a folder named Thumos_feature_hdf5 inside ./TVNet-THUMOS14/data/thumos_features.
If everything goes well, the folder structure of ./TVNet-THUMOS14/data should look like this:
data
└── thumos_features
    ├── Thumos_feature_dim_400
    ├── Thumos_feature_hdf5
    ├── features_train.npy
    └── features_test.npy
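
The layout above can be checked before training with a short script. This is only a sketch, not part of the repo: the helper name `missing_entries` is hypothetical, and the expected entries are copied from the tree shown above. It uses `os.path` so that it also runs under Python 2.7.

```python
from __future__ import print_function

import os

# Entries expected inside data/thumos_features, copied from the tree above.
EXPECTED_ENTRIES = [
    "Thumos_feature_dim_400",
    "Thumos_feature_hdf5",
    "features_train.npy",
    "features_test.npy",
]


def missing_entries(data_root):
    """Return the expected entries that are absent under data_root/thumos_features.

    Hypothetical helper for illustration; it only checks that each path exists.
    """
    feature_dir = os.path.join(data_root, "thumos_features")
    return [
        name
        for name in EXPECTED_ENTRIES
        if not os.path.exists(os.path.join(feature_dir, name))
    ]


if __name__ == "__main__":
    # Assumes the script is run from inside TVNet-THUMOS14.
    missing = missing_entries("./data")
    if missing:
        print("Missing under ./data/thumos_features:", ", ".join(missing))
    else:
        print("Feature layout looks complete.")
```

Run it from inside TVNet-THUMOS14; an empty result means every entry in the tree was found.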
You can download the ActivityNet 1.3 features from Google Cloud. Please put the csv_mean_100 directory into ./TVNet-ANET/data/activitynet_feature_cuhk/.
If everything goes well, the folder structure of ./TVNet-ANET/data should look like this:
data
└── activitynet_feature_cuhk
    └── csv_mean_100
cd TVNet-THUMOS14
Run the following script with all steps on THUMOS14:
bash do_all.sh
Note: If you use BlueCrystal 4, you can run the following script directly without setting up any dependencies.
bash do_all_BC4.sh
cd TVNet-ANET
bash do_all.sh or bash do_all_BC4.sh
Take TVNet-THUMOS14 as an example:
cd TVNet-THUMOS14
python TEM_train.py
python TEM_test.py
python VEM_create_windows.py --window_length L --window_stride S
L is the window length and S is the sliding stride. We generate training windows of length 10 with stride 5 and of length 5 with stride 2.
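
To make the roles of L and S concrete, here is a minimal sketch of sliding-window enumeration over a sequence of frames. It is not the code in VEM_create_windows.py: the function name `sliding_windows` is hypothetical, and dropping a trailing partial window is an assumption made for this illustration.

```python
def sliding_windows(num_frames, window_length, window_stride):
    """Return (start, end) frame-index pairs with end exclusive.

    Illustrative only: windows that would run past the last frame are
    dropped, which is an assumption and may differ from the repo.
    """
    last_start = num_frames - window_length
    return [
        (start, start + window_length)
        for start in range(0, last_start + 1, window_stride)
    ]


if __name__ == "__main__":
    # The two configurations named in the text: (L=10, S=5) and (L=5, S=2).
    print(sliding_windows(20, 10, 5))  # [(0, 10), (5, 15), (10, 20)]
    print(sliding_windows(12, 5, 2))   # [(0, 5), (2, 7), (4, 9), (6, 11)]
```

A smaller stride produces more heavily overlapping windows, so the length-5, stride-2 setting yields denser coverage of short sequences than the length-10, stride-5 setting.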
python VEM_train.py --voting_type TYPE --window_length L --window_stride S
python VEM_test.py --voting_type TYPE --window_length L --window_stride S
TYPE should be start or end. We train and test models with window length 10 (stride 5) and window length 5 (stride 2) for start and end separately.
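
Since start and end models are each trained and tested for both window settings, there are four runs per script. The sketch below only prints the corresponding command lines; it assumes nothing beyond the flags shown above, and the helper name `vem_commands` is hypothetical.

```python
from __future__ import print_function

import itertools

VOTING_TYPES = ["start", "end"]
# (window_length, window_stride) pairs named in the text.
WINDOW_CONFIGS = [(10, 5), (5, 2)]


def vem_commands(script):
    """Build one command (as an argument list) per voting type and window setting."""
    commands = []
    for voting_type, (length, stride) in itertools.product(VOTING_TYPES, WINDOW_CONFIGS):
        commands.append([
            "python", script,
            "--voting_type", voting_type,
            "--window_length", str(length),
            "--window_stride", str(stride),
        ])
    return commands


if __name__ == "__main__":
    # Print the training commands first, then the testing commands.
    for script in ("VEM_train.py", "VEM_test.py"):
        for command in vem_commands(script):
            print(" ".join(command))
```

The printed lines can be pasted into a shell, or each argument list can be passed to `subprocess.check_call` to run the steps in sequence.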
python PEM_train.py
python proposal_generation.py
python post_postprocess.py
tIoU | mAP@IoU |
---|---|
0.3 | 0.6472508113042471 |
0.4 | 0.5798786190604537 |
0.5 | 0.4929379406719832 |
0.6 | 0.38209885650455544 |
0.7 | 0.26475685440888874 |
tIoU | mAP@IoU |
---|---|
Average | 0.3460396513933088 |
0.5 | 0.5135151163296395 |
0.75 | 0.34955648726767025 |
0.95 | 0.10121803584836778 |
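
For readers unfamiliar with the tIoU thresholds in the tables, temporal IoU between two segments is the length of their overlap divided by the length of their union. The sketch below shows that standard definition; it is not taken from the repo's evaluation code, and the function name `temporal_iou` is hypothetical.

```python
def temporal_iou(segment_a, segment_b):
    """Temporal IoU of two (start, end) segments given in the same time unit."""
    start_a, end_a = segment_a
    start_b, end_b = segment_b
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - intersection
    # float() keeps the division exact under Python 2.7 as well.
    return float(intersection) / union if union > 0 else 0.0


if __name__ == "__main__":
    # Overlap is 3 s and union is 8 s, so the tIoU is 0.375.
    print(temporal_iou((2.0, 8.0), (5.0, 10.0)))
```

With a tIoU of 0.375, this example pair would count as a match at the 0.3 threshold but not at 0.5.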
This implementation borrows from:
BSN: BSN-Boundary-Sensitive-Network
TEM_train/test.py -- for the TEM module we used in our paper
load_dataset.py -- borrows the part that loads data for TEM
BMN: BMN-Boundary-Matching-Network
PEM_train.py -- for the PEM module we used in our paper
G-TAD: Sub-Graph Localization for Temporal Action Detection
post_postprocess.py -- for the multi-core processing that generates detections
Our main contribution is in:
VEM_create_windows.py -- generate training annotations for Voting Evidence Module (VEM)
VEM_train.py -- train Voting Evidence Module (VEM)
VEM_test.py -- test Voting Evidence Module (VEM)