
Learning-Actionness-From-ActionBackground-Discrimination

PyTorch implementation of the paper "Learning actionness from action/background discrimination" for action localization on the CrossTask dataset. Tested with Python 3.8.13, PyTorch 1.11.0, NumPy 1.22.4, and ffmpeg-python 0.2.0.

1. Feature Extraction

MIL-NCE is used for video and text feature extraction.

  • First, follow the link in MIL-NCE and download the word2vec matrix and dictionary. Set the --word2vec_path and --dict_path arguments in args.py accordingly.
  • Then, download the pretrained S3D weights "s3d_howto100m.pth" from S3D and update --net_weights_path.
  • Follow the instructions given in CrossTask to download the videos. Update --videos_path and --annotations_path.
  • Set --video_features_path and --text_features_path.
  • Run extract_features.py. To extract features for other videos, update featextract/data/video_list.csv. A minimal sketch of the extraction step follows this list.
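
For reference, the following is a minimal sketch of how MIL-NCE's S3D backbone produces joint video/text embeddings, adapted from the usage example in the MIL-NCE (S3D_HowTo100M) repository. The S3D constructor signature, the s3d_dict.npy dictionary file, and the output dictionary keys come from that repository; extract_features.py in this repo may differ in its exact preprocessing.

```python
import torch
from s3dg import S3D  # model definition from the MIL-NCE (S3D_HowTo100M) repository

# Instantiate the joint video/text model; 's3d_dict.npy' is the downloaded
# word dictionary and 512 is the shared embedding dimension.
net = S3D('s3d_dict.npy', 512)
net.load_state_dict(torch.load('s3d_howto100m.pth'))
net.eval()

# Video input: (batch, 3, T, H, W), RGB values in [0, 1], e.g. 32 frames at 224x224.
video = torch.rand(2, 3, 32, 224, 224)

with torch.no_grad():
    video_output = net(video)  # dict with 'video_embedding' and 'mixed_5c'
    text_output = net.text_module(['pour the water',   # raw step descriptions
                                   'cut the tomato'])

video_embedding = video_output['video_embedding']  # shape (2, 512)
text_embedding = text_output['text_embedding']     # shape (2, 512)

# Similarity between each clip and each step description.
similarity = video_embedding @ text_embedding.t()
print(similarity.shape)  # torch.Size([2, 2])
```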

2. Getting the Baseline Scores

This part replicates the action localization results reported in MIL-NCE, using the previously extracted features.
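
The exact baseline script is not reproduced here; the sketch below illustrates the standard zero-shot localization protocol under the assumption that per-segment video embeddings and per-step text embeddings were saved by the previous stage. The array shapes, function names, and the recall computation are illustrative, not the repo's own code.

```python
import numpy as np

def localize_steps(video_feats, step_feats):
    """Zero-shot localization: for each step, pick the video segment whose
    MIL-NCE embedding is most similar to the step's text embedding.

    video_feats: (T, D) array of per-segment video embeddings
    step_feats:  (K, D) array of step text embeddings
    Returns the predicted segment index for each of the K steps.
    """
    sim = step_feats @ video_feats.T  # (K, T) similarity matrix
    return sim.argmax(axis=1)         # best-matching segment per step

def step_recall(pred_idx, gt_intervals):
    """A step counts as recalled if its predicted segment falls inside
    the annotated [start, end) interval; recall is the hit fraction.

    gt_intervals: list of (start, end) segment indices, one per step.
    """
    hits = sum(start <= p < end for p, (start, end) in zip(pred_idx, gt_intervals))
    return hits / len(gt_intervals)

# Illustrative usage with random features standing in for saved .npy embeddings.
rng = np.random.default_rng(0)
video_feats = rng.standard_normal((120, 512))  # 120 one-second segments
step_feats = rng.standard_normal((3, 512))     # 3 task steps
pred = localize_steps(video_feats, step_feats)
print(step_recall(pred, [(10, 25), (40, 60), (70, 90)]))
```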

The repo will be updated regularly with further implementations of the paper.