This repo holds the code for the work presented at ACM Multimedia 2020 [Paper].
We provide a PyTorch implementation for ease of use.
Install the requirements by running the following command:
pip install -r requirements.txt
We thank @YapengTian for sharing the features and code.
Two kinds of features (i.e., Visual features and Audio features) are required for experiments.
- Visual Features: You can download the VGG visual features from here.
- Audio Features: You can download the VGG-like audio features from here.
- Additional Features: You can download the features of background videos here, which are required for the experiments in the weakly-supervised setting.
After downloading the features, please place them in the data folder. The structure of the data folder is as follows:
data
├── audio_feature.h5
├── audio_feature_noisy.h5
├── labels.h5
├── labels_noisy.h5
├── mil_labels.h5
├── test_order.h5
├── train_order.h5
├── val_order.h5
├── visual_feature.h5
└── visual_feature_noisy.h5
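Before launching training, it can help to confirm that the data folder matches the listing above. The following is a minimal sketch, not part of the repository: the helper name `missing_feature_files` is hypothetical, and it assumes the folder is named `data` and sits in the current working directory. It only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names copied from the folder listing above.
EXPECTED_FILES = [
    "audio_feature.h5",
    "audio_feature_noisy.h5",
    "labels.h5",
    "labels_noisy.h5",
    "mil_labels.h5",
    "test_order.h5",
    "train_order.h5",
    "val_order.h5",
    "visual_feature.h5",
    "visual_feature_noisy.h5",
]


def missing_feature_files(data_dir="data"):
    """Return the expected file names that are absent from data_dir.

    Hypothetical helper for illustration; not part of the repository.
    """
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_feature_files("data")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected feature files are present.")
```

An empty result means every file in the listing was found.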
You can download the AVE dataset from the repo here.
You can run the following commands to train and test the model.
During training, we evaluate the model on the test set every epoch (the frequency is set by the "eval_freq" argument in the configs/default_config.yaml file).
Training
bash supv_train.sh
# The argument "--snapshot_pref" denotes the path for saving checkpoints and code.
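To illustrate how an evaluation frequency such as "eval_freq" is commonly interpreted, here is a small sketch. It assumes evaluation runs after every `eval_freq`-th epoch, counting epochs from 1. The function `evaluation_epochs` is hypothetical, and the repository's training script may count epochs differently.

```python
def evaluation_epochs(num_epochs, eval_freq=1):
    """Return the 1-based epochs after which an evaluation would run.

    Illustrative assumption: evaluate whenever the epoch number is a
    multiple of eval_freq. This is not taken from the repository code.
    """
    if eval_freq < 1:
        raise ValueError("eval_freq must be a positive integer")
    return [epoch for epoch in range(1, num_epochs + 1) if epoch % eval_freq == 0]


if __name__ == "__main__":
    print(evaluation_epochs(6, eval_freq=1))  # evaluation after every epoch
    print(evaluation_epochs(6, eval_freq=2))  # evaluation after every second epoch
```

Under this assumption, a larger value means fewer evaluations and a shorter overall training run.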
Evaluating
bash supv_test.sh
After training, there will be a checkpoint file whose name contains the accuracy on the test set and the epoch number.
Similar to training the model in a fully-supervised setting, you can run training and testing using the following commands:
Training
bash weak_train.sh
Evaluating
bash weak_test.sh
Please cite the following paper if you find this repo useful for your research:
@inproceedings{CMRAN2020Xu,
author = {Haoming Xu and
Runhao Zeng and
Qingyao Wu and
Mingkui Tan and
Chuang Gan},
title = {Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization},
booktitle = {{ACM} International Conference on Multimedia},
year = {2020},
}