TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection (AAAI 2024 Paper)
by Hao Sun* 1, Mingyao Zhou* 1, Wenjing Chen†2, Wei Xie†1
1 Central China Normal University, 2 Hubei University of Technology, * Equal Contribution, † Corresponding authors.
[Paper]
0. Clone this repo
1. Prepare datasets
QVHighlights : Download official feature files for QVHighlights dataset from Moment-DETR.
Download moment_detr_features.tar.gz (8GB), extract it under '../features' directory. You can change the data directory by modifying 'feat_root' in shell scripts under 'tr_detr/scripts/' directory.
tar -xf path/to/moment_detr_features.tar.gz
TVSum : Download feature files for TVSum dataset from UMT.
Download TVSum (69.1MB), and either extract it under '../features/tvsum/' directory or change 'feat_root' in TVSum shell files under 'tr_detr/scripts/tvsum/'.
2. Install dependencies. Python version 3.7 is required.
pip install -r requirements.txt
Requirements.txt also include other libraries. Will be cleaned up soon. For anaconda setup, please refer to the official Moment-DETR github.
Training with (only video) and (video + audio) can be executed by running the shell below:
bash tr_detr/scripts/train.sh
bash tr_detr/scripts/train_audio.sh
Best validation accuracy is yielded at the last epoch.
Once the model is trained, hl_val_submission.jsonl
and hl_test_submission.jsonl
can be yielded by running inference.sh.
bash tr_detr/scripts/inference.sh results/{direc}/model_best.ckpt 'val'
bash tr_detr/scripts/inference.sh results/{direc}/model_best.ckpt 'test'
where direc
is the path to the saved checkpoint.
For more details for submission, check standalone_eval/README.md.
Training with (only video) and (video + audio) can be executed by running the shell below:
bash tr_detr/scripts/tvsum/train_tvsum.sh
bash tr_detr/scripts/tvsum/train_tvsum_audio.sh
Best results are stored in 'results_[domain_name]/best_metric.jsonl'.
Cite TR-DETR (TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection)
If you find this repository useful, please use the following entry for citation.
@inproceedings{sun_zhou2024tr,
title={Tr-detr: Task-reciprocal transformer for joint moment retrieval and highlight detection},
author={Sun, Hao and Zhou, Mingyao and Chen, Wenjing and Xie, Wei},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={5},
pages={4998--5007},
year={2024}
}
The annotation files and many parts of the implementations are borrowed Moment-DETR and QD-DETR. Following, our codes are also under MIT license.