

SQuiDNet: Selective Query-guided Debiasing Network for Video Corpus Moment Retrieval

This repository provides the code for the ECCV 2022 paper "Selective Query-guided Debiasing for Video Corpus Moment Retrieval".

Authors: Sunjae Yoon, Ji Woo Hong, Eunseop Yoon, Dahyun Kim, Junyeong Kim, Hee Suk Yoon, and Chang D. Yoo

Task: Video Corpus Moment Retrieval

Video moment retrieval (VMR) aims to localize target moments in untrimmed videos that are pertinent to a given textual query. Existing retrieval systems tend to rely on retrieval bias as a shortcut and thus fail to sufficiently learn the multi-modal interactions between query and video.

SQuiDNet is proposed to debias video moment retrieval by conjugating the retrieval bias in either a positive or a negative way.
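To make "localizing a target moment" concrete, a predicted moment is usually compared with the ground-truth moment by temporal intersection over union (IoU). The sketch below is a minimal illustration of that measure. The function name and the example timestamps are assumptions for illustration and are not taken from this repository's evaluation code.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) moments given in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # The overlap is zero when the two moments do not intersect.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: predicted moment 10s-20s, ground truth 15s-25s.
print(temporal_iou((10.0, 20.0), (15.0, 25.0)))
```

A prediction is typically counted as correct when its IoU with the ground truth exceeds a chosen threshold.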

Intro

SQuiDNet Overview

SQuiDNet is composed of three modules: (a) BMR, which reveals biased retrieval; (b) NMR, which performs accurate retrieval; and (c) SQuiD, which removes bad biases from the accurate retrieval of NMR subject to the meaning of the query.
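As a rough intuition for how a biased score could be used "positively or negatively", the toy sketch below adds or subtracts a scaled BMR score from an NMR score per candidate moment. This is not the paper's formulation and not the repository's SQuiD module. The function name, the `alpha` weight, the boolean switch, and the example scores are all assumptions made for illustration.

```python
def selective_debias(nmr_scores, bmr_scores, use_bias_positively, alpha=0.5):
    """Toy combination of per-moment scores (illustrative only).

    The biased score is added when the bias is assumed to help for this
    query and subtracted when it is assumed to hurt.
    """
    sign = 1.0 if use_bias_positively else -1.0
    return [n + sign * alpha * b for n, b in zip(nmr_scores, bmr_scores)]


def best_index(scores):
    """Index of the highest-scoring candidate moment."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical scores for three candidate moments.
nmr = [0.5, 0.6, 0.4]
bmr = [0.9, 0.1, 0.3]

print(best_index(selective_debias(nmr, bmr, use_bias_positively=True)))
print(best_index(selective_debias(nmr, bmr, use_bias_positively=False)))
```

In this toy setting the top-ranked moment changes depending on whether the bias is treated as helpful or harmful, which is the behavior the selective step is meant to control.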

Model

Implementation

Our results and further studies will be added soon!

  1. Clone the repository
git clone https://github.com/dbstjswo505/SQuiDNet.git
cd SQuiDNet
  2. Prepare the environment
conda env create -f squid.yml
conda activate squid
  3. Download the input features

Download tvr_feature_dataset and place it in the main SQuiDNet folder with the following directory structure:

data
├── bmr
│   ├── bmr_prd_test_public_tvr
│   ├── bmr_prd_train_tvr
│   └── bmr_prd_val_tvr
├── sub_query_feature
│   ├── roberta_query
│   └── roberta_sub
├── video_feature
│   └── resnet_slowfast_1.5
├── text_data_ref
└── coocurrence_table

Alternatively, you can download the visual features (ResNet, SlowFast) provided by the HERO authors and the text features (subtitle and query, from fine-tuned RoBERTa) provided by the XML authors. To extract the features yourself, see the code for visual feature extraction and text feature extraction. The nouns and predicates for the co-occurrence table are extracted using the noun and predicate extraction code.
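Before training, it can help to confirm that the downloaded data matches the layout above. The following is a minimal sketch, assuming the script is run from the main SQuiDNet folder. The helper name `missing_data_paths` is hypothetical, and because the tree does not say whether each entry is a file or a directory (or whether it has an extension), the check only tests that a path with exactly that name exists.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_PATHS = [
    "bmr/bmr_prd_test_public_tvr",
    "bmr/bmr_prd_train_tvr",
    "bmr/bmr_prd_val_tvr",
    "sub_query_feature/roberta_query",
    "sub_query_feature/roberta_sub",
    "video_feature/resnet_slowfast_1.5",
    "text_data_ref",
    "coocurrence_table",
]


def missing_data_paths(data_root="data"):
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


# An empty list means every expected entry was found.
print(missing_data_paths())
```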

  4. SQuiDNet Training
bash scripts/train.sh

train.sh runs with our predefined hyperparameters; see the code for details. You can modify the experiment, including hyperparameter tuning, to obtain better performance.

  5. SQuiDNet Inference
bash scripts/inference.sh

inference.sh also runs with our predefined hyperparameters; see the code for details. The current settings cover all tasks, including VCMR, SVMR, and VR.

  6. Build the Co-occurrence Table
python mk_table.py

To design your own co-occurrence table, run ./data/coocurrence_table/mk_table.py, which updates the cctable.json file.
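For readers unfamiliar with co-occurrence tables, the sketch below shows one simple way to count how often nouns and predicates appear together. It is not the logic of mk_table.py, and the actual schema of cctable.json is not described here. The function name, the noun-to-predicate nesting, and the example pairs are assumptions for illustration.

```python
import json
from collections import defaultdict


def build_cooccurrence(pairs):
    """Count how often each noun appears together with each predicate."""
    table = defaultdict(lambda: defaultdict(int))
    for noun, predicate in pairs:
        table[noun][predicate] += 1
    # Convert to plain nested dicts for a predictable result.
    return {noun: dict(preds) for noun, preds in table.items()}


# Hypothetical (noun, predicate) pairs, such as might be extracted from queries.
pairs = [("door", "open"), ("door", "open"), ("door", "close"), ("phone", "answer")]
table = build_cooccurrence(pairs)
print(json.dumps(table, indent=2, sort_keys=True))
```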

Acknowledgement

This code is implemented on top of the following contributions: TVRetrieval, HERO, HuggingFace, Info-ground, NetVLAD, VLANet, CONQUER, MMT, and MME. We thank the authors for open-sourcing these great projects and papers!

This work was partly supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-01381, Development of Causal AI through Video Understanding) and partly supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00184, Development and Study of AI Technologies to Inexpensively Conform to Evolving Policy on Ethics).

Citation

If you find this code useful for your research, please cite our paper:

@inproceedings{yoon2022selective,
  title={Selective Query-Guided Debiasing for Video Corpus Moment Retrieval},
  author={Yoon, Sunjae and Hong, Ji Woo and Yoon, Eunseop and Kim, Dahyun and Kim, Junyeong and Yoon, Hee Suk and Yoo, Chang D},
  booktitle={European Conference on Computer Vision},
  pages={185--200},
  year={2022},
  organization={Springer}
}