This repo provides the code and checkpoint that won 2nd place in Track 3 of the CVPR'22 LOVEU challenge.
[Page] [Paper] [LOVEU@CVPR'22 Challenge] [CodaLab Leaderboard]
Task overview:
Model Architecture (see [Paper] for details):
(1) PyTorch. See https://pytorch.org/ for installation instructions. For example,
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
(2) PyTorch Lightning. See https://www.pytorchlightning.ai/ for installation instructions. For example,
pip install pytorch-lightning
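After installing, a quick sanity check confirms that both packages import and that a GPU is visible. This is a minimal sketch; any recent PyTorch/Lightning pairing should pass it:

```python
# sanity_check.py -- verify PyTorch and PyTorch Lightning are importable
# and that CUDA sees a GPU before moving on to training.
import torch
import pytorch_lightning as pl

print(f"PyTorch {torch.__version__}, Lightning {pl.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU 0: {torch.cuda.get_device_name(0)}")
```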
Download the training and testing sets (the latter without ground-truth labels) by filling in the [AssistQ Downloading Agreement].
Then carefully set your data path in the config file ;)
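If you are unsure which keys hold the data paths, the config is a plain YAML file (as the .yaml extension suggests), so you can simply dump it and look. A minimal sketch; key names vary per config:

```python
# inspect_cfg.py -- dump the config to find the data-path entries to edit.
import yaml

with open("configs/q2a_gru+fps1+maskx-1_vit_b16+bert_b.yaml") as f:
    cfg = yaml.safe_load(f)

# Print every key/value; edit the path entries to match your local setup.
print(yaml.dump(cfg, default_flow_style=False))
```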
Before starting, you should encode the instructional videos, scripts, and QAs; see encoder.md.
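encoder.md documents the actual pipeline; as a rough illustration of what encoding involves, the config name (vit_b16+bert_b, fps1) suggests ViT-B/16 frame features sampled at 1 fps and BERT-base text features. Below is a minimal sketch using Hugging Face stand-ins; the checkpoints google/vit-base-patch16-224 and bert-base-uncased are assumptions, not necessarily what the repo uses:

```python
# encode_sketch.py -- illustrative frame/text feature extraction with
# ViT-B/16 and BERT-base (assumed stand-ins; see encoder.md for the
# repo's real pipeline).
import torch
from transformers import ViTImageProcessor, ViTModel, BertTokenizer, BertModel

vit_proc = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
vit = ViTModel.from_pretrained("google/vit-base-patch16-224").eval()
tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def encode_frame(img):
    """img: a PIL.Image frame sampled from the video at 1 fps."""
    inputs = vit_proc(images=img, return_tensors="pt")
    return vit(**inputs).pooler_output.squeeze(0)  # (768,) frame feature

@torch.no_grad()
def encode_text(text):
    """text: a script sentence, question, or candidate answer."""
    inputs = tok(text, return_tensors="pt", truncation=True)
    return bert(**inputs).pooler_output.squeeze(0)  # (768,) text feature
```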
Select the config file and simply train, e.g.,
CUDA_VISIBLE_DEVICES=0 python train.py --cfg configs/q2a_gru+fps1+maskx-1_vit_b16+bert_b.yaml
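Roughly, a Lightning training entry point like train.py fits the model and keeps the best epoch via a checkpoint callback. This is a hedged sketch, not the repo's actual code: the monitored metric name "val_recall@1" and the function signature are hypothetical stand-ins:

```python
# trainer_sketch.py -- what a Lightning training loop with best-checkpoint
# tracking typically looks like (hypothetical names; see train.py for the
# repo's real entry point).
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

def train(model, datamodule, max_epochs=50):
    # Keep the checkpoint with the best validation Recall@1
    # ("val_recall@1" is an assumed metric name, not the repo's).
    ckpt = ModelCheckpoint(monitor="val_recall@1", mode="max")
    trainer = pl.Trainer(max_epochs=max_epochs, accelerator="gpu",
                         devices=1, callbacks=[ckpt])
    trainer.fit(model, datamodule=datamodule)
    return ckpt.best_model_path  # feed this path to inference as CKPT
```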
Our best model can be found at [best model].
To run inference with a trained model, e.g.,
CUDA_VISIBLE_DEVICES=0 python inference.py --cfg configs/q2a_gru+fps1+maskx-1_vit_b16+bert_b.yaml CKPT "best_model_path"
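The trailing `CKPT "best_model_path"` pair follows the common yacs override pattern: key/value pairs after the named arguments are merged on top of the YAML config. A sketch of how such scripts typically consume them; the repo's inference.py may differ in details, and it assumes the YAML already defines a CKPT key:

```python
# opts_sketch.py -- how a `--cfg file.yaml KEY value ...` command line is
# typically parsed with yacs (a sketch of the common pattern only).
import argparse
from yacs.config import CfgNode as CN

parser = argparse.ArgumentParser()
parser.add_argument("--cfg", required=True)
parser.add_argument("opts", nargs=argparse.REMAINDER)  # e.g. ["CKPT", "best_model_path"]
args = parser.parse_args()

cfg = CN(new_allowed=True)
cfg.merge_from_file(args.cfg)   # load the YAML config
cfg.merge_from_list(args.opts)  # apply overrides such as CKPT=<path>
cfg.freeze()
print(cfg.CKPT)                 # checkpoint to load for inference
```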
Evaluation runs after each epoch; you can track the results with TensorBoard or read them from the terminal output.
Performance of our best model in the LOVEU@CVPR2022 challenge (trained on QA samples from 80 videos, tested on 20; numbers in parentheses are per-metric leaderboard ranks):
| Model | Recall@1 ↑ | Recall@3 ↑ | MR (Mean Rank) ↓ | MRR (Mean Reciprocal Rank) ↑ |
| --- | --- | --- | --- | --- |
| Our best model | 0.38 (2) | 0.75 (1) | 2.69 (1) | 3.11 (3) |
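For reference, all four metrics can be computed from the 1-based rank that the model assigns to each question's ground-truth answer. A minimal sketch follows; note that the textbook MRR lies in [0, 1], so the leaderboard's MRR column evidently uses a different scaling:

```python
# metrics_sketch.py -- Recall@1, Recall@3, Mean Rank (MR), and Mean
# Reciprocal Rank (MRR) from per-question ground-truth answer ranks.
def qa_metrics(gt_ranks):
    """gt_ranks: 1-based rank of the correct answer for each question."""
    n = len(gt_ranks)
    return {
        "Recall@1": sum(r <= 1 for r in gt_ranks) / n,
        "Recall@3": sum(r <= 3 for r in gt_ranks) / n,
        "MR": sum(gt_ranks) / n,
        "MRR": sum(1.0 / r for r in gt_ranks) / n,
    }

print(qa_metrics([1, 2, 5, 1, 3]))
# -> Recall@1 0.4, Recall@3 0.8, MR 2.4, MRR ~0.607
```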
Feel free to contact us at khy0501@unist.ac.kr if you have any problems, or open an issue in this repo.