2nd-place solution for the CVPR'22 LOVEU Challenge: Track 3

This repo provides the code and the checkpoint that won 2nd place in Track 3 of the CVPR'22 LOVEU Challenge.

[Page] [Paper] [LOVEU@CVPR'22 Challenge] [CodaLab Leaderboard]

Model Architecture (see [Paper] for details):

[Figure: model architecture]

Install

(1) PyTorch. See https://pytorch.org/ for instructions. For example,

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

(2) PyTorch Lightning. See https://www.pytorchlightning.ai/ for instructions. For example,

pip install pytorch-lightning
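After installing, a quick check can confirm that the packages are visible to your Python environment. The snippet below is only a minimal sketch and is not part of this repo; the helper name `check_packages` is made up for illustration. It uses the standard library to test whether each package can be located, without importing it.

```python
import importlib.util


def check_packages(names):
    """Return {name: bool} telling whether each top-level package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names of the packages installed in steps (1) and (2).
    status = check_packages(["torch", "torchvision", "pytorch_lightning"])
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If a package is reported as missing, re-run the corresponding install command in the same environment you use for training.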

Data

Download the training set and the test set (without ground-truth labels) by filling in the [AssistQ Downloading Agreement].

Then carefully set your data path in the config file ;)

Encoding

Before training, encode the instructional videos, scripts, and QAs. See encoder.md.

Training & Evaluation

Select the config file and simply train, e.g.,

CUDA_VISIBLE_DEVICES=0 python train.py --cfg configs/q2a_gru+fps1+maskx-1_vit_b16+bert_b.yaml

Our best model can be found at [best model].

To run inference with a model, e.g.,

CUDA_VISIBLE_DEVICES=0 python inference.py --cfg configs/q2a_gru+fps1+maskx-1_vit_b16+bert_b.yaml CKPT "best_model_path"
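The commands above pass a config file with `--cfg` and, for inference, a trailing `CKPT "best_model_path"` pair. How `train.py` and `inference.py` actually parse these arguments is not shown here, so the following is only a sketch of one common pattern, assuming that trailing arguments are KEY VALUE overrides applied on top of the YAML config. The function names `parse_args` and `opts_to_dict` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse `--cfg <file>` followed by optional KEY VALUE override pairs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--cfg", required=True, help="path to a YAML config file")
    # Everything after the known options is collected verbatim.
    parser.add_argument("opts", nargs=argparse.REMAINDER, help="KEY VALUE overrides")
    return parser.parse_args(argv)


def opts_to_dict(opts):
    """Turn ['KEY1', 'v1', 'KEY2', 'v2'] into {'KEY1': 'v1', 'KEY2': 'v2'}."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    return dict(zip(opts[0::2], opts[1::2]))


if __name__ == "__main__":
    args = parse_args(["--cfg", "my_config.yaml", "CKPT", "best_model_path"])
    print(args.cfg, opts_to_dict(args.opts))
```

Under this assumption, `CKPT` is simply a config key whose value is replaced by the checkpoint path given on the command line.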

Evaluation is performed after each epoch. You can use TensorBoard or the terminal output to track the evaluation results.

Performance of our best model in the LOVEU@CVPR2022 Challenge (QA samples from 80 videos for training, QA samples from 20 videos for testing):

Model | Recall@1 ↑ | Recall@3 ↑ | MR (Mean Rank) ↓ | MRR (Mean Reciprocal Rank) ↑
Our best model | 0.38 (2) | 0.75 (1) | 2.69 (1) | 3.11 (3)
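For readers unfamiliar with these metrics, the sketch below computes them from a list of 1-based ranks of the correct answer, using the textbook per-sample definitions. It is an illustration only, not the challenge's evaluation script; the function name `ranking_metrics` is made up. Note that a textbook MRR cannot exceed 1, so the leaderboard's MRR is presumably aggregated differently (for example, over the steps of each question), and the numbers from this sketch are not directly comparable.

```python
def ranking_metrics(ranks, ks=(1, 3)):
    """Compute Recall@k, mean rank, and mean reciprocal rank.

    `ranks` holds the 1-based rank of the correct answer for each sample.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    n = len(ranks)
    # Recall@k: fraction of samples whose correct answer is in the top k.
    metrics = {f"recall@{k}": sum(r <= k for r in ranks) / n for k in ks}
    # MR: average rank of the correct answer (lower is better).
    metrics["mr"] = sum(ranks) / n
    # MRR: average of 1/rank (higher is better).
    metrics["mrr"] = sum(1.0 / r for r in ranks) / n
    return metrics


if __name__ == "__main__":
    print(ranking_metrics([1, 2, 4, 1]))
```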

Contact

Feel free to contact us if you have any problems: khy0501@unist.ac.kr, or open an issue in this repo.

Thank you!