This repo provides the code and checkpoint that won 2nd place in Track 3 of the CVPR'22 LOVEU challenge.
[Page] [Paper] [LOVEU@CVPR'22 Challenge] [CodaLab Leaderboard]
Task overview:
Model Architecture (see [Paper] for details):
(1) PyTorch. See https://pytorch.org/ for installation instructions. For example,
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
(2) PyTorch Lightning. See https://www.pytorchlightning.ai/ for installation instructions. For example,
pip install pytorch-lightning
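After installing, a quick sanity check confirms that both packages import and that a GPU is visible. This is a minimal sketch; any recent PyTorch/Lightning pairing should pass it:

```python
# sanity_check.py -- verify PyTorch and PyTorch Lightning are importable
# and that CUDA sees a GPU before moving on to training.
import torch
import pytorch_lightning as pl

print(f"PyTorch {torch.__version__}, Lightning {pl.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU 0: {torch.cuda.get_device_name(0)}")
```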
Download the training and testing sets (the latter without ground-truth labels) by filling in the [AssistQ Downloading Agreement].
Then carefully set your data path in the config file ;)
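If you are unsure which keys hold the data paths, the config is a plain YAML file (as the .yaml extension suggests), so you can simply dump it and look. A minimal sketch; key names vary per config:

```python
# inspect_cfg.py -- dump the config to find the data-path entries to edit.
import yaml

with open("configs/q2a_gru+fps1+maskx-1_vit_b16+bert_b.yaml") as f:
    cfg = yaml.safe_load(f)

# Print every key/value; edit the path entries to match your local setup.
print(yaml.dump(cfg, default_flow_style=False))
```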
Before starting, you should encode the instructional videos, scripts, and QAs; see encoder.md.
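encoder.md documents the actual pipeline; as a rough illustration of what encoding involves, the config name (vit_b16+bert_b, fps1) suggests ViT-B/16 frame features sampled at 1 fps and BERT-base text features. Below is a minimal sketch using Hugging Face stand-ins; the checkpoints google/vit-base-patch16-224 and bert-base-uncased are assumptions, not necessarily what the repo uses:

```python
# encode_sketch.py -- illustrative frame/text feature extraction with
# ViT-B/16 and BERT-base (assumed stand-ins; see encoder.md for the
# repo's real pipeline).
import torch
from transformers import ViTImageProcessor, ViTModel, BertTokenizer, BertModel

vit_proc = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
vit = ViTModel.from_pretrained("google/vit-base-patch16-224").eval()
tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def encode_frame(img):
    """img: a PIL.Image frame sampled from the video at 1 fps."""
    inputs = vit_proc(images=img, return_tensors="pt")
    return vit(**inputs).pooler_output.squeeze(0)  # (768,) frame feature

@torch.no_grad()
def encode_text(text):
    """text: a script sentence, question, or candidate answer."""
    inputs = tok(text, return_tensors="pt", truncation=True)
    return bert(**inputs).pooler_output.squeeze(0)  # (768,) text feature
```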
Select the config file and simply train, e.g.,
CUDA_VISIBLE_DEVICES=0 python train.py --cfg configs/q2a_gru+fps1+maskx-1_vit_b16+bert_b.yaml
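Roughly, a Lightning training entry point like train.py fits the model and keeps the best epoch via a checkpoint callback. This is a hedged sketch, not the repo's actual code: the monitored metric name "val_recall@1" and the function signature are hypothetical stand-ins:

```python
# trainer_sketch.py -- what a Lightning training loop with best-checkpoint
# tracking typically looks like (hypothetical names; see train.py for the
# repo's real entry point).
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

def train(model, datamodule, max_epochs=50):
    # Keep the checkpoint with the best validation Recall@1
    # ("val_recall@1" is an assumed metric name, not the repo's).
    ckpt = ModelCheckpoint(monitor="val_recall@1", mode="max")
    trainer = pl.Trainer(max_epochs=max_epochs, accelerator="gpu",
                         devices=1, callbacks=[ckpt])
    trainer.fit(model, datamodule=datamodule)
    return ckpt.best_model_path  # feed this path to inference as CKPT
```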
Our best model can be found at [best model].
To run inference with a trained model, e.g.,
CUDA_VISIBLE_DEVICES=0 python inference.py --cfg configs/q2a_gru+fps1+maskx-1_vit_b16+bert_b.yaml CKPT "best_model_path"
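The trailing `CKPT "best_model_path"` pair follows the common yacs override pattern: key/value pairs after the named arguments are merged on top of the YAML config. A sketch of how such scripts typically consume them; the repo's inference.py may differ in details, and it assumes the YAML already defines a CKPT key:

```python
# opts_sketch.py -- how a `--cfg file.yaml KEY value ...` command line is
# typically parsed with yacs (a sketch of the common pattern only).
import argparse
from yacs.config import CfgNode as CN

parser = argparse.ArgumentParser()
parser.add_argument("--cfg", required=True)
parser.add_argument("opts", nargs=argparse.REMAINDER)  # e.g. ["CKPT", "best_model_path"]
args = parser.parse_args()

cfg = CN(new_allowed=True)
cfg.merge_from_file(args.cfg)   # load the YAML config
cfg.merge_from_list(args.opts)  # apply overrides such as CKPT=<path>
cfg.freeze()
print(cfg.CKPT)                 # checkpoint to load for inference
```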
Evaluation runs after each epoch; you can track the results with TensorBoard or read them from the terminal output.
Performance of our best model in the LOVEU@CVPR2022 challenge (trained on QA samples from 80 videos, tested on 20; numbers in parentheses are per-metric leaderboard ranks):
| Model | Recall@1 ↑ | Recall@3 ↑ | MR (Mean Rank) ↓ | MRR (Mean Reciprocal Rank) ↑ |
| --- | --- | --- | --- | --- |
| Our best model | 0.38 (2) | 0.75 (1) | 2.69 (1) | 3.11 (3) |
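For reference, all four metrics can be computed from the 1-based rank that the model assigns to each question's ground-truth answer. A minimal sketch follows; note that the textbook MRR lies in [0, 1], so the leaderboard's MRR column evidently uses a different scaling:

```python
# metrics_sketch.py -- Recall@1, Recall@3, Mean Rank (MR), and Mean
# Reciprocal Rank (MRR) from per-question ground-truth answer ranks.
def qa_metrics(gt_ranks):
    """gt_ranks: 1-based rank of the correct answer for each question."""
    n = len(gt_ranks)
    return {
        "Recall@1": sum(r <= 1 for r in gt_ranks) / n,
        "Recall@3": sum(r <= 3 for r in gt_ranks) / n,
        "MR": sum(gt_ranks) / n,
        "MRR": sum(1.0 / r for r in gt_ranks) / n,
    }

print(qa_metrics([1, 2, 5, 1, 3]))
# -> Recall@1 0.4, Recall@3 0.8, MR 2.4, MRR ~0.607
```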
Feel free to contact us at khy0501@unist.ac.kr if you have any problems, or open an issue in this repo.