This repo provides a baseline model for our proposed task, AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant.
[Page] [Paper] [LOVEU@CVPR'22 Challenge] [CodaLab Leaderboard]
Click below to learn about the task:
Model Architecture (see [Paper] for details):
(1) PyTorch. See https://pytorch.org/ for installation instructions. For example,
conda install pytorch torchvision torchtext cudatoolkit=11.3 -c pytorch
(2) PyTorch Lightning. See https://www.pytorchlightning.ai/ for installation instructions. For example,
pip install pytorch-lightning
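To quickly verify the environment (a minimal sketch, not part of this repo), check that both packages import and that CUDA is visible:

```python
# Minimal environment check: confirm PyTorch and PyTorch Lightning import
# and that a CUDA device is visible to PyTorch.
import torch
import pytorch_lightning as pl

print("torch:", torch.__version__)
print("pytorch-lightning:", pl.__version__)
print("CUDA available:", torch.cuda.is_available())
```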
Download the training and testing sets (the testing set is without ground-truth labels) by filling in the [AssistQ Downloading Agreement].
Then carefully set your data path in the config file ;)
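As a hedged illustration of that step (the config key names below are assumptions; check the YAML files under configs/ for the real structure), a short script can catch a mistyped data path before training:

```python
# Illustrative only: "DATASET"/"ROOT" are hypothetical key names --
# inspect the YAML files under configs/ for the actual keys.
import os
import yaml

with open("configs/q2a_gru+fps1+maskx-1_vit_b16+bert_b.yaml") as f:
    cfg = yaml.safe_load(f)

data_root = cfg.get("DATASET", {}).get("ROOT", "")  # hypothetical keys
print("data root:", data_root, "exists:", os.path.isdir(data_root))
```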
Before training, you should encode the instructional videos, scripts, and QAs. See encoder.md.
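encoder.md documents the actual pipeline; the following is only a rough sketch of what per-frame visual encoding at 1 fps with a ViT-B/16 backbone could look like (the backbone choice is inferred from the config name, and torchvision's ViT is an assumption, not necessarily what encoder.md uses):

```python
# Rough sketch of per-frame visual encoding with torchvision's ViT-B/16.
# The repo's real feature extraction is described in encoder.md and may
# differ (e.g., it likely keeps pre-logit features rather than logits).
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

weights = ViT_B_16_Weights.DEFAULT
model = vit_b_16(weights=weights).eval()
preprocess = weights.transforms()

# Stand-in for video frames sampled at 1 fps, shape (T, 3, H, W).
frames = torch.rand(4, 3, 256, 256)
with torch.no_grad():
    feats = model(preprocess(frames))  # (T, 1000) classification logits here
print(feats.shape)
```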
Select a config file and train, e.g.,
CUDA_VISIBLE_DEVICES=0 python train.py --cfg configs/q2a_gru+fps1+maskx-1_vit_b16+bert_b.yaml
To run inference with a trained model, e.g.,
CUDA_VISIBLE_DEVICES=0 python inference.py --cfg configs/q2a_gru+fps1+maskx-1_vit_b16+bert_b.yaml CKPT "outputs/q2a_gru+fps1+maskx-1_vit_b16+bert_b/lightning_logs/version_0/checkpoints/epoch=5-step=155.ckpt"
Evaluation is performed after each epoch. You can use TensorBoard or the terminal outputs to track the evaluation results.
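The reported metrics are Recall@1, Recall@3, Mean Rank (MR), and Mean Reciprocal Rank (MRR) over the candidate answers. As a hedged sketch (not the repo's evaluation code), all four can be computed from the rank of the ground-truth answer within each question's predicted ordering:

```python
# Minimal sketch of the four metrics, given the ground-truth answer's rank
# (1 = best) for each question. The repo's own evaluation may differ in
# details such as tie handling or score scaling.
def evaluate(gt_ranks):
    n = len(gt_ranks)
    recall_at_1 = sum(r <= 1 for r in gt_ranks) / n
    recall_at_3 = sum(r <= 3 for r in gt_ranks) / n
    mean_rank = sum(gt_ranks) / n
    mrr = sum(1.0 / r for r in gt_ranks) / n
    return recall_at_1, recall_at_3, mean_rank, mrr

print(evaluate([1, 3, 2, 5]))  # toy example
```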
Baseline performance for the LOVEU@CVPR2022 Challenge: QA samples from 80 videos for training and from 20 videos for testing.
| Model | Recall@1 ↑ | Recall@3 ↑ | MR (Mean Rank) ↓ | MRR (Mean Reciprocal Rank) ↑ |
|---|---|---|---|---|
| Q2A (configs/q2a_gru+fps1+maskx-1_vit_b16+bert_b.yaml) | 30.2 | 62.3 | 3.2 | 3.2 |
Feel free to contact us at joyachen97@gmail.com if you have any problems, or open an issue in this repo.