This repository contains the official implementation to reproduce the results of our VASTA paper, accepted at GCPR 2022 (link).
It contains the following sections:
- Data download
- Requirements and Setup
- Training the models
- Pre-trained checkpoints
- Evaluating the trained models
To start, clone this repository and cd into the root directory:
git clone https://github.com/zohrehghaderi/VASTA.git
cd VASTA
We report results on two datasets, MSVD and MSR-VTT. The output of our adaptive frame selection method is provided in data\dataset_name\index_32, and the related labels for the semantics network are in data\dataset_name\tag. The normalized captions are in data\dataset_name\file.pickle. To use this code, you must download the videos of both datasets and place them in data\dataset_name\videos. For example, the MSVD dataset follows this tree:
data
|--MSVD
   |--index_32 \\ output of adaptive frame selection
   |--tag \\ extracted tags for the semantics network
   |--videos \\ videos
   |-MSVD_vocab.pkl \\ word dictionary
   |-full_test.pickle \\ to evaluate NLP metrics on test data
   |-full_val.pickle \\ to evaluate NLP metrics on validation data
   |-tag.npy \\ tag dictionary
   |-test_data.pickle \\ test video names and related captions
   |-train_data.pickle \\ train video names and related captions
   |-val_data.pickle \\ val video names and related captions
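Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and the expected entries are simply copied from the MSVD tree above. The file names for MSR-VTT may differ, so the sketch only covers MSVD.

```python
from pathlib import Path

# Entries copied from the MSVD tree above (assumed complete for MSVD only).
EXPECTED_DIRS = ["index_32", "tag", "videos"]
EXPECTED_FILES = [
    "MSVD_vocab.pkl",
    "full_test.pickle",
    "full_val.pickle",
    "tag.npy",
    "test_data.pickle",
    "train_data.pickle",
    "val_data.pickle",
]


def missing_entries(data_root="data", dataset="MSVD"):
    """Return the expected paths that are absent under data_root/dataset.

    Hypothetical helper for illustration; it is not provided by this repository.
    """
    base = Path(data_root) / dataset
    # Directories first, then files, in the order listed above.
    missing = [str(base / d) for d in EXPECTED_DIRS if not (base / d).is_dir()]
    missing += [str(base / f) for f in EXPECTED_FILES if not (base / f).is_file()]
    return missing
```

Calling `missing_entries()` from the repository root returns an empty list when every listed folder and file exists, and otherwise the paths that still need to be downloaded or created.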
To download MSVD, follow this link
To download MSR-VTT, follow this link
To run our code, create a conda environment with this command:
conda env update -f environment.yml -n TED
conda activate TED
This will install all dependencies described in our environment.yml file.
To download the weights of the Swin-B network, refer to Link and then place them in checkpoint/swin.
In this repository, pycocoevalcap is included in the nlp_metrics folder and is used to evaluate the validation and test data.
Our paper presents several models (AFS-Swin-Bert-semantics, UFS-Swin-Bert-semantics, AFS-Swin-Bert, UFS-Swin-Bert), selected by setting --afs and --semantics to True or False. The latter ones are ablations.
DATASET_NAME is msvd or msrvtt.
To train our best TED-VC model, use this command:
python main.py --afs=True --dataset=DATASET_NAME --semantics=True --ckp_semantics=checkpoint/semantics_net/DATASET_NAME/semantics.ckpt
To train our best TED-VC model that does not use Adaptive Frame Selection (AFS), use this command:
python main.py --afs=False --dataset=DATASET_NAME --semantics=True --ckp_semantics=checkpoint/semantics_net/DATASET_NAME/semantics.ckpt
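The four model variants differ only in the two boolean flags, so the training command can be assembled programmatically. The sketch below is an illustration under stated assumptions: `VARIANTS` and `build_train_command` are hypothetical names, not part of this repository, and omitting `--ckp_semantics` when the semantics network is disabled is an assumption, since the README only shows commands with `--semantics=True`.

```python
# Model names from the paper mapped to (afs, semantics) flag values.
VARIANTS = {
    "AFS-Swin-Bert-semantics": (True, True),
    "UFS-Swin-Bert-semantics": (False, True),
    "AFS-Swin-Bert": (True, False),
    "UFS-Swin-Bert": (False, False),
}


def build_train_command(model_name, dataset):
    """Build the main.py argument list for one model variant.

    Hypothetical helper for illustration; it only formats the flags
    shown in the commands above and does not start training.
    """
    if dataset not in ("msvd", "msrvtt"):
        raise ValueError("dataset must be 'msvd' or 'msrvtt'")
    afs, semantics = VARIANTS[model_name]
    cmd = [
        "python",
        "main.py",
        f"--afs={afs}",
        f"--dataset={dataset}",
        f"--semantics={semantics}",
    ]
    if semantics:
        # Checkpoint path pattern taken from the commands above.
        # Assumption: the flag is not needed when semantics is False.
        cmd.append(
            f"--ckp_semantics=checkpoint/semantics_net/{dataset}/semantics.ckpt"
        )
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues.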
Additionally, you can find pre-trained checkpoints of our models here:
Model Name | Dataset | Link |
---|---|---|
AFS-Swin-Bert-semantics | MSVD | link |
UFS-Swin-Bert-semantics | MSVD | link |
AFS-Swin-Bert-semantics | MSRVTT | link |
UFS-Swin-Bert-semantics | MSRVTT | link |
To evaluate our best TED-VC model, use this command:
python test.py --afs=True --dataset=DATASET_NAME --semantics=True --bestmodel=LINK_BESTMODEL
For example, for the MSVD dataset:
python test.py --afs=True --dataset=msvd --semantics=True --bestmodel=bestmodel/msvd/AFSSemantics.ckpt
To evaluate our best TED-VC model that does not use Adaptive Frame Selection (AFS), use this command:
python test.py --afs=False --dataset=DATASET_NAME --semantics=True --bestmodel=LINK_BESTMODEL
This should produce the following results:
Model Name | Dataset | Bleu-4 | METEOR | CIDER | ROUGE-L |
---|---|---|---|---|---|
AFS-Swin-Bert-semantics | MSVD | 56.14 | 39.09 | 106.3 | 74.47 |
UFS-Swin-Bert-semantics | MSVD | 54.30 | 38.18 | 102.7 | 74.28 |
AFS-Swin-Bert-semantics | MSRVTT | 43.43 | 30.24 | 55.00 | 62.54 |
UFS-Swin-Bert-semantics | MSRVTT | 43.51 | 29.75 | 53.59 | 62.27 |
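To read the effect of Adaptive Frame Selection off the table, the per-metric difference between the AFS and UFS rows can be computed directly. This is a small sketch that only reuses the scores reported above; the names `RESULTS` and `afs_gain` are hypothetical and not part of the repository.

```python
# Scores copied from the results table above, in column order.
METRICS = ("Bleu-4", "METEOR", "CIDER", "ROUGE-L")
RESULTS = {
    ("AFS", "MSVD"): (56.14, 39.09, 106.3, 74.47),
    ("UFS", "MSVD"): (54.30, 38.18, 102.7, 74.28),
    ("AFS", "MSRVTT"): (43.43, 30.24, 55.00, 62.54),
    ("UFS", "MSRVTT"): (43.51, 29.75, 53.59, 62.27),
}


def afs_gain(dataset):
    """Return AFS minus UFS for each metric on the given dataset."""
    afs = RESULTS[("AFS", dataset)]
    ufs = RESULTS[("UFS", dataset)]
    # Round to two decimals to match the precision of the table.
    return {m: round(a - u, 2) for m, a, u in zip(METRICS, afs, ufs)}


for name in ("MSVD", "MSRVTT"):
    print(name, afs_gain(name))
```

On these numbers, AFS is ahead on every metric for MSVD, while on MSRVTT it is ahead on METEOR, CIDER, and ROUGE-L but slightly behind on Bleu-4 (43.43 versus 43.51).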
To evaluate our best TED-VC model that uses neither Adaptive Frame Selection (AFS) nor the semantics network, use this command:
python test.py --afs=False --dataset=DATASET_NAME --semantics=False --bestmodel=LINK_BESTMODEL
Note that this is a confidential code release only meant for the purpose of reviewing our submission.
This readme is inspired by https://github.com/paperswithcode/releasing-research-code/blob/master/templates/README.md.