A Video-to-Text Framework



This repository contains the official implementation to reproduce the results of our VASTA paper, accepted at GCPR 2022 (link).


Table of Contents

It contains the following sections:

  1. Data download
  2. Requirements and Setup
  3. Training the models
  4. Pre-trained checkpoints
  5. Evaluating the trained models.

To start, clone this repository and cd into its root directory.

git clone https://github.com/zohrehghaderi/VASTA.git

Data Download

We report results on two datasets, MSVD and MSR-VTT. The output of our adaptive frame selection method is provided in data\dataset_name\index_32, and the related labels for the semantics network are in data\dataset_name\tag. The normalized captions are in data\dataset_name\file.pickle. To use this code, you must download the videos of both datasets and put them in data\dataset_name\videos. For example, the MSVD dataset follows this tree:

       |--index_32    \\ output of adaptive frame selection
       |--tag          \\ extracted tags for the semantics network
       |--videos        \\ videos
       |-MSVD_vocab.pkl  \\ word dictionary
       |-full_test.pickle \\ to evaluate NLP metrics on test data
       |-full_val.pickle   \\ to evaluate NLP metrics on validation data
       |-tag.npy            \\ tag dictionary
       |-test_data.pickle    \\ test video names and related captions
       |-train_data.pickle    \\ train video names and related captions
       |-val_data.pickle       \\ val video names and related captions
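
Before training, it can help to confirm that a dataset folder matches the tree above. The following is a minimal sketch and not part of the repository: the function name missing_entries, the two constants, and the data/MSVD location are assumptions made for illustration. The file names are copied from the MSVD tree, so another dataset may need a different list (for example, a different vocabulary file name).

```python
from pathlib import Path

# Entries copied from the MSVD tree above; adjust them for other datasets.
EXPECTED_DIRS = ("index_32", "tag", "videos")
EXPECTED_FILES = (
    "MSVD_vocab.pkl",
    "full_test.pickle",
    "full_val.pickle",
    "tag.npy",
    "test_data.pickle",
    "train_data.pickle",
    "val_data.pickle",
)


def missing_entries(root, dirs=EXPECTED_DIRS, files=EXPECTED_FILES):
    """Return the expected directory and file names that are absent under root."""
    root = Path(root)
    missing = [d for d in dirs if not (root / d).is_dir()]
    missing += [f for f in files if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # "data/MSVD" is an assumed location; change it to match your checkout.
    for name in missing_entries("data/MSVD"):
        print("missing:", name)
```

An empty result means every entry in the tree was found; the check only looks at names and does not validate file contents.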


To download MSVD, follow this link


To download MSR-VTT, follow this link

Requirements and Setup

To run our code, create a conda environment with these commands:

conda env update -f environment.yml -n TED
conda activate TED

This will install all dependencies described in our environment.yml file.


To download the weights of the Swin-B network, refer to Link and then put them in checkpoint/swin.

NLP Metrics

In this repository, pycocoevalcap is included in the nlp_metrics folder and is used to evaluate the validation and test data.

Training the models

We show several models in our paper (AFS-Swin-Bert-semantics, UFS-Swin-Bert-semantics, AFS-Swin-Bert, UFS-Swin-Bert) with --afs and --semantics being True or False. The latter ones are ablations. DATASET_NAME is msvd or msrvtt.

To train our best TED-VC model use this command:

 python main.py --afs=True  --dataset=DATASET_NAME --semantics=True --ckp_semantics=checkpoint/semantics_net/DATASET_NAME/semantics.ckpt 

To train the TED-VC model without Adaptive Frame Selection (AFS), use this command:

 python main.py --afs=False  --dataset=DATASET_NAME --semantics=True --ckp_semantics=checkpoint/semantics_net/DATASET_NAME/semantics.ckpt 
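
The flag combinations behind the four model names can be summarized in a small helper that assembles the training command. This is a sketch under stated assumptions rather than code from the repository: the names VARIANTS and train_command are hypothetical, the UFS variants are assumed to correspond to --afs=False, and --ckp_semantics is assumed to be needed only when --semantics=True, since the commands above show it only in that case.

```python
# Assumed mapping from the model names in the paper to the two flags.
VARIANTS = {
    "AFS-Swin-Bert-semantics": {"afs": True, "semantics": True},
    "UFS-Swin-Bert-semantics": {"afs": False, "semantics": True},
    "AFS-Swin-Bert": {"afs": True, "semantics": False},
    "UFS-Swin-Bert": {"afs": False, "semantics": False},
}

DATASETS = ("msvd", "msrvtt")


def train_command(variant, dataset):
    """Build the main.py argument list for one model variant and dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {DATASETS}, got {dataset!r}")
    flags = VARIANTS[variant]
    cmd = [
        "python",
        "main.py",
        f"--afs={flags['afs']}",
        f"--dataset={dataset}",
        f"--semantics={flags['semantics']}",
    ]
    if flags["semantics"]:
        # Assumption: the semantics checkpoint is only passed when it is used.
        cmd.append(
            f"--ckp_semantics=checkpoint/semantics_net/{dataset}/semantics.ckpt"
        )
    return cmd


if __name__ == "__main__":
    # Print the command for each variant on MSVD without running anything.
    for name in VARIANTS:
        print(" ".join(train_command(name, "msvd")))
```

The helper only builds and prints the commands; passing the returned list to subprocess.run from the repository root would start a training run.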

Pre-trained checkpoints

Additionally, you can find pre-trained checkpoints of our models here:

Model Name Dataset Link
AFS-Swin-Bert-semantics MSVD link
UFS-Swin-Bert-semantics MSVD link
AFS-Swin-Bert-semantics MSRVTT link
UFS-Swin-Bert-semantics MSRVTT link

Evaluating the trained models.

To evaluate our best TED-VC model, use this command:

 python test.py --afs=True  --dataset=DATASET_NAME --semantics=True --bestmodel=LINK_BESTMODEL

For example, for the MSVD dataset:

python test.py --afs=True  --dataset=msvd --semantics=True --bestmodel=bestmodel/msvd/AFSSemantics.ckpt

To evaluate the TED-VC model without Adaptive Frame Selection (AFS), use this command:

 python test.py --afs=False  --dataset=DATASET_NAME --semantics=True --bestmodel=LINK_BESTMODEL


This should produce the following results:

Model Name               Dataset  Bleu-4  METEOR  CIDER  ROUGE-L
AFS-Swin-Bert-semantics  MSVD     56.14   39.09   106.3  74.47
UFS-Swin-Bert-semantics  MSVD     54.30   38.18   102.7  74.28
AFS-Swin-Bert-semantics  MSRVTT   43.43   30.24   55.00  62.54
UFS-Swin-Bert-semantics  MSRVTT   43.51   29.75   53.59  62.27

To evaluate the TED-VC model that uses neither Adaptive Frame Selection (AFS) nor the semantics network, use this command:

 python test.py --afs=False  --dataset=DATASET_NAME --semantics=False --bestmodel=LINK_BESTMODEL


Note that this is a confidential code release only meant for the purpose of reviewing our submission.


This readme is inspired by https://github.com/paperswithcode/releasing-research-code/blob/master/templates/README.md.