RMN_MSR-VTT-Hindi-VC

Learning to Discretely Compose Reasoning Module Networks for Video Captioning (IJCAI2020)

Introduction

This code is the PyTorch implementation of RMN, forked from tgc1997, and the Hindi MSR-VTT dataset was created by alokssingh. The original code has been modified for compatibility with Hindi text. This implementation of RMN is used as a baseline model.

Dependencies

Python 3.7.3 (other versions may also work)
PyTorch 1.4.0 (other versions may also work)
pickle
tqdm
h5py
matplotlib
numpy
tensorboard_logger
CUDA 10.1
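
One quick way to confirm that the Python packages listed above are installed before training is to probe for them by import name. The sketch below is not part of the repository: the helper name `missing_packages` is hypothetical, and it assumes PyTorch is imported as `torch`. It checks only that the packages are present, not their versions or the CUDA setup.

```python
import importlib.util

# Import names for the dependencies listed above (PyTorch is imported as "torch").
REQUIRED = ["torch", "tqdm", "h5py", "matplotlib", "numpy", "tensorboard_logger", "pickle"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```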

Download the visual features from MSR-VTT and the text features from MSR-VTT-Hindi-text, and put them in the data folder.
Download the evaluation tool from caption-eval.

Training command example:

python train.py --dataset=msr-vtt --model=RMN --result_dir=results/msr-vtt_model --use_lin_loss \
 --learning_rate_decay --learning_rate_decay_every=5 --learning_rate_decay_rate=3 \
 --use_loc --use_rel --use_func --use_multi_gpu --learning_rate=1e-4 --attention=gumbel \
 --hidden_size=1300 --att_size=1024 --train_batch_size=32 --test_batch_size=8
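
The three `--learning_rate_decay*` flags in the command above suggest a step schedule. As a rough sketch, assuming the learning rate is divided by `learning_rate_decay_rate` once every `learning_rate_decay_every` epochs (an assumption about train.py that has not been checked against it), the values from the command would evolve as follows. The function name `decayed_lr` is hypothetical.

```python
def decayed_lr(base_lr, epoch, decay_every=5, decay_rate=3.0):
    """Learning rate at a 0-indexed epoch under the assumed step schedule."""
    # Assumption: one division by decay_rate per completed block of decay_every epochs.
    return base_lr / (decay_rate ** (epoch // decay_every))


# Values taken from the training command: learning_rate=1e-4, every=5, rate=3.
for epoch in (0, 4, 5, 10):
    print(epoch, decayed_lr(1e-4, epoch))
```

If train.py applies the decay differently (for example, multiplying by the rate), only the single line inside the function would need to change.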

Evaluation command example:

python evaluate.py --dataset=msr-vtt --model=RMN --result_dir=results/msr-vtt_model \
 --use_loc --use_rel --use_func --hidden_size=1300 --att_size=1024 \
 --test_batch_size=2 --beam_size=2 --eval_metric=CIDEr

NOTE: For the METEOR score, we used meteor_indic, and indic_tokenizer for tokenization.
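
Hindi captions need a tokenizer that treats the danda (।) as punctuation while leaving Devanagari words intact. The sketch below only illustrates that idea by padding punctuation with spaces and splitting on whitespace; it is not the indic_tokenizer implementation, and `tokenize_hindi` is a hypothetical name.

```python
import re
import string

# ASCII punctuation plus the Devanagari danda (U+0964) and double danda (U+0965).
_PUNCT = re.compile("([" + re.escape(string.punctuation) + "\u0964\u0965])")


def tokenize_hindi(sentence):
    """Split a Hindi sentence into word and punctuation tokens."""
    # Surround each punctuation mark with spaces, then split on whitespace,
    # so that vowel signs and other combining marks stay attached to their words.
    return _PUNCT.sub(r" \1 ", sentence).split()


print(tokenize_hindi("एक आदमी गाना गा रहा है।"))
```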

alokssingh/RMN-MSR-VTT-Hindi-Video-captioning