
IJCAI2020: performance of Learning to Discretely Compose Reasoning Module Networks for Video Captioning (RMN) on Hindi description generation


RMN_MSR-VTT-Hindi-VC

Learning to Discretely Compose Reasoning Module Networks for Video Captioning (IJCAI2020)

Introduction

This code is a PyTorch implementation of RMN, forked from tgc1997; the Hindi MSR-VTT dataset was created by alokssingh. The original code was modified for compatibility with Hindi text. This implementation of RMN is used as a baseline model.
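The README does not list which changes were made for Hindi. One pitfall that such a port typically has to handle: Python's `\w` does not match Devanagari vowel signs (Unicode treats them as combining marks, not letters), so a regex tokenizer like `\w+` breaks Hindi words into fragments. The following is a minimal sketch, assuming whitespace-separated captions that end with a danda ("।"); the function names and the punctuation set are hypothetical and this is not the repository's code.

```python
import re
import unicodedata
from typing import List

DANDA = "\u0964"         # "।", Devanagari full stop
DOUBLE_DANDA = "\u0965"  # "॥"

# Hypothetical punctuation set; "-" is last so it can never form a character range.
_PUNCT = re.compile("[" + re.escape(DANDA + DOUBLE_DANDA + ".,!?;:\"'()-") + "]")


def naive_tokenize(caption: str) -> List[str]:
    # \w skips vowel signs such as U+093F, so Hindi words come out in pieces.
    return re.findall(r"\w+", caption)


def tokenize_hindi(caption: str) -> List[str]:
    # NFC keeps equivalent character sequences identical across captions.
    caption = unicodedata.normalize("NFC", caption)
    # Replace punctuation (including the danda) with spaces, then split on whitespace.
    caption = _PUNCT.sub(" ", caption)
    return caption.split()


if __name__ == "__main__":
    print(naive_tokenize("हिंदी"))  # fragments, not the whole word
    print(tokenize_hindi("एक आदमी गाना गा रहा है।"))
```

Splitting on whitespace keeps each word's combining marks attached to its base consonants, which is what a vocabulary built from these captions needs.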

Dependencies

  • Python 3.7.3 (other versions may also work)
  • PyTorch 1.4.0 (other versions may also work)
  • pickle
  • tqdm
  • h5py
  • matplotlib
  • numpy
  • tensorboard_logger
  • CUDA 10.1

Preparation

  1. Download the visual features from MSR-VTT and the text features from MSR-VTT-Hindi-text, and put them in the data folder.
  2. Download the evaluation tool from caption-eval.

Training command example:

python train.py --dataset=msr-vtt --model=RMN --result_dir=results/msr-vtt_model --use_lin_loss \
 --learning_rate_decay --learning_rate_decay_every=5 --learning_rate_decay_rate=3 \
 --use_loc --use_rel --use_func --use_multi_gpu --learning_rate=1e-4 --attention=gumbel \
 --hidden_size=1300 --att_size=1024 --train_batch_size=32 --test_batch_size=8
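The `--attention=gumbel` flag suggests that the model picks one of its reasoning modules (enabled here with `--use_loc`, `--use_rel` and `--use_func`) through a Gumbel-softmax, which is how a discrete choice can be made while keeping training differentiable. We have not checked this against train.py, so treat the following as a generic, dependency-free sketch of Gumbel-softmax sampling with a hard one-hot selection. The module names and function names are hypothetical; in PyTorch the equivalent operation is `torch.nn.functional.gumbel_softmax(logits, tau=..., hard=True)`.

```python
import math
import random
from typing import List, Optional, Tuple

# Hypothetical labels mirroring the --use_loc / --use_rel / --use_func flags.
MODULES = ["locate", "relate", "func"]


def gumbel_softmax(logits: List[float], tau: float = 1.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / tau)."""
    rng = rng or random.Random()
    perturbed = []
    for logit in logits:
        u = min(max(rng.random(), 1e-10), 1.0 - 1e-10)  # keep log() finite
        gumbel = -math.log(-math.log(u))                # Gumbel(0, 1) sample
        perturbed.append((logit + gumbel) / tau)
    peak = max(perturbed)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


def select_module(logits: List[float], tau: float = 1.0,
                  rng: Optional[random.Random] = None) -> Tuple[int, List[float]]:
    """Return the chosen module index and its one-hot vector.

    With autograd, the straight-through trick uses this one-hot vector in the
    forward pass and the soft probabilities' gradient in the backward pass.
    """
    probs = gumbel_softmax(logits, tau, rng)
    idx = max(range(len(probs)), key=probs.__getitem__)
    one_hot = [1.0 if i == idx else 0.0 for i in range(len(probs))]
    return idx, one_hot


if __name__ == "__main__":
    idx, one_hot = select_module([0.2, 1.5, -0.3], tau=0.5, rng=random.Random(0))
    print(MODULES[idx], one_hot)
```

Lower `tau` values push the soft probabilities toward a one-hot vector; higher values make the choice more uniform.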

Evaluation command example:

python evaluate.py --dataset=msr-vtt --model=RMN --result_dir=results/msr-vtt_model \
 --use_loc --use_rel --use_func --hidden_size=1300 --att_size=1024 \
 --test_batch_size=2 --beam_size=2 --eval_metric=CIDEr
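The evaluation command decodes with `--beam_size=2`. As a rough illustration of what that parameter controls, here is a generic beam search over summed log-probabilities. It is a sketch, not the decoder in evaluate.py: the `step_fn` interface and the toy next-token probabilities are made up for the example, and no length normalization is applied.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: maps a partial caption to {next_token: probability}.
StepFn = Callable[[List[str]], Dict[str, float]]


def beam_search(step_fn: StepFn, bos: str, eos: str, beam_size: int = 2,
                max_len: int = 20) -> Tuple[List[str], float]:
    beams: List[Tuple[List[str], float]] = [([bos], 0.0)]  # (tokens, summed log-prob)
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, prob in step_fn(tokens).items():  # probabilities must be > 0
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # captions still open when max_len is reached
    return max(finished, key=lambda c: c[1])


# Toy model (made-up numbers): the likeliest first word leads to a worse caption.
TOY = {
    "<s>": {"एक": 0.6, "कोई": 0.4},
    "एक": {"आदमी": 0.55, "गाना": 0.45},
    "कोई": {"व्यक्ति": 0.9, "बच्चा": 0.1},
}


def toy_step(tokens: List[str]) -> Dict[str, float]:
    return TOY.get(tokens[-1], {"</s>": 1.0})


if __name__ == "__main__":
    print(beam_search(toy_step, "<s>", "</s>", beam_size=1)[0])
    print(beam_search(toy_step, "<s>", "</s>", beam_size=2)[0])
```

With `beam_size=1` (greedy) the search commits to the likeliest first word and ends at probability 0.6 × 0.55 = 0.33; with `beam_size=2` it keeps the runner-up alive and finds the caption with probability 0.4 × 0.9 = 0.36.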

NOTE: To compute the METEOR score, we used meteor_indic, with indic_tokenizer for tokenization.
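Tokenization matters here because METEOR is computed over matched unigrams. The following is a simplified, exact-match-only sketch of the original METEOR formula: F_mean = 10PR / (R + 9P) and a fragmentation penalty of 0.5 × (chunks / matches)³. It uses greedy left-to-right alignment and is only an assumption-laden illustration. meteor_indic's matching modules and parameters may differ, and `meteor_exact` is a hypothetical name.

```python
from typing import List, Optional


def meteor_exact(hyp: List[str], ref: List[str]) -> float:
    # Greedy alignment: each hypothesis token takes the first unused identical reference token.
    used = [False] * len(ref)
    align: List[Optional[int]] = []
    for tok in hyp:
        match = None
        for j, r in enumerate(ref):
            if not used[j] and r == tok:
                used[j] = True
                match = j
                break
        align.append(match)

    matches = sum(1 for a in align if a is not None)
    if matches == 0:
        return 0.0
    precision = matches / len(hyp)
    recall = matches / len(ref)
    fmean = 10 * precision * recall / (recall + 9 * precision)

    # A chunk is a maximal run of matched tokens that are adjacent in both sentences.
    chunks = 0
    prev: Optional[int] = None
    for a in align:
        if a is None:
            prev = None
            continue
        if prev is None or a != prev + 1:
            chunks += 1
        prev = a

    penalty = 0.5 * (chunks / matches) ** 3
    return fmean * (1 - penalty)


if __name__ == "__main__":
    ref = ["गा", "रहा", "है"]
    print(meteor_exact(["गा", "रहा", "है"], ref))
    print(meteor_exact(["गा", "रहा", "है।"], ref))  # danda left attached: one fewer match
```

In this sketch, leaving the danda attached to the last word costs a whole unigram match, so the second score is lower. This is one reason reference and hypothesis captions should go through the same Hindi-aware tokenizer before scoring.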

Acknowledgements

  1. Learning to Discretely Compose Reasoning Module Networks for Video Captioning
  2. Attention based video captioning framework for Hindi
  3. alokssingh
  4. tgc1997