reconstruction-network-for-video-captioning

This project implements RecNet, proposed in "Reconstruction Network for Video Captioning" (CVPR 2018).

Requirements

  • Ubuntu 16.04
  • CUDA 9.0
  • cuDNN 7.3.1
  • Java 1.8
  • Python 2.7.12
    • PyTorch 1.0
    • Other python libraries specified in requirements.txt

How to use

Step 1. Setup python virtual environment

$ pip install virtualenv
$ virtualenv .env
$ source .env/bin/activate
(.env) $ pip install --upgrade pip
(.env) $ pip install -r requirements.txt

Step 2. Prepare Data

  1. Extract feature vectors of the datasets by following the instructions here, and place them at ~/data/<dataset>/features/<network>.hdf5.

    e.g. InceptionV4 feature vectors of the MSVD dataset should be located at ~/data/MSVD/features/InceptionV4.hdf5.

  2. Set hyperparameters in config.py and split the dataset into train / val / test sets by running the following command.

    (.env) $ python -m scripts.split
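
The exact split depends on the settings in config.py, but the partition conventionally used for MSVD in the captioning literature is 1200 training / 100 validation / 670 test videos. A minimal sketch of such an index-based split (the function name and defaults are illustrative, not this repo's API):

```python
def split_msvd_indices(num_videos=1970, n_train=1200, n_val=100):
    """Partition contiguous video indices into train/val/test lists.

    Defaults follow the conventional MSVD split (1200/100/670);
    every index after the train and val ranges becomes the test set.
    """
    train = list(range(0, n_train))
    val = list(range(n_train, n_train + n_val))
    test = list(range(n_train + n_val, num_videos))
    return train, val, test

train, val, test = split_msvd_indices()
print(len(train), len(val), len(test))  # 1200 100 670
```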
    

Step 3. Train

  1. Set hyperparameters in config.py.
  2. Run
    (.env) $ python train.py
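
Training optimizes a joint objective: the decoder's cross-entropy captioning loss plus the reconstructor's reconstruction loss, weighted by a trade-off hyperparameter λ (set, like the other hyperparameters, in config.py). A schematic of the combination, with illustrative names and values:

```python
def joint_loss(caption_loss, reconstruction_loss, lam=0.2):
    """RecNet's training objective: L = L_caption + lambda * L_reconstruction.

    lam trades caption quality against reconstruction fidelity;
    the default here is illustrative, not this repo's actual setting.
    """
    return caption_loss + lam * reconstruction_loss

print(joint_loss(2.0, 1.0, lam=0.5))  # 2.5
```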
    

Step 4. Inference

  1. Set hyperparameters in config.py.
  2. Run
    (.env) $ python run.py
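
run.py generates captions with the trained decoder. The simplest decoding strategy is greedy decoding: pick the highest-scoring word at every step until an end token appears. A toy sketch of that loop, independent of this repo's actual decoding code (beam search may be used instead):

```python
def greedy_decode(step_fn, start_token, end_token, max_len=20):
    """Repeatedly ask step_fn for next-word scores and take the argmax.

    step_fn(prev_word) -> dict mapping candidate words to scores.
    Stops at end_token or after max_len words.
    """
    words = []
    prev = start_token
    for _ in range(max_len):
        scores = step_fn(prev)
        prev = max(scores, key=scores.get)
        if prev == end_token:
            break
        words.append(prev)
    return words

# Toy "language model" with fixed transition scores, for demonstration only.
transitions = {
    "<s>": {"a": 0.9, "the": 0.1},
    "a": {"man": 0.8, "dog": 0.2},
    "man": {"runs": 0.7, "</s>": 0.3},
    "runs": {"</s>": 0.9},
}
caption = greedy_decode(lambda w: transitions[w], "<s>", "</s>")
print(" ".join(caption))  # a man runs
```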
    

Result

Comparison with the original paper

NOTE: For now, only 2D features are used for evaluating our model (3D features are missing).

  • MSVD

    | Model | BLEU4 | METEOR | CIDEr | ROUGE_L |
    | --- | --- | --- | --- | --- |
    | Ours (w/o reconstructor) | 39.4 | 27.2 | 37.8 | 61.8 |
    | Ours (global) | 40.7 | 27.3 | 34.4 | 61.9 |
    | Ours (local) | 35.3 | 27.3 | 35.2 | 61.9 |
    | Paper (global) | 51.1 | 34.0 | 69.4 | 79.7 |
    | Paper (local) | 52.3 | 34.1 | 69.8 | 80.7 |
  • MSR-VTT

    | Model | BLEU4 | METEOR | CIDEr | ROUGE_L |
    | --- | --- | --- | --- | --- |
    | Ours | - | - | - | - |
    | Paper (global) | 38.3 | 26.2 | 59.1 | 41.7 |
    | Paper (local) | 39.1 | 26.6 | 59.3 | 42.7 |

TODO

  • Add qualitative results.
  • Add C3D feature vectors.
  • Add MSR-VTT dataset.