video2text.pytorch

PyTorch implementation of video captioning


Requirements

Pretrained Model

Datasets

Obtain the dataset you need (for example, MSVD, which the steps below assume).

Packages

torch, torchvision, numpy, scikit-image, nltk, h5py, pandas, future  # future: needed for Python 2 only
tensorboard_logger  # for viewing training loss in TensorBoard

You can install all of the packages above with:

sudo pip install -r requirements.txt
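For reference, a requirements.txt consistent with the package list above might look like this (no version pins are specified here, so none are shown):

torch
torchvision
numpy
scikit-image
nltk
h5py
pandas
future
tensorboard_logger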

Usage

Preparing Data

First, create soft links to the dataset folder and the pretrained models. For example:

mkdir datasets
ln -s YOUR_DATASET_PATH datasets/MSVD
mkdir models
ln -s YOUR_CNN_MODEL_PATH models/

More details can be found in opts.py. Then run:

  1. Prepare video features (a rough sketch of this step follows the list):
python scripts/prepro_video_feats.py
  2. Prepare caption features and the dataset split:
python scripts/prepro_caption_feats.py
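The exact preprocessing is defined by the two scripts above. As a rough, hypothetical sketch of what step 1 typically involves (sampling frames, encoding them with a pretrained CNN, and writing features to an HDF5 file; the frame directory layout, file paths, and helper names below are assumptions, not this repo's actual code):

# Hypothetical sketch of CNN-based video feature extraction; the real logic
# lives in scripts/prepro_video_feats.py and may differ in its details.
from pathlib import Path

import h5py
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

# Pretrained CNN with the classification head removed, used as a frame encoder.
cnn = models.resnet152(pretrained=True)
cnn.fc = torch.nn.Identity()  # keep the 2048-d pooled features
cnn.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def encode_frames(frames):
    # Encode a list of PIL frames into a (num_frames, 2048) array.
    batch = torch.stack([preprocess(f) for f in frames])
    with torch.no_grad():
        return cnn(batch).numpy()

# Assumed layout: datasets/MSVD/frames/<video_id>/*.jpg (frames pre-extracted).
frame_root = Path('datasets/MSVD/frames')
with h5py.File('datasets/MSVD/video_feats.h5', 'w') as out:
    for video_dir in sorted(frame_root.iterdir()):
        frames = [Image.open(p).convert('RGB')
                  for p in sorted(video_dir.glob('*.jpg'))]
        if frames:
            out.create_dataset(video_dir.name, data=encode_frames(frames))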

Training and Testing

Before training the model, make sure PyTorch can use a GPU to accelerate computation. Parameters such as batch size and learning rate can be found in args.py. A simplified sketch of a single training step appears after the command list below.
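A quick way to confirm that PyTorch can see a GPU:

python -c "import torch; print(torch.cuda.is_available())"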

  • Train:
python train.py
  • Evaluate:
python evaluate.py
  • Sample some examples:
python sample.py
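The actual model, loss, and options live in train.py and the files it imports. As a simplified, hypothetical sketch of what one training step of an encoder-decoder video captioner looks like in PyTorch (all module names, dimensions, and the random stand-in data below are assumptions, not this repo's API):

# Minimal sketch of one training step for an encoder-decoder captioner.
# All names are illustrative; the real model and loss live in this repo.
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    # Toy LSTM decoder conditioned on mean-pooled video features.
    def __init__(self, feat_dim=2048, embed_dim=256,
                 hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, video_feats, captions):
        # video_feats: (B, T, feat_dim); captions: (B, L) token ids.
        h0 = self.init_h(video_feats.mean(dim=1)).unsqueeze(0)  # (1, B, H)
        c0 = torch.zeros_like(h0)
        emb = self.embed(captions[:, :-1])  # teacher forcing: shifted input
        hidden, _ = self.lstm(emb, (h0, c0))
        return self.out(hidden)  # (B, L-1, vocab_size)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = CaptionDecoder().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One step on random stand-in data; a real run uses the prepared features.
feats = torch.randn(8, 20, 2048, device=device)         # 8 videos, 20 frames
caps = torch.randint(0, 10000, (8, 15), device=device)  # 8 captions, 15 tokens

logits = model(feats, caps)
loss = criterion(logits.reshape(-1, logits.size(-1)), caps[:, 1:].reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()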

Related papers

1. Supervising Neural Attention Models for Video Captioning by Human Gaze Data