video_to_text

a framework for video to text

MIT License

Video captioning

Source code for Video Captioning

Requirements

Download Dataset

Preprocess data

1. Extract all frames from videos

First, extract all frames from each video with cpu_extract.py. Then use read_certrain_number_frame.py to uniformly sample 5 frames from all frames of a video. Finally, use tf_feature_extract.py to extract Inception-ResNet-v2 features for each sampled frame.
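The uniform sampling step above can be sketched as follows. This is a minimal illustration of picking 5 evenly spaced frame indices, not the actual code in read_certrain_number_frame.py; the function name `uniform_sample_indices` is hypothetical.

```python
import numpy as np

def uniform_sample_indices(num_frames, num_samples=5):
    """Pick `num_samples` frame indices spread evenly across a video.

    `num_frames` is the total number of extracted frames; the returned
    indices cover the range [0, num_frames - 1] inclusive.
    """
    # linspace spaces the samples evenly from the first to the last frame
    return np.linspace(0, num_frames - 1, num_samples).astype(int).tolist()

# e.g. a 100-frame video -> frames 0, 24, 49, 74, 99
print(uniform_sample_indices(100, 5))
```

Each selected frame would then be passed to the feature extractor, giving a fixed-length (5-frame) representation per video regardless of its original length.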

2. Evaluate models

Use the *_s2vt.py scripts. Before running, set the model path in the evaluation function and adjust the global parameters in the file. For example:

python tf_s2vt.py --gpu 0 --task evaluate
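The command above implies flag handling along these lines. This is only a hypothetical sketch of the `--gpu` and `--task` arguments; the real tf_s2vt.py may name or parse them differently.

```python
import argparse

def parse_args(argv=None):
    """Parse the CLI flags shown in the example invocation."""
    parser = argparse.ArgumentParser(description="S2VT train/evaluate driver")
    parser.add_argument("--gpu", type=int, default=0,
                        help="id of the GPU to run on")
    parser.add_argument("--task", choices=["train", "evaluate"],
                        default="evaluate", help="which mode to run")
    return parser.parse_args(argv)

# mirrors: python tf_s2vt.py --gpu 0 --task evaluate
args = parse_args(["--gpu", "0", "--task", "evaluate"])
print(args.gpu, args.task)
```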

The MSVD models can be downloaded from here. The MSR-VTT models can be downloaded from here.

These steps are a little involved, so please feel free to ask if you have any questions.