video_to_text

a framework for video to text

MIT License

Video captioning

Source code for Video Captioning

Requirements

Download Dataset

Preprocess data

1. Extract all frames from videos

First, extract all frames from each video with cpu_extract.py. Then use read_certrain_number_frame.py to uniformly sample 5 frames from all frames of a video. Finally, use tf_feature_extract.py to extract Inception-ResNet-v2 features for each sampled frame.
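The uniform sampling step above can be sketched as follows. This is a minimal illustration of picking 5 evenly spaced frame indices, not the actual code in read_certrain_number_frame.py; the function name `uniform_sample_indices` is hypothetical.

```python
import numpy as np

def uniform_sample_indices(num_frames, num_samples=5):
    """Pick `num_samples` frame indices spread evenly across a video.

    `num_frames` is the total number of extracted frames; the returned
    indices cover the range [0, num_frames - 1] inclusive.
    """
    # linspace spaces the samples evenly from the first to the last frame
    return np.linspace(0, num_frames - 1, num_samples).astype(int).tolist()

# e.g. a 100-frame video -> frames 0, 24, 49, 74, 99
print(uniform_sample_indices(100, 5))
```

Each selected frame would then be passed to the feature extractor, giving a fixed-length (5-frame) representation per video regardless of its original length.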

2. Evaluate models

Use the *_s2vt.py scripts. Before running, set the model path in the evaluation function and adjust the global parameters in the file. For example:

python tf_s2vt.py --gpu 0 --task evaluate
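The command above implies flag handling along these lines. This is only a hypothetical sketch of the `--gpu` and `--task` arguments; the real tf_s2vt.py may name or parse them differently.

```python
import argparse

def parse_args(argv=None):
    """Parse the CLI flags shown in the example invocation."""
    parser = argparse.ArgumentParser(description="S2VT train/evaluate driver")
    parser.add_argument("--gpu", type=int, default=0,
                        help="id of the GPU to run on")
    parser.add_argument("--task", choices=["train", "evaluate"],
                        default="evaluate", help="which mode to run")
    return parser.parse_args(argv)

# mirrors: python tf_s2vt.py --gpu 0 --task evaluate
args = parse_args(["--gpu", "0", "--task", "evaluate"])
print(args.gpu, args.task)
```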

The MSVD models can be downloaded from here. The MSR-VTT models can be downloaded from here.

These steps are a little involved, so please feel free to ask if you have any questions.