S2VT: Sequence to Sequence: Video to Text
Note
This repository is not being actively maintained due to lack of time and interest. My sincerest apologies to the open source community for allowing this project to stagnate. I hope it was useful for some of you as a jumping-off point.
Acknowledgement
This code is modified from jazzsaxmafia's implementation, with several bugs in the original fixed.
Requirement
- TensorFlow 0.12
- Keras
How to use my code
First, download the MSVD dataset and extract video features:
$ python extract_feats.py
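Feature extraction in S2VT-style pipelines typically reduces each video to a fixed number of uniformly sampled frames before running them through the CNN. The sketch below shows only that sampling step; the 80-frame cap follows the S2VT paper's setup and is an assumption about what extract_feats.py does:

```python
import numpy as np

def sample_frame_indices(num_frames, num_samples=80):
    """Uniformly sample `num_samples` frame indices from a clip.

    If the clip has fewer frames than requested, keep all of them.
    The 80-frame cap matches the S2VT setup; treat it as an
    assumption about this repository's extract_feats.py.
    """
    if num_frames <= num_samples:
        return np.arange(num_frames)
    # Evenly spaced indices spanning the whole clip.
    return np.linspace(0, num_frames - 1, num_samples).astype(int)

# Example: a 300-frame clip reduced to 80 representative frames.
idx = sample_frame_indices(300)
print(len(idx), idx[0], idx[-1])  # 80 0 299
```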
After this step, split the extracted features into two directories:
train_features
test_features
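The split can be done with a short script like the one below. The per-video `.npy` file naming is an assumption about extract_feats.py's output; adjust the extension if the repo stores features differently:

```python
import os
import shutil

def split_features(feat_dir, train_ids, train_dir, test_dir):
    """Move per-video feature files (e.g. <video_id>.npy) into
    train/test directories based on a list of training video ids.
    Everything not in `train_ids` goes to the test directory.
    """
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    train_ids = set(train_ids)
    for fname in os.listdir(feat_dir):
        vid = os.path.splitext(fname)[0]
        dst = train_dir if vid in train_ids else test_dir
        shutil.move(os.path.join(feat_dir, fname),
                    os.path.join(dst, fname))
```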
Second, train the model:
$ CUDA_VISIBLE_DEVICES=0 ipython
Then, inside the IPython session:
>>> import model_rgb
>>> model_rgb.train()
Adjust the training parameters and directory paths in model_rgb.py as needed.
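For orientation, S2VT (ICCV 2015) stacks two LSTMs and runs them in two stages: during encoding the bottom LSTM reads frame features while the top one receives padding, and during decoding the roles flip and the top LSTM emits words. A minimal sketch of that input schedule (token labels only; the actual model_rgb.py feeds embeddings, and the `<BOS>`/`<pad>` names here are illustrative):

```python
def s2vt_schedule(n_frames, n_words):
    """Illustrate S2VT's two-stage input layout: the video stream
    sees frames then padding, while the text stream sees padding
    then the begin-of-sentence token and words. Both streams have
    the same length, n_frames + n_words.
    """
    video_stream = ["frame"] * n_frames + ["<pad>"] * n_words
    text_stream = ["<pad>"] * n_frames + ["<BOS>"] + ["word"] * (n_words - 1)
    return video_stream, text_stream

v, t = s2vt_schedule(3, 4)
print(v)  # ['frame', 'frame', 'frame', '<pad>', '<pad>', '<pad>', '<pad>']
print(t)  # ['<pad>', '<pad>', '<pad>', '<BOS>', 'word', 'word', 'word']
```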
Third, to test the model, choose a trained checkpoint, then:
>>> import model_rgb
>>> model_rgb.test()
After testing, a text file, "S2VT_results.txt", will be generated.
Last, evaluate results with COCO
We evaluate the generated captions with the coco-caption tools.
Run the shell script get_coco_tools.sh to download the coco-caption tools:
$ ./get_coco_tools.sh
After this, generate the reference JSON file from the ground-truth CSV file:
$ python create_reference.py
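The coco-caption tools expect references in the COCO captions format: an `images` list of ids and an `annotations` list of captions keyed by `image_id`. The sketch below builds that structure from (video_id, caption) pairs; in this repo those pairs would come from the MSVD ground-truth CSV, whose exact column layout is an assumption:

```python
import json

def build_reference(pairs, out_path=None):
    """Build a COCO-style reference dict from (video_id, caption)
    pairs and optionally write it to `out_path` as JSON.
    """
    images, annotations = [], []
    seen = set()
    for ann_id, (vid, caption) in enumerate(pairs):
        if vid not in seen:
            seen.add(vid)
            images.append({"id": vid})
        annotations.append({"image_id": vid, "id": ann_id, "caption": caption})
    ref = {"info": {}, "licenses": [], "type": "captions",
           "images": images, "annotations": annotations}
    if out_path:
        with open(out_path, "w") as f:
            json.dump(ref, f)
    return ref
```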
Then, generate the results JSON file from S2VT_results.txt:
$ python create_result_json.py
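The results file is simpler: the COCO evaluation expects a flat list of `{"image_id": ..., "caption": ...}` entries. A sketch of the conversion, assuming each line of S2VT_results.txt is `<video_id>\t<caption>` (the actual delimiter this repo writes may differ):

```python
import json

def results_to_json(result_lines, out_path=None):
    """Convert generated-caption lines into the COCO result format
    and optionally write them to `out_path` as JSON. Each input
    line is assumed to be '<video_id>\t<caption>'.
    """
    results = []
    for line in result_lines:
        vid, caption = line.rstrip("\n").split("\t", 1)
        results.append({"image_id": vid, "caption": caption})
    if out_path:
        with open(out_path, "w") as f:
            json.dump(results, f)
    return results
```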
Finally, you can evaluate the generation results:
$ python eval.py
Results
Model | METEOR
---|---
S2VT (ICCV 2015) |
- RGB (VGG) | 29.2
- Optical Flow (AlexNet) | 24.3
Our model |
- RGB (VGG) | 28.1
- Optical Flow (AlexNet) | 23.3
Notes
- Please feel free to ask me if you have questions.
- Only the RGB part of the code is committed; you can modify it to use optical-flow features.