S2VT: Sequence to Sequence: Video to Text
Note
This repository is not being actively maintained due to lack of time and interest. My sincerest apologies to the open source community for allowing this project to stagnate. I hope it was useful for some of you as a jumping-off point.
Acknowledgement
This code is modified from jazzsaxmafia's implementation, with several bugs in the original fixed.
Requirement
- TensorFlow 0.12
- Keras
How to use my code
First, download the MSVD dataset and extract video features:
$ python extract_feats.py
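Feature extraction in S2VT-style pipelines typically reduces each video to a fixed number of uniformly sampled frames before running them through the CNN. The sketch below shows only that sampling step; the 80-frame cap follows the S2VT paper's setup and is an assumption about what extract_feats.py does:

```python
import numpy as np

def sample_frame_indices(num_frames, num_samples=80):
    """Uniformly sample `num_samples` frame indices from a clip.

    If the clip has fewer frames than requested, keep all of them.
    The 80-frame cap matches the S2VT setup; treat it as an
    assumption about this repository's extract_feats.py.
    """
    if num_frames <= num_samples:
        return np.arange(num_frames)
    # Evenly spaced indices spanning the whole clip.
    return np.linspace(0, num_frames - 1, num_samples).astype(int)

# Example: a 300-frame clip reduced to 80 representative frames.
idx = sample_frame_indices(300)
print(len(idx), idx[0], idx[-1])  # 80 0 299
```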
After this step, split the extracted features into two directories:
train_features
test_features
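The split can be done with a short script like the one below. The per-video `.npy` file naming is an assumption about extract_feats.py's output; adjust the extension if the repo stores features differently:

```python
import os
import shutil

def split_features(feat_dir, train_ids, train_dir, test_dir):
    """Move per-video feature files (e.g. <video_id>.npy) into
    train/test directories based on a list of training video ids.
    Everything not in `train_ids` goes to the test directory.
    """
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    train_ids = set(train_ids)
    for fname in os.listdir(feat_dir):
        vid = os.path.splitext(fname)[0]
        dst = train_dir if vid in train_ids else test_dir
        shutil.move(os.path.join(feat_dir, fname),
                    os.path.join(dst, fname))
```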
Second, train the model:
$ CUDA_VISIBLE_DEVICES=0 ipython
Then, inside the IPython session:
>>> import model_rgb
>>> model_rgb.train()
Adjust the training parameters and directory paths in model_rgb.py as needed.
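For orientation, S2VT (ICCV 2015) stacks two LSTMs and runs them in two stages: during encoding the bottom LSTM reads frame features while the top one receives padding, and during decoding the roles flip and the top LSTM emits words. A minimal sketch of that input schedule (token labels only; the actual model_rgb.py feeds embeddings, and the `<BOS>`/`<pad>` names here are illustrative):

```python
def s2vt_schedule(n_frames, n_words):
    """Illustrate S2VT's two-stage input layout: the video stream
    sees frames then padding, while the text stream sees padding
    then the begin-of-sentence token and words. Both streams have
    the same length, n_frames + n_words.
    """
    video_stream = ["frame"] * n_frames + ["<pad>"] * n_words
    text_stream = ["<pad>"] * n_frames + ["<BOS>"] + ["word"] * (n_words - 1)
    return video_stream, text_stream

v, t = s2vt_schedule(3, 4)
print(v)  # ['frame', 'frame', 'frame', '<pad>', '<pad>', '<pad>', '<pad>']
print(t)  # ['<pad>', '<pad>', '<pad>', '<BOS>', 'word', 'word', 'word']
```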
Third, to test the model, choose a trained checkpoint, then:
>>> import model_rgb
>>> model_rgb.test()
After testing, a text file, "S2VT_results.txt", will be generated.
Last, evaluate results with COCO
We evaluate the generated captions with the coco-caption tools.
Run the shell script get_coco_tools.sh to download the coco-caption tools:
$ ./get_coco_tools.sh
After this, generate the reference JSON file from the ground-truth CSV file:
$ python create_reference.py
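The coco-caption tools expect references in the COCO captions format: an `images` list of ids and an `annotations` list of captions keyed by `image_id`. The sketch below builds that structure from (video_id, caption) pairs; in this repo those pairs would come from the MSVD ground-truth CSV, whose exact column layout is an assumption:

```python
import json

def build_reference(pairs, out_path=None):
    """Build a COCO-style reference dict from (video_id, caption)
    pairs and optionally write it to `out_path` as JSON.
    """
    images, annotations = [], []
    seen = set()
    for ann_id, (vid, caption) in enumerate(pairs):
        if vid not in seen:
            seen.add(vid)
            images.append({"id": vid})
        annotations.append({"image_id": vid, "id": ann_id, "caption": caption})
    ref = {"info": {}, "licenses": [], "type": "captions",
           "images": images, "annotations": annotations}
    if out_path:
        with open(out_path, "w") as f:
            json.dump(ref, f)
    return ref
```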
Then, generate the results JSON file from S2VT_results.txt:
$ python create_result_json.py
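The results file is simpler: the COCO evaluation expects a flat list of `{"image_id": ..., "caption": ...}` entries. A sketch of the conversion, assuming each line of S2VT_results.txt is `<video_id>\t<caption>` (the actual delimiter this repo writes may differ):

```python
import json

def results_to_json(result_lines, out_path=None):
    """Convert generated-caption lines into the COCO result format
    and optionally write them to `out_path` as JSON. Each input
    line is assumed to be '<video_id>\t<caption>'.
    """
    results = []
    for line in result_lines:
        vid, caption = line.rstrip("\n").split("\t", 1)
        results.append({"image_id": vid, "caption": caption})
    if out_path:
        with open(out_path, "w") as f:
            json.dump(results, f)
    return results
```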
Finally, you can evaluate the generation results:
$ python eval.py
Results
Model | METEOR
---|---
S2VT (ICCV 2015) |
- RGB (VGG) | 29.2
- Optical Flow (AlexNet) | 24.3
Our model |
- RGB (VGG) | 28.1
- Optical Flow (AlexNet) | 23.3
Notes
- Please feel free to ask me if you have questions.
- Only the RGB part of the code is committed; you can modify it to use optical-flow features.