This repository provides the data and implementation for video summarization with LSTM, i.e. vsLSTM and dppLSTM in our paper:
Video Summarization with Long Short-term Memory
Ke Zhang*, Wei-Lun Chao*, Fei Sha, and Kristen Grauman.
In Proceedings of the European Conference on Computer Vision (ECCV), 2016, Amsterdam, The Netherlands. (*Equal contribution) [pdf] [supp]
If you find the codes or other related resources from this repository useful, please cite the following paper:
@inproceedings{zhang2016video,
title={Video summarization with long short-term memory},
author={Zhang, Ke and Chao, Wei-Lun and Sha, Fei and Grauman, Kristen},
booktitle={ECCV},
year={2016},
organization={Springer}
}
- MAC OS X or Linux
- NVIDIA GPU with compute capability 3.5+
- Python 2.7+
- Theano 0.7+
- Matlab
Download the data and unzip to ./data/
Note that we down-sampled the original video by 2fps.
- file name: in the format 'Data_$Dataset$_google_p5.h5', e.g. Data_SumMe_google_p5.h5, means the frame level feature of SumMe dataset.
- the index of videos are stored as ‘idx’ in the file, in most cases it’s from 1 to n, where n is the number of videos in the dataset (except for Youtube dataset).
- feature & ground-truth: the feature is indexed as ‘fea_i’ , the importance is indexed as ‘gt_1_i’ (real number, from the original dataset), and the keyframe we used is indexed as ‘gt_2_i’ (binary value transferred from the original dataset) for the i-th video in the dataset.
Original videos and annotations for each dataset are also available from the the authors' project page
- TVSum dataset: https://github.com/yalesong/tvsum
- SumMe dataset: https://people.ee.ethz.ch/~gyglim/vsum/#benchmark
- OVP and YouTube datasets: https://sites.google.com/site/vsummsite/
We have enclosed pre-trained models in the ./model directory download the model and run the following commands:
Download the pre-trained models and unzip it to ./models and run the following commands:
cd ./codes
THEANO_FLAGS=device=gpu0,floatX=float32 python dppLSTM_main.py
This will automatically run summarization on the video data using pre-trained model, and save the results in ./res_LSTM/ as dppLSTM_$DATASET$_2_inference.h5
If you want to train the model on your own data, just uncomment Line 85 in dppLSTM_main.py
train(model_idx = model_idx, train_set = train_set, val_set = val_set, model_saved = model_file)
For both SumMe and TVSum datasets, you can find the code for evaluation provided by the author:
We also provided the evaluation code with wrappers that help adapt to the datasets above
To run evaluation on the predicted summarization, start the matlab and run the following commands:
cd ./codes
dppLSTM_eval('../data/', '$DATASET$', '/dppLSTM_$DATASET$_2_inference.h5')