AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization
- From "AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization" (IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT 2020), Early Access)
- Reimplemented by Minh Hoang
Run
pip install -r requirements.txt
Structured h5 files with the video features and annotations of the SumMe and TVSum datasets are available within the "data" folder. The GoogleNet features of the video frames were extracted by Ke Zhang and Wei-Lun Chao and the h5 files were obtained from Kaiyang Zhou. These files have the following structure:
/key
- /features: 2D-array with shape (n_steps, feature-dimension)
- /gtscore: 1D-array with shape (n_steps), stores the ground truth importance scores (used for training, e.g. regression loss)
- /user_summary: 2D-array with shape (num_users, n_frames), each row is a binary vector (used for test)
- /change_points: 2D-array with shape (num_segments, 2), each row stores the indices of a segment
- /n_frame_per_seg: 1D-array with shape (num_segments), indicates the number of frames in each segment
- /n_frames: number of frames in the original video
- /picks: positions of the subsampled frames in the original video
- /n_steps: number of subsampled frames
- /gtsummary: 1D-array with shape (n_steps), ground truth summary provided by the user (used for training, e.g. maximum likelihood)
- /video_name (optional): original video name, only available for the SumMe dataset
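As a minimal sketch of how this layout might be inspected, the helpers below report the shape of every field of a video entry. The function names `describe_entry` and `describe_file` are hypothetical and not part of this repository. Reading a real file assumes the `h5py` package is installed; the demonstration uses a plain dict of NumPy arrays with made-up sizes as a stand-in for one entry.

```python
import numpy as np


def describe_entry(entry):
    """Return {field_name: shape} for one video entry.

    `entry` is anything with mapping-style access whose values expose
    `.shape`, e.g. an h5py group or a dict of NumPy arrays.
    """
    return {name: tuple(entry[name].shape) for name in entry.keys()}


def describe_file(path):
    """Open an h5 file (requires h5py) and describe every video key in it."""
    import h5py  # imported lazily; only needed when reading a real file

    with h5py.File(path, "r") as f:
        return {key: describe_entry(f[key]) for key in f.keys()}


# Stand-in for one entry; the sizes are made up for illustration only.
demo = {
    "features": np.zeros((4, 1024), dtype=np.float32),
    "gtscore": np.zeros(4, dtype=np.float32),
    "change_points": np.array([[0, 29], [30, 59]]),
    "n_frames": np.array(60),
}
shapes = describe_entry(demo)
```

Scalar fields such as /n_frames come back with an empty shape, which is a quick way to tell them apart from the array fields.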
Original videos and annotations for each dataset are also available in the authors' project webpages:
- TVSum dataset: https://github.com/yalesong/tvsum
- SumMe dataset: https://gyglim.github.io/me/vsum/index.html#benchmark
Custom datasets generated by generate_dataset.py have the following structure:
/key
- /features: 2D-array with shape (n_steps, feature-dimension)
- /picks: positions of subsampled frames in original video
- /n_frames: number of frames in original video
- /fps: fps of original video
- /change_points: 2D-array with shape (num_segments, 2), each row stores indices of a segment
- /n_steps: number of subsampled frames
- /video_name: original video name
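To illustrate how /n_frames, /picks and /n_steps relate to each other, here is a small sketch that is independent of generate_dataset.py. It assumes uniform subsampling at every 15th frame, which is an illustrative value and not necessarily what the script uses, and it maps step-level scores back to frames by holding each score until the next pick. Both function names are hypothetical.

```python
def subsample_positions(n_frames, rate=15):
    """Positions of every `rate`-th frame; `rate` is an assumed value."""
    return list(range(0, n_frames, rate))


def expand_to_frames(step_scores, picks, n_frames):
    """Hold each step score until the next pick, giving one score per frame."""
    frame_scores = [0.0] * n_frames
    for i, start in enumerate(picks):
        # The last step extends to the end of the video.
        end = picks[i + 1] if i + 1 < len(picks) else n_frames
        for j in range(start, end):
            frame_scores[j] = step_scores[i]
    return frame_scores


picks = subsample_positions(n_frames=40, rate=15)  # positions 0, 15, 30
n_steps = len(picks)                               # one step per picked frame
frame_scores = expand_to_frames([0.2, 0.9, 0.5], picks, n_frames=40)
```

With this convention the features array has one row per entry of `picks`, so its first dimension equals `n_steps`.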
To train the model using a custom video dataset, run:
python train.py [path_to_video_dataset_folder]
To train the model on the TVSum or SumMe dataset over a number of randomly created splits (in each split, 80% of the data is used for training and 20% for testing), use the corresponding JSON file included in the "data/splits" directory. This file contains the 5 randomly generated splits that were used in our experiments.
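For readers who want to build comparable splits for their own data, the following sketch generates 5 random 80/20 splits and serializes them to JSON. It is not the script that produced the files in "data/splits"; the function name `make_splits`, the field names `train_keys` and `test_keys`, and the example video keys are all assumptions, and the layout of the provided JSON files may differ.

```python
import json
import random


def make_splits(video_keys, n_splits=5, train_fraction=0.8, seed=0):
    """Create `n_splits` random train/test partitions of `video_keys`."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_train = int(round(train_fraction * len(video_keys)))
    splits = []
    for _ in range(n_splits):
        shuffled = list(video_keys)
        rng.shuffle(shuffled)
        splits.append({
            "train_keys": sorted(shuffled[:n_train]),  # assumed field name
            "test_keys": sorted(shuffled[n_train:]),   # assumed field name
        })
    return splits


keys = [f"video_{i}" for i in range(1, 11)]  # hypothetical video keys
splits = make_splits(keys)
splits_json = json.dumps(splits, indent=2)
```

Each split is drawn independently, so a video may land in the test set of more than one split.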
For training the model using a single split, run:
python train.py
Please note that after each training epoch the algorithm performs an evaluation step, using the trained model to compute the importance scores for the frames of each video of the test set. These scores are then used by the provided evaluation scripts to assess the overall performance of the model (in F-Score).
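The F-Score itself can be sketched as follows. This is a simplified illustration and not the provided evaluation scripts: it assumes the predicted summary has already been converted to a binary frame-level vector, and it leaves the aggregation over users as a parameter because the choice (for example maximum or mean) is an assumption here. The function names are hypothetical.

```python
def f_score(predicted, reference):
    """Harmonic mean of precision and recall for two binary frame vectors."""
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(predicted)
    recall = overlap / sum(reference)
    return 2 * precision * recall / (precision + recall)


def score_against_users(predicted, user_summary, reduce=max):
    """Compare one summary with every user's summary and aggregate."""
    return reduce([f_score(predicted, user) for user in user_summary])


def mean(values):
    return sum(values) / len(values)


predicted = [1, 1, 0, 0, 1, 0]
user_summary = [
    [1, 0, 0, 0, 1, 0],  # shares two selected frames with the prediction
    [0, 0, 1, 1, 0, 0],  # shares none
]
best = score_against_users(predicted, user_summary, reduce=max)
average = score_against_users(predicted, user_summary, reduce=mean)
```

In this toy case the first user yields precision 2/3 and recall 1, so the per-user scores are 0.8 and 0.0.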
To summarize videos stored in a directory, run:
python test.py [path_to_video_dataset_folder]