
AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization

PyTorch Implementation of AC-SUM-GAN

  • From "AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization", IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), 2020, Early Access
  • Reimplemented by Minh Hoang

Dependencies

Install the required packages by running:

pip install -r requirements.txt

Data

Structured h5 files with the video features and annotations of the SumMe and TVSum datasets are available within the "data" folder. The GoogleNet features of the video frames were extracted by Ke Zhang and Wei-Lun Chao, and the h5 files were obtained from Kaiyang Zhou. These files have the following structure (a short reading example follows the listing):

/key
    /features                 2D-array with shape (n_steps, feature-dimension)
    /gtscore                  1D-array with shape (n_steps), stores ground truth importance score (used for training, e.g. regression loss)
    /user_summary             2D-array with shape (num_users, n_frames), each row is a binary vector (used for test)
    /change_points            2D-array with shape (num_segments, 2), each row stores indices of a segment
    /n_frame_per_seg          1D-array with shape (num_segments), indicates number of frames in each segment
    /n_frames                 number of frames in original video
    /picks                    positions of subsampled frames in original video
    /n_steps                  number of subsampled frames
    /gtsummary                1D-array with shape (n_steps), ground truth summary provided by user (used for training, e.g. maximum likelihood)
    /video_name (optional)    original video name, only available for SumMe dataset
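
For reference, here is a minimal sketch of reading such a file with h5py; the file name is illustrative and should be replaced with the actual h5 file in the "data" folder.

import h5py

# Illustrative file name; substitute the actual h5 file from the "data" folder.
with h5py.File("data/SumMe/summe_dataset.h5", "r") as hdf:
    for key in hdf.keys():                    # one group per video, e.g. "video_1"
        features = hdf[key]["features"][...]  # (n_steps, feature-dimension)
        gtscore = hdf[key]["gtscore"][...]    # (n_steps,) importance scores
        picks = hdf[key]["picks"][...]        # positions of subsampled frames
        n_frames = int(hdf[key]["n_frames"][()])
        print(key, features.shape, gtscore.shape, n_frames)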

Original videos and annotations for each dataset are also available on the authors' project webpages.

Custom datasets generated by generate_dataset.py have the following structure (a short writing sketch follows the listing):

/key
    /features                 2D-array with shape (n_steps, feature-dimension)
    /picks                    positions of subsampled frames in original video
    /n_frames                 number of frames in original video
    /fps                      fps of original video
    /change_points            2D-array with shape (num_segments, 2), each row stores indices of a segment
    /n_steps                  number of subsampled frames
    /video_name               original video name
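
As a rough sketch of how a file with this structure can be written via h5py (all values below are placeholders for illustration, not the actual output of generate_dataset.py):

import h5py
import numpy as np

n_frames, fps, rate = 3000, 30, 15  # subsample every 15th frame (illustrative)
picks = np.arange(0, n_frames, rate)
features = np.random.rand(len(picks), 1024).astype(np.float32)  # stand-in CNN features
change_points = np.array([[0, 999], [1000, 1999], [2000, 2999]])

with h5py.File("custom_dataset.h5", "w") as hdf:
    grp = hdf.create_group("video_1")
    grp.create_dataset("features", data=features)
    grp.create_dataset("picks", data=picks)
    grp.create_dataset("n_frames", data=n_frames)
    grp.create_dataset("fps", data=fps)
    grp.create_dataset("change_points", data=change_points)
    grp.create_dataset("n_steps", data=len(picks))
    grp.create_dataset("video_name", data="video_1")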

Training

To train the model on a custom video dataset, run:

python train.py [path_to_video_dataset_folder]

To train the model on the TVSum or SumMe dataset over a number of randomly created splits (in each split, 80% of the data is used for training and 20% for testing), use the corresponding JSON file in the "data/splits" directory. Each file contains the 5 randomly generated splits that were utilized in our experiments.
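
For orientation, a small sketch of loading such a splits file; the file name and the field names "train_keys"/"test_keys" are assumptions based on common practice in related repositories, so check the actual JSON files in "data/splits".

import json

# File name and field names below are assumptions, not verified against the
# repository; inspect the JSON files in "data/splits" for the exact keys.
with open("data/splits/summe_splits.json") as f:
    splits = json.load(f)  # expected: a list with one entry per split

for i, split in enumerate(splits):
    train_keys = split["train_keys"]  # ~80% of the video keys
    test_keys = split["test_keys"]    # remaining ~20%
    print(f"split {i}: {len(train_keys)} train / {len(test_keys)} test videos")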

To train the model on a single split, run:

python train.py

Please note that after each training epoch the algorithm performs an evaluation step, using the trained model to compute importance scores for the frames of each video in the test set. These scores are then used by the provided evaluation scripts to assess the overall performance of the model (in F-Score).
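
For illustration, a minimal sketch of the frame-level F-Score between a machine summary and a single user summary (both binary 0/1 vectors over the original video frames); the provided evaluation scripts aggregate this over all user summaries of each video.

import numpy as np

def f_score(machine_summary: np.ndarray, user_summary: np.ndarray) -> float:
    # Both inputs are binary (0/1) integer vectors of length n_frames.
    overlap = int((machine_summary & user_summary).sum())
    if overlap == 0:
        return 0.0
    precision = overlap / machine_summary.sum()
    recall = overlap / user_summary.sum()
    return 2 * precision * recall / (precision + recall)

In this line of work, the per-video score is commonly taken as the maximum over the available user summaries for SumMe and as the average for TVSum.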

Generating video summaries

To summarize videos stored in a directory, run:

python test.py [path_to_video_dataset_folder]
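
The conversion from frame importance scores to an actual summary typically follows the keyshot pipeline common in this line of work: average the scores within each segment defined by change_points, then select segments with a 0/1 knapsack under a length budget (often 15% of the video duration). The sketch below is illustrative, not a transcription of test.py.

import numpy as np

def select_keyshots(frame_scores, change_points, n_frames, budget_ratio=0.15):
    # frame_scores: importance per original frame (scores predicted on
    # subsampled frames are usually expanded to frame level via picks).
    # Segment boundaries in change_points are inclusive.
    seg_scores = [frame_scores[s:e + 1].mean() for s, e in change_points]
    seg_lens = [e - s + 1 for s, e in change_points]
    budget = int(n_frames * budget_ratio)

    # 0/1 knapsack: maximize total segment score under the frame budget.
    dp = np.zeros((len(seg_lens) + 1, budget + 1))
    for i in range(1, len(seg_lens) + 1):
        for c in range(budget + 1):
            dp[i, c] = dp[i - 1, c]
            if seg_lens[i - 1] <= c:
                dp[i, c] = max(dp[i, c],
                               dp[i - 1, c - seg_lens[i - 1]] + seg_scores[i - 1])

    # Backtrack to mark the frames of the selected segments.
    summary, c = np.zeros(n_frames, dtype=int), budget
    for i in range(len(seg_lens), 0, -1):
        if dp[i, c] != dp[i - 1, c]:
            s, e = change_points[i - 1]
            summary[s:e + 1] = 1
            c -= seg_lens[i - 1]
    return summary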