/video-scene-detection

Video Scene Detection Based on the Optimal Sequential Grouping algorithm

Primary LanguageJupyter NotebookMIT LicenseMIT

Video Scene Detection Based on the Optimal Sequential Grouping algorithm

Video scene detection is the task of temporally dividing a video into semantic scenes. This repository implements two video scene detection algorithms from the following papers:

To accomplish the video scene detection task the next few steps have been proposed in the papers:

  • divide a video into shots (sequences of frames from one editing cut to another)
  • extract an arbitrary set of features from each shot
  • find pairwise distances between feature vectors
  • split shots into non-intersecting groups by optimizing a distance-based cost function

The resulting groups of shots represent the desired scenes.

H_nrm detection results

To group shots optimally, one has to solve a sequential grouping task. Two different cost functions are introduced in the papers:

  • A simple one, called H_add, calculates the cost as a sum of the distances between each group's shots.
  • The more sophisticated one, called H_nrm, extends H_add by normalizing the sum of the distances within each of the groups by the size of the group. Improved solutions for both cost functions are introduced in this repository.

The repository has been created to demonstrate the application of the algorithms on synthetic data. It doesn't contain code that extracts shot features and calculates pairwise distances.