/SceneSeg

Codebase for CVPR2020 A Local-to-Global Approach to Multi-modal Movie Scene Segmentation

Primary LanguagePython

SceneSeg LGSS

Codebase for CVPR2020 A Local-to-Global Approach to Multi-modal Movie Scene Segmentation

demo image

Introduction

From a video to segemented scenes. Basically, two steps are needed including holistic features extraction and temporal scene segmentation.

A single stage temporal scene segmentation is also provided in the demo. This is going to be a easy-to-use tool for plot/story unstanding with scene as a semantic unit. Currently, it only supports image input.

😬 The scene segmentation dataset is prompted to MovieNet project with 318 movies together with a easy-to-use toolkit. It is encouraged to use in the future.

Features

  • Basic video processing tools are provided including shot detection and its parallel version.
  • Holistic semantic video feature extractors including place, audio, human, action, speech are planned to be included if you wish and leave a looking forward message in the issue. Place and audio are supported now in the pre. Full version is located at movienet-tools.
  • All-in-one scene segmentation tool with all multi-modal multi-semantic elements.

Notice

😅 Since some enthusiastic researchers are requesting the codes but we plan to organize the codebase in an easy-to-use fashion, e.g. movienet-tools, we release an on-going version here.

Installation

Please refer to INSTALL.md for installation and dataset preparation. Pretrained models and dataset are also explanined here.

Get Started

🥳 Please see GETTING_STARTED.md for the basic usage.

Citation

@inproceedings{rao2020local,
title={A Local-to-Global Approach to Multi-modal Movie Scene Segmentation},
author={Rao, Anyi and Xu, Linning and Xiong, Yu and Xu, Guodong and Huang, Qingqiu and Zhou, Bolei and Lin, Dahua},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2020}
}