MM-Vid: Advancing Video Understanding with GPT-4V(ision)

This repository contains the open-source implementation of the paper "MM-Vid: Advancing Video Understanding with GPT-4V(ision)".

Overview

This project aims to advance video understanding using GPT-4V(ision). The implementation follows the method and experiments described in the paper and provides a pipeline for scene detection, video clipping, speech recognition, and generation of coherent video descriptions.
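As a rough illustration of how the stages listed above might fit together, the sketch below turns scene-cut timestamps into clips, attaches speech-recognition output to each clip, and builds one text prompt per clip. It is not taken from this repository: the names `Utterance`, `clips_from_boundaries`, `assign_transcript`, and `build_prompts`, the midpoint-based assignment rule, and the prompt wording are all assumptions made for illustration. The scene detector, the speech recognizer, and the GPT-4V call are left out, so their outputs appear here as plain Python values.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One speech-recognition segment (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def clips_from_boundaries(boundaries, duration):
    """Turn scene-cut timestamps into consecutive (start, end) clip intervals."""
    # Keep only cuts strictly inside the video and drop duplicates.
    inner = sorted({b for b in boundaries if 0.0 < b < duration})
    cuts = [0.0] + inner + [duration]
    return list(zip(cuts[:-1], cuts[1:]))


def assign_transcript(clips, utterances):
    """Attach each utterance to the clip that contains its midpoint."""
    per_clip = [[] for _ in clips]
    last = len(clips) - 1
    for u in utterances:
        mid = (u.start + u.end) / 2
        for i, (start, end) in enumerate(clips):
            # The final clip also accepts a midpoint exactly at its end.
            if start <= mid < end or (i == last and mid == end):
                per_clip[i].append(u.text)
                break
    return per_clip


def build_prompts(clips, per_clip):
    """Build one text prompt per clip (the wording is an assumption)."""
    prompts = []
    for (start, end), texts in zip(clips, per_clip):
        speech = " ".join(texts) if texts else "(no speech)"
        prompts.append(
            f"Clip {start:.1f}s-{end:.1f}s. Transcript: {speech}. "
            "Describe what happens in this clip."
        )
    return prompts


if __name__ == "__main__":
    clips = clips_from_boundaries([4.0, 9.5], duration=12.0)
    speech = [Utterance(1.0, 3.0, "hello"), Utterance(3.5, 5.5, "world")]
    for prompt in build_prompts(clips, assign_transcript(clips, speech)):
        print(prompt)
```

In a full pipeline, each prompt would be sent to the model together with frames sampled from the corresponding clip, and the per-clip answers would then be merged into a single description of the video.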

Installation

To use this repository, clone it and install the required dependencies:

git clone https://github.com/yongliang-wu/MM-VID.git
cd MM-VID
pip install -r requirements.txt

Then run the code:

python main.py

TODO

Supplying external information as input is not yet supported.