
MMVideoTextRetrieval is an open source video-text retrieval toolbox based on PyTorch.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

MMVideoTextRetrieval


Introduction

This repository provides implementations of several video-text retrieval methods.

Major Features

  • Modular design

    We decompose the video-text retrieval framework into separate components that can easily be used in any combination.

  • Support for various datasets and features

    The toolbox supports multiple datasets, such as MSRVTT, ActivityNet, and LSMDC. Various pre-extracted features are also provided.

  • Support for multiple video text retrieval frameworks

    MMVideoTextRetrieval implements popular video-text retrieval frameworks, such as MMT and HGR. More frameworks will be added later.

  • Visual demo

    We provide a demo for visualizing the results of video-text retrieval models.

Demo

We provide a way to run text-to-video retrieval in real-world applications. Before retrieval, the multi-modal features of the candidate videos must be extracted and stored. The query text is defined in the "main_train" function in demo.py, and the "--sentence" flag activates the retrieval process. The retrieval outputs the names of the video feature files of the 10 most similar videos.
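The retrieval step described above ranks every stored video feature against the query sentence and keeps the 10 best matches. The following is a minimal sketch of that ranking, not the repository's implementation. It assumes the query and each video have already been encoded into fixed-size embedding vectors and uses plain cosine similarity as the score; MMT's actual scoring and the real format of the stored feature files may differ. The function names and the `.pkl` file names are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def retrieve_top_k(text_emb: Sequence[float],
                   video_embs: Dict[str, Sequence[float]],
                   k: int = 10) -> List[Tuple[str, float]]:
    """Return (feature_file_name, score) pairs for the k most similar videos, best first."""
    scored = ((name, cosine_similarity(text_emb, emb)) for name, emb in video_embs.items())
    # nlargest avoids fully sorting the whole collection when only k results are needed.
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy embeddings keyed by feature-file name.
    videos = {"video_a.pkl": [1.0, 0.0], "video_b.pkl": [0.0, 1.0], "video_c.pkl": [1.0, 1.0]}
    for name, score in retrieve_top_k([1.0, 0.0], videos, k=2):
        print(name, round(score, 3))
```

With real data, the vectors would come from the trained text and video encoders. `heapq.nlargest` keeps only the top k candidates, which matches the "top 10" output described in the demo.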

Benchmark

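| Model | Dataset | Video feature | Text feature | Pretrained | T2V R@1 | T2V R@5 | T2V R@10 | V2T R@1 | V2T R@5 | V2T R@10 |
|---|---|---|---|---|---|---|---|---|---|---|
| MMT | MSRVTT-1kA | S3D | BERT | no | 24.6 | 54 | 67.1 | 24.4 | 56 | 67.8 |
| MMT | ActivityNet | S3D | BERT | no | 22.7 | 54.2 | 93.2 | 22.9 | 54.8 | 93.1 |
| MMT | LSMDC | S3D | BERT | no | 13.2 | 29.2 | 38.8 | 12.1 | 29.3 | 37.9 |
| MMT | MSRVTT-1kA&B | S3D | BERT | HowTo100M | 26.6 | 57.1 | 69.6 | 27 | 57.5 | 69.7 |
| MMT | ActivityNet | S3D | BERT | HowTo100M | 28.7 | 61.4 | 94.5 | 28.9 | 61.1 | 94.3 |
| MMT | LSMDC | S3D | BERT | HowTo100M | 12.9 | 29.9 | 40.1 | 12.3 | 28.6 | 38.9 |
| HGR | MSRVTT-Full | ResNet-152 | Word2Vec | no | 9.2 | 26.2 | 36.5 | 15 | 36.7 | 48.8 |

T2V: text-to-video retrieval. V2T: video-to-text retrieval. R@K: recall at rank K (%).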

(All results are taken from the original papers and will later be replaced by the results of the pre-trained models.)
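The benchmark reports Recall@K (R@1, R@5, R@10), which is the percentage of queries whose correct match is ranked within the top K candidates. The following is a minimal sketch of how it can be computed from a query-by-candidate similarity matrix. It assumes exactly one correct match per query, located on the diagonal (query i matches candidate i), and it counts ties in the query's favour. The evaluation code in this repository may handle multiple captions per video or ties differently. Both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def recall_at_k(sim: Sequence[Sequence[float]],
                ks: Iterable[int] = (1, 5, 10)) -> Dict[int, float]:
    """Recall@K in percent for a square similarity matrix.

    sim[i][j] scores query i against candidate j; candidate i is assumed to be
    the single correct match for query i (diagonal ground truth).
    """
    ranks: List[int] = []
    for i, row in enumerate(sim):
        # 0-based rank of the correct match = number of candidates scored
        # strictly higher (ties are counted in the query's favour).
        ranks.append(sum(1 for score in row if score > row[i]))
    n = len(ranks)
    return {k: 100.0 * sum(1 for r in ranks if r < k) / n for k in ks}


def transpose(sim: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap query and candidate roles, e.g. text-to-video -> video-to-text."""
    return [list(col) for col in zip(*sim)]
```

If the rows of `sim` are sentences and the columns are videos, `recall_at_k(sim)` gives the text-to-video numbers and `recall_at_k(transpose(sim))` gives the video-to-text numbers.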

Model Zoo

Supported methods for video-text retrieval:

  • MMT (ECCV'2020)

  • MMT-modified (ICMEW'2021)

  • HGR (CVPR'2020)

Dataset

Supported datasets:


Get started

Requirements

  • Python 3.7
  • PyTorch 1.4.0+
  • Transformers 3.1.0
  • NumPy 1.18.1

Install the dependencies with:

pip install -r requirements.txt

Training

Training + evaluation:

python -m demo --config configs/$model_name/${dataset}_${split}_trainval.json

Evaluation from checkpoint:

python -m demo --config configs/$model_name/${dataset}_${split}_trainval.json --only_eval --load_checkpoint $checkpoint_path

Training from a pretrained model:

python -m demo --config configs/$model_name/prtrn_${dataset}_${split}_trainval.json --load_checkpoint $checkpoint_path

Retrieving videos with a specific sentence:

python -m demo --config configs/$model_name/${dataset}_${split}_trainval.json --only_eval --load_checkpoint $checkpoint_path --sentence

Training with the modified version of MMT (MMT-modified):

python -m demo --config configs/$model_name/prtrn_${dataset}_${split}_trainval.json --modified_model
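The commands above differ only in the config file name and a few flags. The following is a small Python helper that assembles the argument list for `python -m demo` from the placeholders ($model_name, $dataset, $split, $checkpoint_path) and the flags listed above. It is hypothetical and not part of the repository. It only reproduces the naming pattern shown here (an optional "prtrn_" prefix followed by `<dataset>_<split>_trainval.json`). The example values "mmt", "msrvtt", "1kA", and "ckpt.pth" are made up, so check the configs/ directory for the real file names.

```python
import shlex
from typing import List, Optional


def build_demo_command(model_name: str, dataset: str, split: str,
                       pretrained: bool = False,
                       checkpoint: Optional[str] = None,
                       only_eval: bool = False,
                       sentence: bool = False,
                       modified_model: bool = False) -> List[str]:
    """Assemble the argv for `python -m demo` (hypothetical helper)."""
    # Configs for training from a pretrained model carry a "prtrn_" prefix.
    prefix = "prtrn_" if pretrained else ""
    config = "configs/{}/{}{}_{}_trainval.json".format(model_name, prefix, dataset, split)
    args = ["python", "-m", "demo", "--config", config]
    if only_eval:
        args.append("--only_eval")
    if checkpoint is not None:
        args += ["--load_checkpoint", checkpoint]
    if sentence:
        # The query text itself is set inside demo.py, not on the command line.
        args.append("--sentence")
    if modified_model:
        args.append("--modified_model")
    return args


def format_command(args: List[str]) -> str:
    """Render argv as a shell-safe string (works on Python 3.7, unlike shlex.join)."""
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Hypothetical values; check the configs/ directory for real names.
    cmd = build_demo_command("mmt", "msrvtt", "1kA", only_eval=True,
                             checkpoint="ckpt.pth", sentence=True)
    print(format_command(cmd))
```

You can pass the returned list to `subprocess.run(cmd, check=True)` from the repository root. Building an argument list instead of a single string avoids the shell variable pitfalls of writing `$dataset_$split` by hand.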