hidden-challenges-MR

Code for our paper "Uncovering Hidden Challenges in Query-Based Video Moment Retrieval" (BMVC'20).

[Project page | arXiv | YouTube | BMVC virtual]

Dependencies

Docker (recommended)

$ docker build -t hidden-challenges-mr .

or

$ pip install -r requirements.txt

This code is tested with Python 3.8.

Neptune.ai (optional)

We host our experiment logs on neptune.ai. To run the output visualization notebooks, such as notebooks/report/2DTAN_ActivityNet.ipynb, get your API token from neptune.ai.

Put your API token in the src/.env file as:

NEPTUNE_API_TOKEN="YOUR_TOKEN_HERE"
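
The notebooks can then read the token from this file. Below is a minimal sketch using the python-dotenv package (an assumption for illustration; the repository may load the variable differently):

import os

from dotenv import load_dotenv  # assumed helper: pip install python-dotenv

# Load NEPTUNE_API_TOKEN from src/.env into the process environment.
load_dotenv("src/.env")
api_token = os.environ["NEPTUNE_API_TOKEN"]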

Data

Charades-STA

  1. Download Charades annotations and save Charades_v1_train.csv and Charades_v1_test.csv in data/raw/charades/.
  2. Download Charades-STA annotations. Only the train and test annotation files are required. Arrange the files as shown below:
├── data
│   ├── processed
│   └── raw
│       └── charades
│           ├── Charades_v1_train.csv
│           ├── Charades_v1_test.csv
│           ├── charades_sta_train.txt
│           └── charades_sta_test.txt

Then run the commands below:

$ sh run.sh
:/app# python src/data/make_dataset.py data/raw/charades/charades_sta_train.txt data/raw/charades/Charades_v1_train.csv
:/app# python src/data/make_dataset.py data/raw/charades/charades_sta_test.txt data/raw/charades/Charades_v1_test.csv
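
To sanity-check the preprocessing, you can load the processed CSV with pandas. The test-split path below is the one used by the evaluation example later in this README; the train-split filename is an assumed analogue:

import pandas as pd

# Inspect the processed annotations; charades_test.csv matches the
# evaluation example below, charades_train.csv is an assumed analogue.
df = pd.read_csv("data/processed/charades/charades_test.csv")
print(df.shape)
print(df.head())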

ActivityNet Captions

Download annotations here and save train.json, val_1.json and val_2.json in data/raw/activitynet/.

├── data
│   ├── processed
│   └── raw
│       └── activitynet
│           ├── train.json
│           ├── val_1.json
│           └── val_2.json
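
Each ActivityNet Captions file is a JSON dictionary keyed by video ID, where every entry carries the video duration and parallel lists of timestamps and query sentences. A quick way to inspect one file (a sketch, assuming the standard public annotation format):

import json

with open("data/raw/activitynet/val_1.json") as f:
    annotations = json.load(f)

# In the public release, each value holds "duration" (seconds),
# "timestamps" ([[start, end], ...]), and "sentences" (one query per moment).
video_id, ann = next(iter(annotations.items()))
print(video_id, ann["duration"])
print(ann["timestamps"][0], ann["sentences"][0])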

Test blind baselines

:/app# python src/experiments/blind_baselines.py charades
:/app# python src/experiments/blind_baselines.py activitynet

Evaluate your model's outputs

src/toolbox provides tools for evaluating and visualizing moment retrieval results. For example, evaluation on Charades-STA looks like this:

import pandas as pd

from src.toolbox.data_converters import CharadesSTA2Instances
from src.toolbox.eval import evaluate, accumulate_metrics

# Load the processed test annotations and convert them into evaluation instances.
test_data = CharadesSTA2Instances(
    pd.read_csv("data/processed/charades/charades_test.csv")
)
############################
## your prediction code here
## ....
############################

results = evaluate(test_data, predictions)
summary = accumulate_metrics(results)

predictions is a list of your model's outputs. Each item should have the following format (a type-alias sketch follows the list):

(
 (video_id: str, description: str),
 List[(moment_start: float, moment_end: float, video_duration: float)],
 List[rating: float]
)
  • video_id: the video ID.
  • description: a query sentence.
  • moment_start: the start of the predicted moment's location in seconds.
  • moment_end: the end of the predicted moment's location in seconds.
  • video_duration: the duration of the whole video in seconds.
  • rating: the score of a predicted location. The prediction with the largest rating is evaluated as the top-1 prediction.
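
Spelled out as type aliases (hypothetical names for illustration only; the toolbox itself just expects plain tuples and lists):

from typing import List, Tuple

# Hypothetical aliases mirroring the format above; not part of src/toolbox.
Query = Tuple[str, str]              # (video_id, description)
Moment = Tuple[float, float, float]  # (moment_start, moment_end, video_duration)
Prediction = Tuple[Query, List[Moment], List[float]]  # ratings parallel the moments

predictions: List[Prediction] = []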

For example, an item in predictions looks like this:

predictions[0]

(('3MSZA', 'person turn a light on.'),
 [[0.76366093268685, 7.389522474042329, 30.96],
  [21.86557223053205, 29.71737331263709, 30.96],
  ...
  ],
 [7.252954266982226,
  4.785879048072588,
  ...])

summary is a dictionary of metrics (R@k (IoU>m)). More examples of how to use our toolbox are in src/experiments/blind_baselines.py and in the notebooks (e.g., notebooks/report/SCDM_CharadeSTA.ipynb).
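
Since summary is a plain dictionary mapping metric names to values, reporting it is a one-liner (a sketch; the exact key strings depend on what accumulate_metrics returns):

# Print every accumulated metric, e.g. "R@1 (IoU>0.5)"; the exact key
# names come from accumulate_metrics and may differ.
for name, value in summary.items():
    print(f"{name}: {value:.2f}")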

If this work helps your research, please cite:

@inproceedings{otani2020challengesmr,
  author    = {Mayu Otani and Yuta Nakashima and Esa Rahtu and Janne Heikkil{\"{a}}},
  title     = {Uncovering Hidden Challenges in Query-Based Video Moment Retrieval},
  booktitle = {The British Machine Vision Conference (BMVC)},
  year      = {2020},
}

Project based on the cookiecutter data science project template. #cookiecutterdatascience