This repository provides helper functions for conveniently using the VidVRD dataset, as well as a script for evaluating the VidVRD task. Please cite the following paper if the dataset helps your research:
```
@inproceedings{shang2017video,
    author={Shang, Xindi and Ren, Tongwei and Guo, Jingfan and Zhang, Hanwang and Chua, Tat-Seng},
    title={Video Visual Relation Detection},
    booktitle={ACM International Conference on Multimedia},
    address={Mountain View, CA, USA},
    month={October},
    year={2017}
}
```
The evaluation script expects the detected relations in the following JSON format:

```
{
    "ILSVRC2015_train_00010001": [                  # video ID
        {
            # a visual relation instance
            "triplet": ["person", "ride", "bicycle"],   # relation triplet
            "score": 0.9,                               # confidence score
            "duration": [15, 150],                      # starting and ending frame IDs
            "sub_traj": [                               # subject trajectory
                [9.0, 10.0, 45.0, 20.0],                # bounding box at the starting frame
                ...                                     # [left, top, right, bottom]
            ],
            "obj_traj": [                               # object trajectory
                [22.0, 23.0, 67.0, 111.0],
                ...
            ]
        },
        ...                                             # other instances predicted for this video
    ],
    ...                                                 # other videos
}
```
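Since the `#` comments and `...` above are only annotations, an actual prediction file is plain JSON and can be inspected with the standard library. A minimal sketch (the path points at the precomputed relations mentioned below; any file in this format works):

```python
import json

# Load a prediction file in the format shown above.
with open('baseline/models/baseline_video_relations.json') as f:
    predictions = json.load(f)

# Print the top-scoring relation instance for each video.
for video_id, instances in predictions.items():
    best = max(instances, key=lambda ins: ins['score'])
    sub, pred, obj = best['triplet']
    start, end = best['duration']
    print('%s: <%s, %s, %s> score=%.2f frames=[%d, %d]' %
          (video_id, sub, pred, obj, best['score'], start, end))
```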
- Install the prerequisites:

  ```
  conda create -n vidvrd python=2.7 anaconda cmake tensorflow=1.8.0 keras tqdm ffmpeg=3.4 py-opencv
  source activate vidvrd
  pip install dlib==19.3.1 --isolated
  ```
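A quick way to confirm the environment is set up correctly is to check that the pinned packages import and report the expected versions. A minimal sketch, to be run inside the activated `vidvrd` environment:

```python
# Environment sanity check: all imports should succeed in the
# activated `vidvrd` environment created above.
import cv2
import dlib
import keras          # prints "Using TensorFlow backend."
import tensorflow as tf

print('tensorflow %s' % tf.__version__)  # expected: 1.8.0
print('opencv %s' % cv2.__version__)
print('dlib %s' % dlib.__version__)      # expected: 19.3.1
```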
- Download the precomputed features, model, and detected relations from here, and decompress the zipfile under the `baseline` folder.
- Run

  ```
  python evaluate.py baseline/models/baseline_video_relations.json
  ```

  to evaluate the precomputed detected relations. Since a few wrong labels in the dataset were corrected after the paper submission, the results differ slightly from those reported in the paper. Some qualitative results can be found here. (A sketch of the trajectory-overlap measure used in evaluation follows this list.)
- Run

  ```
  python baseline.py --detect
  ```

  to detect video visual relations using the precomputed model.
- Run

  ```
  python baseline.py --train
  ```

  to train a new model on the precomputed features; the hyperparameters can be adjusted in the script. (See the driver sketch after this list for chaining these steps.)
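For context on what `evaluate.py` measures: in the paper, a detected relation instance matches a ground-truth instance when the triplets are equal and both the subject and object trajectories overlap the ground truth sufficiently in space and time, measured by voluminal IoU (vIoU). Below is a minimal, illustrative implementation of vIoU over the trajectory format above, assuming the ending frame ID is exclusive; this is a sketch for understanding, not the repository's evaluation code:

```python
def box_area(box):
    # box is [left, top, right, bottom]
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def box_intersection(b1, b2):
    left, top = max(b1[0], b2[0]), max(b1[1], b2[1])
    right, bottom = min(b1[2], b2[2]), min(b1[3], b2[3])
    return max(0.0, right - left) * max(0.0, bottom - top)

def voluminal_iou(traj1, dur1, traj2, dur2):
    """vIoU of two trajectories given as per-frame box lists.

    dur1/dur2 are [start, end] frame IDs as in the JSON format above
    (end assumed exclusive, so len(traj) == end - start).
    """
    intersect_vol = 0.0
    union_vol = 0.0
    for fid in range(min(dur1[0], dur2[0]), max(dur1[1], dur2[1])):
        in1 = dur1[0] <= fid < dur1[1]
        in2 = dur2[0] <= fid < dur2[1]
        if in1 and in2:
            b1, b2 = traj1[fid - dur1[0]], traj2[fid - dur2[0]]
            inter = box_intersection(b1, b2)
            intersect_vol += inter
            union_vol += box_area(b1) + box_area(b2) - inter
        elif in1:
            union_vol += box_area(traj1[fid - dur1[0]])
        elif in2:
            union_vol += box_area(traj2[fid - dur2[0]])
    return intersect_vol / union_vol if union_vol > 0 else 0.0
```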
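For convenience, the three commands above can also be chained from a small driver script. This is only a sketch: the output path passed to `evaluate.py` is an assumption here (marked in the comments); point it at whatever JSON file your `--detect` run actually produces.

```python
# Hypothetical driver chaining the baseline steps described above.
import subprocess

subprocess.check_call(['python', 'baseline.py', '--train'])   # train a new model
subprocess.check_call(['python', 'baseline.py', '--detect'])  # detect relations with it
# NOTE: the path below is an assumption; replace it with the relations
# JSON written by your `--detect` run.
subprocess.check_call(['python', 'evaluate.py',
                       'baseline/models/baseline_video_relations.json'])
```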