This repository contains the source code for "Video Visual Relation Detection via Iterative Inference" (MM'21) [paper].
- Download the ImageNet-VidVRD dataset and the VidOR dataset, and place the data under the same parent folder as this repository (recommended layout):
  ```
  ├── vidor-dataset
  │   ├── annotation
  │   │   ├── training
  │   │   └── validation
  │   └── video
  ├── imagenet-vidvrd-dataset
  │   ├── test
  │   ├── train
  │   └── videos
  ├── VidVRD-II
  │   └── ... (this repo)
  ├── vidor-baseline-output
  │   └── ... (intermediate results)
  └── imagenet-vidvrd-baseline-output
      └── ... (intermediate results)
  ```
- Install dependencies (tested with a TITAN Xp GPU):
  ```bash
  conda create -n vidvrd-ii -c conda-forge python=3.7 Cython tqdm scipy "h5py>=2.9=mpi*" ffmpeg=3.4 cudatoolkit=10.1 cudnn "pytorch>=1.7.0=cuda101*" "tensorflow>=2.0.0=gpu*"
  conda activate vidvrd-ii
  python setup.py build_ext --inplace
  ```
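  After installation, you can optionally sanity-check the GPU builds. This snippet is our suggestion, not part of the original instructions; it only uses standard PyTorch/TensorFlow APIs:
  ```bash
  # Both commands should report that a GPU is visible if the CUDA builds resolved correctly.
  python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
  python -c "import tensorflow as tf; print(tf.__version__, tf.config.experimental.list_physical_devices('GPU'))"
  ```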
- [Optional] Since `cv2` is incompatible with the main environment `vidvrd-ii`, if you want to use the script `visualize.py`, you may need to create a separate environment with `py-opencv` installed.
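  One possible way to set this up (the environment name `vidvrd-vis` is an arbitrary choice of ours):
  ```bash
  # A minimal separate environment just for visualize.py.
  conda create -n vidvrd-vis -c conda-forge python=3.7 py-opencv
  conda activate vidvrd-vis
  ```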
1. Download the precomputed object tracklets and features for ImageNet-VidVRD (437MB) and VidOR (32GB: part1, part2, part3, part4), and extract them under `imagenet-vidvrd-baseline-output` and `vidor-baseline-output` as above, respectively.
2. Run
   ```bash
   python main.py --cfg config/imagenet_vidvrd_3step_prop_wd0.01.json --id 3step_prop_wd0.01 --train --cuda
   ```
   to train the model for ImageNet-VidVRD. Use `--cfg config/vidor_3step_prop_wd1.json` for VidOR.
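   For VidOR, the full command might look like this (the `--id` run name below is our suggestion mirroring the config name; it is not mandated by the repo):
   ```bash
   python main.py --cfg config/vidor_3step_prop_wd1.json --id 3step_prop_wd1 --train --cuda
   ```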
3. Run
   ```bash
   python main.py --cfg config/imagenet_vidvrd_3step_prop_wd0.01.json --id 3step_prop_wd0.01 --detect --cuda
   ```
   to detect video relations (i.e., inference); the results will be written to `../imagenet-vidvrd-baseline-output/models/3step_prop_wd0.01/video_relations.json`.
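   A quick, schema-agnostic way to confirm that the detection step produced output (a convenience one-liner of ours, not part of the repo):
   ```bash
   # Print the top-level JSON type and the number of entries in the result file.
   python -c "import json; d = json.load(open('../imagenet-vidvrd-baseline-output/models/3step_prop_wd0.01/video_relations.json')); print(type(d).__name__, len(d))"
   ```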
4. Run
   ```bash
   python evaluate.py imagenet-vidvrd test relation ../imagenet-vidvrd-baseline-output/models/3step_prop_wd0.01/video_relations.json
   ```
   to evaluate the results.
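   The VidOR counterpart is analogous (assuming the `3step_prop_wd1` run name from the sketch above and the `validation` split from the dataset layout):
   ```bash
   python evaluate.py vidor validation relation ../vidor-baseline-output/models/3step_prop_wd1/video_relations.json
   ```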
5. To visualize the results, add the option `--visualize` to the above command (this will invoke `visualize.py`, so please make sure you have switched to the separate `py-opencv` environment described above). For the better visualization mentioned in the paper, change `association_algorithm` to `graph` in the configuration JSON, and then rerun Steps 3 and 5; see the example below.
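   For example, building on the command in Step 4:
   ```bash
   # Run from the environment that has py-opencv installed.
   python evaluate.py imagenet-vidvrd test relation ../imagenet-vidvrd-baseline-output/models/3step_prop_wd0.01/video_relations.json --visualize
   ```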
6. To automatically run the whole training and test pipeline multiple times, run
   ```bash
   python main.py --cfg config/imagenet_vidvrd_3step_prop_wd0.01.json --id 3step_prop_wd0.01 --pipeline 5 --cuda --no_cache
   ```
   and you will obtain a mean/std result.
- We extract frame-level object proposals using an off-the-shelf tool. Please first download and install the TensorFlow model library. Then, run
  ```bash
  python -m video_object_detection.tfmodel_image_detection [imagenet-vidvrd/vidor] [train/test/training/validation]
  ```
  You can also download our precomputed results for ImageNet-VidVRD (6GB).
- To obtain object tracklets based on the frame-level proposals, run
  ```bash
  python -m video_object_detection.object_tracklet_proposal [imagenet-vidvrd/vidor] [train/test/training/validation]
  ```
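  For instance, to run both stages on the ImageNet-VidVRD training split (a concrete instance of the placeholder arguments above):
  ```bash
  # First detect per-frame object proposals, then link them into tracklets.
  python -m video_object_detection.tfmodel_image_detection imagenet-vidvrd train
  python -m video_object_detection.object_tracklet_proposal imagenet-vidvrd train
  ```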
This repository is built upon VidVRD-helper. If this repo is helpful in your research, please use the following BibTeX entry to cite the paper:
```
@inproceedings{shang2021video,
  title={Video Visual Relation Detection via Iterative Inference},
  author={Shang, Xindi and Li, Yicong and Xiao, Junbin and Ji, Wei and Chua, Tat-Seng},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={3654--3663},
  year={2021}
}
```