# Moe-VRD
This is the source code for the Moe-VRD project, maintained by the VIP Lab at the University of Waterloo. The code builds a mixture-of-experts framework for video relationship detection: it encapsulates Shang et al.'s VidVRD-II as a single expert, which can then serve as one expert within the proposed mixture.
**Note:** This work is in progress, and since the project is relatively new to GitHub, expect frequent changes to the code.
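To make the framework concrete, here is a minimal, self-contained sketch of the mixture-of-experts idea (soft gating over a pool of relation classifiers). It is an illustration only, not the repository's implementation; `FEATURE_DIM`, `NUM_PREDICATES`, the number of experts, and the `Expert` stand-in for an encapsulated VidVRD-II model are all assumptions.

```python
# Illustrative only: a soft-gated mixture of experts over relation
# classifiers. FEATURE_DIM, NUM_PREDICATES, and the Expert stand-in are
# assumptions, not the repository's actual modules or configuration.
import torch
import torch.nn as nn

FEATURE_DIM = 1024     # assumed size of the shared relation feature
NUM_PREDICATES = 132   # ImageNet-VidVRD defines 132 predicate categories

class Expert(nn.Module):
    """Stand-in for one encapsulated VidVRD-II-style classifier."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEATURE_DIM, 512),
            nn.ReLU(),
            nn.Linear(512, NUM_PREDICATES),
        )

    def forward(self, x):
        return self.net(x)

class MixtureOfExperts(nn.Module):
    """Combines expert outputs with input-dependent softmax gates."""
    def __init__(self, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([Expert() for _ in range(num_experts)])
        self.gate = nn.Linear(FEATURE_DIM, num_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)            # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], 1)   # (B, E, P)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)      # (B, P)

logits = MixtureOfExperts()(torch.randn(8, FEATURE_DIM))  # toy batch
```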
## Environment
The environment setup is very similar to Shang et al.'s:
- Download the ImageNet-VidVRD dataset and the VidOR dataset, then place the data under the same parent folder as this repository (see the layout sketch after this list).
- Install the dependencies (tested with TITAN Xp and Nvidia RTX A6000 GPUs):

  ```bash
  conda create -n moe-vrd -c conda-forge python=3.7 Cython tqdm scipy "h5py>=2.9=mpi*" ffmpeg=3.4 cudatoolkit=10.1 cudnn "pytorch>=1.7.0=cuda101*" "tensorflow>=2.0.0=gpu*"
  conda activate moe-vrd
  python setup.py build_ext --inplace
  ```
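For reference, the parent folder is assumed to end up looking roughly like this. The two `*-baseline-output` folders are the ones referenced in the Quick Start; the dataset folder names are assumptions based on the dataset releases, so adjust them to match your download:

```
parent-folder/
├── vidvrd-dataset/                   # ImageNet-VidVRD (assumed name)
├── vidor-dataset/                    # VidOR (assumed name)
├── imagenet-vidvrd-baseline-output/  # precomputed tracklets/features
├── vidor-baseline-output/
└── moe-vrd/                          # this repository
```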
## Quick Start
1. Download the precomputed object tracklets and features for ImageNet-VidVRD (437MB) and VidOR (32GB: part1, part2, part3, part4), and extract them under `imagenet-vidvrd-baseline-output` and `vidor-baseline-output` as above, respectively.
2. Run

   ```bash
   python main.py --cfg config/imagenet_vidvrd_3step_prop_wd0.01.json --id 3step_prop_wd0.01 --train --cuda
   ```

   to train the model for ImageNet-VidVRD. Use `--cfg config/vidor_3step_prop_wd1.json` for VidOR.
3. Run

   ```bash
   python main.py --cfg config/imagenet_vidvrd_3step_prop_wd0.01.json --id 3step_prop_wd0.01 --detect --cuda
   ```

   to detect video relations (inference); the results will be written to `../imagenet-vidvrd-baseline-output/models/3step_prop_wd0.01/video_relations.json` (a sketch for inspecting this file follows the list).
4. Run

   ```bash
   python evaluate.py imagenet-vidvrd test relation ../imagenet-vidvrd-baseline-output/models/3step_prop_wd0.01/video_relations.json
   ```

   to evaluate the results.
5. To visualize the results, add the option `--visualize` to the above command (this will involve `visualize.py`, so please make sure the environment is switched according to the last section). For the better visualization mentioned in the paper, change `association_algorithm` to `graph` in the configuration JSON, and then run Steps 3 and 5.
6. To automatically run the whole training and test pipeline multiple times, run

   ```bash
   python main.py --cfg config/imagenet_vidvrd_3step_prop_wd0.01.json --id 3step_prop_wd0.01 --pipeline 5 --cuda --no_cache
   ```

   and you can then obtain a mean/std result.
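For a quick look at what detection produced, the following sketch loads `video_relations.json` and prints the top-scoring relations per video. It assumes the VidVRD-helper-style result format, where each relation instance carries `triplet`, `score`, and `duration` fields, possibly wrapped under a top-level `results` key; verify the field names against your own output before relying on it.

```python
# Hedged sketch: inspect video_relations.json, assuming each relation
# instance has a "triplet" ([subject, predicate, object]), a "score",
# and a "duration" ([begin_frame, end_frame]). Field names are an
# assumption based on the VidVRD-helper conventions.
import json

path = ("../imagenet-vidvrd-baseline-output/models/"
        "3step_prop_wd0.01/video_relations.json")
with open(path) as f:
    data = json.load(f)

# Some tools wrap the per-video predictions under a "results" key.
results = data.get("results", data)

for video_id, relations in list(results.items())[:3]:
    print(video_id)
    top = sorted(relations, key=lambda r: r["score"], reverse=True)[:5]
    for r in top:
        subj, pred, obj = r["triplet"]
        begin, end = r["duration"]
        print("  {} --{}--> {}  score={:.3f}  frames [{}, {})".format(
            subj, pred, obj, r["score"], begin, end))
```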
## Object Tracklet Extraction (optional)
- We extract frame-level object proposals using an off-the-shelf tool. Please first download and install the TensorFlow model library. Then run

  ```bash
  python -m video_object_detection.tfmodel_image_detection [imagenet-vidvrd/vidor] [train/test/training/validation]
  ```

  You can also download our precomputed results for ImageNet-VidVRD (6GB).
- To obtain object tracklets based on the frame-level proposals (see the linking sketch after this list), run

  ```bash
  python -m video_object_detection.object_tracklet_proposal [imagenet-vidvrd/vidor] [train/test/training/validation]
  ```
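As background on what the tracklet step does conceptually, below is a simplified greedy IoU matcher that links frame-level boxes into tracklets. This is not the repository's algorithm (`video_object_detection.object_tracklet_proposal` implements its own association); the threshold and data layout here are illustrative assumptions.

```python
# Illustrative greedy IoU-based tracklet linking; NOT the repository's
# actual association algorithm.

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def link_tracklets(frames, iou_thr=0.5):
    """Greedily extend tracklets frame by frame.

    frames: list over time; each element is a list of (x1, y1, x2, y2) boxes.
    Returns a list of tracklets, each a list of (frame_index, box) pairs.
    """
    tracklets, active = [], []
    for t, boxes in enumerate(frames):
        unmatched = list(range(len(boxes)))
        next_active = []
        for track in active:
            _, last_box = track[-1]
            # pick the best remaining detection for this tracklet
            best, best_iou = None, iou_thr
            for i in unmatched:
                v = iou(last_box, boxes[i])
                if v > best_iou:
                    best, best_iou = i, v
            if best is not None:
                track.append((t, boxes[best]))
                unmatched.remove(best)
                next_active.append(track)
            else:
                tracklets.append(track)  # tracklet ends here
        for i in unmatched:
            next_active.append([(t, boxes[i])])  # start a new tracklet
        active = next_active
    return tracklets + active

# Toy usage: two overlapping boxes link into one tracklet; the distant
# box starts a second tracklet.
frames = [[(0, 0, 10, 10)], [(1, 1, 11, 11)], [(50, 50, 60, 60)]]
print(len(link_tracklets(frames)))  # -> 2
```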
## Acknowledgement
This repository is built on top of VidVRD-helper and VidVRD-II. If this repo is helpful in your research, please use the following BibTeX entries to cite both their paper and our repository:
```bibtex
@misc{sha2021moe,
  title={Video Relationship Detection using Mixture of Experts},
  author={Shaabana, Ala and Fieguth, Paul and Luo, Chong and Lan, Cuiling},
  howpublished={https://github.com/shibshib/moe-vrd.git},
  year={2021}
}

@inproceedings{shang2021video,
  author={Shang, Xindi and Li, Yicong and Xiao, Junbin and Ji, Wei and Chua, Tat-Seng},
  title={Video Visual Relation Detection via Iterative Inference},
  booktitle={ACM International Conference on Multimedia},
  year={2021}
}
```