# Moe-VRD
This is the source code for the Moe-VRD project, maintained by the VIP Lab at the University of Waterloo. The code builds a mixture-of-experts framework for video relationship detection: it encapsulates Shang et al.'s VidVRD-II as a single expert, which can then serve as one expert within the proposed mixture.
**Note:** This work is in progress, and since the project is relatively new to GitHub, expect frequent changes to the code.
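To make the framework concrete, here is a minimal, self-contained sketch of the mixture-of-experts idea (soft gating over a pool of relation classifiers). It is an illustration only, not the repository's implementation; `FEATURE_DIM`, `NUM_PREDICATES`, the number of experts, and the `Expert` stand-in for an encapsulated VidVRD-II model are all assumptions.

```python
# Illustrative only: a soft-gated mixture of experts over relation
# classifiers. FEATURE_DIM, NUM_PREDICATES, and the Expert stand-in are
# assumptions, not the repository's actual modules or configuration.
import torch
import torch.nn as nn

FEATURE_DIM = 1024     # assumed size of the shared relation feature
NUM_PREDICATES = 132   # ImageNet-VidVRD defines 132 predicate categories

class Expert(nn.Module):
    """Stand-in for one encapsulated VidVRD-II-style classifier."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEATURE_DIM, 512),
            nn.ReLU(),
            nn.Linear(512, NUM_PREDICATES),
        )

    def forward(self, x):
        return self.net(x)

class MixtureOfExperts(nn.Module):
    """Combines expert outputs with input-dependent softmax gates."""
    def __init__(self, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([Expert() for _ in range(num_experts)])
        self.gate = nn.Linear(FEATURE_DIM, num_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)            # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], 1)   # (B, E, P)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)      # (B, P)

logits = MixtureOfExperts()(torch.randn(8, FEATURE_DIM))  # toy batch
```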
## Environment
The environment setup is very similar to Shang et al.'s:
- Download the ImageNet-VidVRD dataset and the VidOR dataset, then place the data under the same parent folder as this repository (see the layout sketch after this list).
- Install the dependencies (tested with TITAN Xp and Nvidia RTX A6000 GPUs):

  ```bash
  conda create -n moe-vrd -c conda-forge python=3.7 Cython tqdm scipy "h5py>=2.9=mpi*" ffmpeg=3.4 cudatoolkit=10.1 cudnn "pytorch>=1.7.0=cuda101*" "tensorflow>=2.0.0=gpu*"
  conda activate moe-vrd
  python setup.py build_ext --inplace
  ```
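For reference, the parent folder is assumed to end up looking roughly like this. The two `*-baseline-output` folders are the ones referenced in the Quick Start; the dataset folder names are assumptions based on the dataset releases, so adjust them to match your download:

```
parent-folder/
├── vidvrd-dataset/                   # ImageNet-VidVRD (assumed name)
├── vidor-dataset/                    # VidOR (assumed name)
├── imagenet-vidvrd-baseline-output/  # precomputed tracklets/features
├── vidor-baseline-output/
└── moe-vrd/                          # this repository
```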
## Quick Start
1. Download the precomputed object tracklets and features for ImageNet-VidVRD (437MB) and VidOR (32GB: part1, part2, part3, part4), and extract them under `imagenet-vidvrd-baseline-output` and `vidor-baseline-output` as above, respectively.
2. Run

   ```bash
   python main.py --cfg config/imagenet_vidvrd_3step_prop_wd0.01.json --id 3step_prop_wd0.01 --train --cuda
   ```

   to train the model for ImageNet-VidVRD. Use `--cfg config/vidor_3step_prop_wd1.json` for VidOR.
3. Run

   ```bash
   python main.py --cfg config/imagenet_vidvrd_3step_prop_wd0.01.json --id 3step_prop_wd0.01 --detect --cuda
   ```

   to detect video relations (inference); the results will be written to `../imagenet-vidvrd-baseline-output/models/3step_prop_wd0.01/video_relations.json` (a sketch for inspecting this file follows the list).
4. Run

   ```bash
   python evaluate.py imagenet-vidvrd test relation ../imagenet-vidvrd-baseline-output/models/3step_prop_wd0.01/video_relations.json
   ```

   to evaluate the results.
5. To visualize the results, add the option `--visualize` to the above command (this will involve `visualize.py`, so please make sure the environment is switched according to the last section). For the better visualization mentioned in the paper, change `association_algorithm` to `graph` in the configuration JSON, and then run Steps 3 and 5.
6. To automatically run the whole training and test pipeline multiple times, run

   ```bash
   python main.py --cfg config/imagenet_vidvrd_3step_prop_wd0.01.json --id 3step_prop_wd0.01 --pipeline 5 --cuda --no_cache
   ```

   and you can then obtain a mean/std result.
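For a quick look at what detection produced, the following sketch loads `video_relations.json` and prints the top-scoring relations per video. It assumes the VidVRD-helper-style result format, where each relation instance carries `triplet`, `score`, and `duration` fields, possibly wrapped under a top-level `results` key; verify the field names against your own output before relying on it.

```python
# Hedged sketch: inspect video_relations.json, assuming each relation
# instance has a "triplet" ([subject, predicate, object]), a "score",
# and a "duration" ([begin_frame, end_frame]). Field names are an
# assumption based on the VidVRD-helper conventions.
import json

path = ("../imagenet-vidvrd-baseline-output/models/"
        "3step_prop_wd0.01/video_relations.json")
with open(path) as f:
    data = json.load(f)

# Some tools wrap the per-video predictions under a "results" key.
results = data.get("results", data)

for video_id, relations in list(results.items())[:3]:
    print(video_id)
    top = sorted(relations, key=lambda r: r["score"], reverse=True)[:5]
    for r in top:
        subj, pred, obj = r["triplet"]
        begin, end = r["duration"]
        print("  {} --{}--> {}  score={:.3f}  frames [{}, {})".format(
            subj, pred, obj, r["score"], begin, end))
```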
## Object Tracklet Extraction (optional)
- We extract frame-level object proposals using an off-the-shelf tool. Please first download and install the TensorFlow model library. Then run

  ```bash
  python -m video_object_detection.tfmodel_image_detection [imagenet-vidvrd/vidor] [train/test/training/validation]
  ```

  You can also download our precomputed results for ImageNet-VidVRD (6GB).
- To obtain object tracklets based on the frame-level proposals (see the linking sketch after this list), run

  ```bash
  python -m video_object_detection.object_tracklet_proposal [imagenet-vidvrd/vidor] [train/test/training/validation]
  ```
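As background on what the tracklet step does conceptually, below is a simplified greedy IoU matcher that links frame-level boxes into tracklets. This is not the repository's algorithm (`video_object_detection.object_tracklet_proposal` implements its own association); the threshold and data layout here are illustrative assumptions.

```python
# Illustrative greedy IoU-based tracklet linking; NOT the repository's
# actual association algorithm.

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)

def link_tracklets(frames, iou_thr=0.5):
    """Greedily extend tracklets frame by frame.

    frames: list over time; each element is a list of (x1, y1, x2, y2) boxes.
    Returns a list of tracklets, each a list of (frame_index, box) pairs.
    """
    tracklets, active = [], []
    for t, boxes in enumerate(frames):
        unmatched = list(range(len(boxes)))
        next_active = []
        for track in active:
            _, last_box = track[-1]
            # pick the best remaining detection for this tracklet
            best, best_iou = None, iou_thr
            for i in unmatched:
                v = iou(last_box, boxes[i])
                if v > best_iou:
                    best, best_iou = i, v
            if best is not None:
                track.append((t, boxes[best]))
                unmatched.remove(best)
                next_active.append(track)
            else:
                tracklets.append(track)  # tracklet ends here
        for i in unmatched:
            next_active.append([(t, boxes[i])])  # start a new tracklet
        active = next_active
    return tracklets + active

# Toy usage: two overlapping boxes link into one tracklet; the distant
# box starts a second tracklet.
frames = [[(0, 0, 10, 10)], [(1, 1, 11, 11)], [(50, 50, 60, 60)]]
print(len(link_tracklets(frames)))  # -> 2
```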
## Acknowledgement
This repository is built on top of VidVRD-helper and VidVRD-II. If this repo is helpful in your research, please use the following BibTeX entries to cite both their paper and our repository:
```bibtex
@misc{sha2021moe,
  title={Video Relationship Detection using Mixture of Experts},
  author={Shaabana, Ala and Fieguth, Paul and Luo, Chong and Lan, Cuiling},
  howpublished={https://github.com/shibshib/moe-vrd.git},
  year={2021}
}

@inproceedings{shang2021video,
  author={Shang, Xindi and Li, Yicong and Xiao, Junbin and Ji, Wei and Chua, Tat-Seng},
  title={Video Visual Relation Detection via Iterative Inference},
  booktitle={ACM International Conference on Multimedia},
  year={2021}
}
```