MEGA for Video Object Detection

By Yihong Chen, Yue Cao, Han Hu, Liwei Wang.

This repo is an official implementation of "Memory Enhanced Global-Local Aggregation for Video Object Detection", accepted by CVPR 2020. This repository contains a PyTorch implementation of our approach MEGA based on maskrcnn_benchmark, as well as some training scripts to reproduce the results on ImageNet VID reported in our paper.

This repository also implements several other algorithms, such as FGFA and RDN. New methods are welcome, and pull requests are appreciated! We hope this repository will help further research in video object detection and beyond. :)

Citing MEGA

Please cite our paper in your publications if it helps your research:

@inproceedings{chen20mega,
    Author = {Chen, Yihong and Cao, Yue and Hu, Han and Wang, Liwei},
    Title = {Memory Enhanced Global-Local Aggregation for Video Object Detection},
    Booktitle = {CVPR},
    Year = {2020}
}

Updates

Results of ResNet-50 backbone added. (13/04/2020)
Code and pretrained weights for Deep Feature Flow released. (30/03/2020)

Main Results

Pretrained models are now available at Baidu (code: neck) and Google Drive.

Model	Backbone	AP50	Link
single frame baseline	ResNet-101	76.7	Google
DFF	ResNet-101	75.0	Google
FGFA	ResNet-101	78.0	Google
RDN-base	ResNet-101	81.1	Google
RDN	ResNet-101	81.7	Google
MEGA	ResNet-101	82.9	Google

Model	Backbone	AP50	Link
single frame baseline	ResNet-50	71.8	Google
DFF	ResNet-50	70.4	Google
FGFA	ResNet-50	74.3	Google
RDN-base	ResNet-50	76.7	Google
MEGA	ResNet-50	77.3	Google

Note: The performance of the ResNet-50 backbone is not very stable.

Installation

Please follow INSTALL.md for installation instructions.

Data preparation

Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets from here. After that, we recommend symlinking the dataset path to datasets/. The path structure should be as follows:

./datasets/ILSVRC2015/
./datasets/ILSVRC2015/Annotations/DET
./datasets/ILSVRC2015/Annotations/VID
./datasets/ILSVRC2015/Data/DET
./datasets/ILSVRC2015/Data/VID
./datasets/ILSVRC2015/ImageSets

Note: We already provide lists of all the images we use to train and test our model, as txt files under the directory datasets/ILSVRC2015/ImageSets. You do not need to change them.
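
Before the first run, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repository: the helper name missing_dataset_paths is hypothetical, and it only checks that the listed directories exist (symlinked directories count), not that their contents are complete.

```python
from pathlib import Path

# Sub-directories listed in this README, relative to the dataset root.
EXPECTED_SUBDIRS = [
    "Annotations/DET",
    "Annotations/VID",
    "Data/DET",
    "Data/VID",
    "ImageSets",
]


def missing_dataset_paths(root="datasets/ILSVRC2015"):
    """Return the expected sub-directories that do not exist under root.

    Hypothetical helper for illustration; Path.is_dir() follows symlinks,
    so a symlinked dataset directory is accepted.
    """
    root = Path(root)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_paths()
    if missing:
        print("Missing under datasets/ILSVRC2015:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Run it from the repository root so that the relative path datasets/ILSVRC2015 resolves correctly.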

Usage

Note: Cache files are created the first time you run this project. This may take some time, so don't worry!

Note: Currently, one GPU can hold only 1 image. Do not put 2 or more images on 1 GPU!

Note: We provide template files named BASE_RCNN_{}gpus.yaml, which automatically set the batch size and other relevant settings. This behavior is similar to detectron2. If you want to train a model with a different number of GPUs, please adjust these settings yourself :) But make sure that 1 GPU holds only 1 image! That is to say, always keep SOLVER.IMS_PER_BATCH and TEST.IMS_PER_BATCH equal to the number of GPUs you use.
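
The one-image-per-GPU rule can be written down as a pair of KEY VALUE overrides, in the same form as MODEL.WEIGHT and OUTPUT_DIR in the commands in this README. This is only a sketch: the function name batch_overrides is hypothetical, and it covers the two batch-size keys only. The "other relevant settings" that the template files change are not handled here and may still need adjusting by hand.

```python
def batch_overrides(num_gpus):
    """Build KEY VALUE overrides that keep exactly one image per GPU.

    Hypothetical helper: both batch sizes are set to the GPU count,
    as the note above requires.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    n = str(num_gpus)
    return ["SOLVER.IMS_PER_BATCH", n, "TEST.IMS_PER_BATCH", n]


if __name__ == "__main__":
    # Example for a 2-GPU machine; append the result to the command line.
    print(" ".join(batch_overrides(2)))
```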

Inference

The inference command line for testing on the validation dataset:

python -m torch.distributed.launch \
    --nproc_per_node 4 \
    tools/test_net.py \
    --config-file configs/MEGA/vid_R_101_C4_MEGA_1x.yaml \
    MODEL.WEIGHT MEGA_R_101.pth

Please note that:

If your model's name is different, please replace MEGA_R_101.pth with your own.
If you want to evaluate a different model, please change --config-file to its config file and MODEL.WEIGHT to its weights file.
Testing is time-consuming, so be patient!
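
To evaluate several models in turn, the command above can be assembled programmatically. The sketch below is an illustration under assumptions: build_test_command is a hypothetical helper, the GPU count is fixed at 4 as in the command above, and the config and weights names in the example are the ones from this README. It only builds and prints the command; it does not launch anything.

```python
import shlex

NUM_GPUS = 4  # matches --nproc_per_node in the command above


def build_test_command(config_file, weights):
    """Return the inference command as an argument list (hypothetical helper)."""
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(NUM_GPUS),
        "tools/test_net.py",
        "--config-file", config_file,
        "MODEL.WEIGHT", weights,
    ]


if __name__ == "__main__":
    cmd = build_test_command(
        "configs/MEGA/vid_R_101_C4_MEGA_1x.yaml",
        "MEGA_R_101.pth",
    )
    # Print a copy-pastable shell line; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems if a weights path contains spaces.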

Training

The following command line will train MEGA with the ResNet-101 backbone (config vid_R_101_C4_MEGA_1x.yaml) on 4 GPUs with synchronous stochastic gradient descent (SGD):

python -m torch.distributed.launch \
    --nproc_per_node=4 \
    tools/train_net.py \
    --master_port=$((RANDOM + 10000)) \
    --config-file configs/MEGA/vid_R_101_C4_MEGA_1x.yaml \
    OUTPUT_DIR training_dir/MEGA_R_101_1x
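
In the command above, $((RANDOM + 10000)) is bash arithmetic: RANDOM is an integer from 0 to 32767, so the port falls between 10000 and 42767, which makes a clash between concurrent jobs on the same machine less likely. The sketch below mirrors that in Python for scripts that launch training without a shell. The helper names random_master_port and build_train_command are hypothetical, and the argument order simply copies the command above.

```python
import random
import shlex


def random_master_port(rng=random):
    """Mimic bash $((RANDOM + 10000)); RANDOM is an integer in 0..32767."""
    return rng.randint(0, 32767) + 10000


def build_train_command(config_file, output_dir, port):
    """Return the training command as an argument list (hypothetical helper)."""
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node=4",
        "tools/train_net.py",
        f"--master_port={port}",
        "--config-file", config_file,
        "OUTPUT_DIR", output_dir,
    ]


if __name__ == "__main__":
    cmd = build_train_command(
        "configs/MEGA/vid_R_101_C4_MEGA_1x.yaml",
        "training_dir/MEGA_R_101_1x",
        random_master_port(),
    )
    print(shlex.join(cmd))
```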