Sequence Level Semantics Aggregation for Video Object Detection

Introduction

This is an official MXNet implementation of Sequence Level Semantics Aggregation for Video Object Detection (ICCV 2019, oral). SELSA aggregates semantic information at the full-sequence level of videos while keeping a simple and clean pipeline. It achieves 82.69 mAP with ResNet-101 on the ImageNet VID validation set.
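
At its core, SELSA pools RoI features from proposals sampled across the whole sequence, measures pairwise semantic similarity between them, and replaces each proposal feature with a similarity-weighted sum over the sequence. The snippet below is a minimal NumPy sketch of that aggregation step, not the repo's actual MXNet code; the function name and shapes are our own illustrative assumptions.

    # Minimal sketch of SELSA-style aggregation (illustration only).
    import numpy as np

    def selsa_aggregate(feats):
        # feats: (N, D) RoI features pooled from proposals sampled
        # across the whole video sequence.
        # L2-normalize so dot products become cosine similarities.
        normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        sim = normed.dot(normed.T)                # (N, N) pairwise similarity
        # A row-wise softmax turns similarities into aggregation weights.
        w = np.exp(sim - sim.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        # Each feature becomes a similarity-weighted sum over the sequence.
        return w.dot(feats)                       # (N, D) aggregated features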

Citation

If you use the code or models in your research, please cite:

@inproceedings{wu2019selsa,
  title={Sequence Level Semantics Aggregation for Video Object Detection},
  author={Wu, Haiping and Chen, Yuntao and Wang, Naiyan and Zhang, Zhaoxiang},
  booktitle={ICCV},
  year={2019}
}

Main Results

                                        training data        testing data             mAP(%)  mAP(%)  mAP(%)    mAP(%)
                                                                                              (slow)  (medium)  (fast)
Single-frame baseline                   ImageNet DET train   ImageNet VID validation  73.6    82.1    71.0      52.5
(Faster R-CNN, ResNet-101)              + VID train
SELSA                                   ImageNet DET train   ImageNet VID validation  80.3    86.9    78.9      61.4
(Faster R-CNN, ResNet-101)              + VID train
SELSA                                   ImageNet DET train   ImageNet VID validation  82.7    88.0    81.4      67.1
(Faster R-CNN, ResNet-101, Data Aug)    + VID train

Installation

Please note that this repo is based on Python 2.

  1. Clone the repository.

     git clone https://github.com/happywu/Sequence-Level-Semantics-Aggregation

  2. Install MXNet following https://mxnet.incubator.apache.org/get_started. We tested our code on MXNet v1.3.0; a quick version check is sketched after this list.

  3. Install the remaining packages via

     pip install -r requirements.txt
     sh init.sh
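
As a quick sanity check that the installation worked (this snippet is ours, not part of the repo), verify that MXNet imports and reports the tested version:

    # sanity_check_mxnet.py -- hypothetical helper, not part of the repo
    import mxnet as mx

    print(mx.__version__)  # the code was tested on MXNet v1.3.0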

Preparation for Training & Testing

  1. Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets, and make sure the directory layout looks like this:

    ./data/ILSVRC2015/
    ./data/ILSVRC2015/Annotations/DET
    ./data/ILSVRC2015/Annotations/VID
    ./data/ILSVRC2015/Data/DET
    ./data/ILSVRC2015/Data/VID
    ./data/ILSVRC2015/ImageSets

  2. Please download the ImageNet pre-trained ResNet-v1-101 model and our pretrained SELSA ResNet-101 model manually, and put them under the folder ./model, so that the layout looks like this (a sanity-check script for these paths follows the list):

    ./model/pretrained_model/resnet_v1_101-0000.params
    ./model/pretrained_model/selsa_rcnn_vid-0000.params
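
To catch layout mistakes before a long run, a small hypothetical helper like the following (not part of the repo) can verify that every expected path exists:

    # check_paths.py -- hypothetical helper, not part of the repo
    import os

    data_dirs = [
        './data/ILSVRC2015/Annotations/DET',
        './data/ILSVRC2015/Annotations/VID',
        './data/ILSVRC2015/Data/DET',
        './data/ILSVRC2015/Data/VID',
        './data/ILSVRC2015/ImageSets',
    ]
    model_files = [
        './model/pretrained_model/resnet_v1_101-0000.params',
        './model/pretrained_model/selsa_rcnn_vid-0000.params',
    ]

    for path in data_dirs:
        print(('ok      ' if os.path.isdir(path) else 'MISSING ') + path)
    for path in model_files:
        print(('ok      ' if os.path.isfile(path) else 'MISSING ') + path)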
    

Testing

  1. To test the provided pretrained model, run the following command.
    python experiments/selsa/test.py --cfg experiments/selsa/cfgs/resnet_v1_101_rcnn_selsa_aug.yaml --test-pretrained ./model/pretrained_model/selsa_rcnn_vid
    

You should get results matching those reported above.

Training

  1. To train, use the following command:

    python experiments/selsa/train_end2end.py --cfg experiments/selsa/cfgs/resnet_v1_101_rcnn_selsa_aug.yaml
    

    A cache folder will be created automatically to save the models and logs under output/selsa_rcnn/imagenet_vid/.

  2. To test your trained model, run:

    python experiments/selsa/test.py --cfg experiments/selsa/cfgs/resnet_v1_101_rcnn_selsa_aug.yaml
    

Other implementations

PyTorch: MMTracking

Acknowledgement

This repo is modified from Flow-Guided-Feature-Aggregation.