Learning Where to Focus for Efficient Video Object Detection

Installation

1. Clone this repository.

git clone https://github.com/jiangzhengkai/LSTS.git

2. Run sh ./init.sh. The script builds the Cython modules automatically and creates some folders.

3. Install MXNet:

3.1 Clone MXNet and check out commit 62ecb60:

git clone --recursive https://github.com/dmlc/mxnet.git
cd mxnet
git checkout 62ecb60
git submodule update

3.2 From the root of this repository, copy the operators in lib/ops/* to ${MXNET_ROOT}/src/operator/contrib, where MXNET_ROOT is the path to your MXNet checkout:

cp -r lib/ops/* ${MXNET_ROOT}/src/operator/contrib/

3.3 Compile MXNet

cd ${MXNET_ROOT}
make -j4

3.4 Install the MXNet Python binding:

cd python
sudo python setup.py install
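
Once the binding is installed, a quick import check can confirm that Python sees it. The following is a minimal sketch, not part of this repository; the helper name `mxnet_version` is hypothetical, and the code is written to run on both Python 2 and Python 3.

```python
def mxnet_version():
    """Return the installed MXNet version string, or None if it cannot be imported."""
    try:
        import mxnet
    except (ImportError, OSError):
        # ImportError: package not installed; OSError: shared library failed to load.
        return None
    # Fall back to "unknown" in case this build does not expose __version__.
    return str(getattr(mxnet, "__version__", "unknown"))


if __name__ == "__main__":
    version = mxnet_version()
    if version is None:
        print("MXNet could not be imported; re-check steps 3.1 to 3.4.")
    else:
        print("MXNet imported, version: %s" % version)
```

If the import fails, the most likely causes are an incomplete build in step 3.3 or installing into a different Python interpreter than the one used to run the experiments.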

Preparation for Training & Testing

Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets, and make sure the directory layout looks like this:

./data/ILSVRC2015/
./data/ILSVRC2015/Annotations/DET
./data/ILSVRC2015/Annotations/VID
./data/ILSVRC2015/Data/DET
./data/ILSVRC2015/Data/VID
./data/ILSVRC2015/ImageSets
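
The layout above can be checked before starting a long training run. This is a minimal sketch under the assumption that the dataset root is ./data/ILSVRC2015 as listed; the names `EXPECTED_SUBDIRS` and `missing_dataset_dirs` are illustrative and do not exist in this repository.

```python
import os

# Sub-directories from the layout above, relative to the dataset root.
EXPECTED_SUBDIRS = [
    "Annotations/DET",
    "Annotations/VID",
    "Data/DET",
    "Data/VID",
    "ImageSets",
]


def missing_dataset_dirs(root="./data/ILSVRC2015"):
    """Return the expected sub-directories that are not present under root."""
    return [
        sub for sub in EXPECTED_SUBDIRS
        if not os.path.isdir(os.path.join(root, *sub.split("/")))
    ]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing directories: %s" % ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

The check only confirms that the directories exist; it does not validate their contents.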

Please download the ImageNet pre-trained ResNet-v1-101 model and the Flying-Chairs pre-trained FlowNet model manually from OneDrive (users in Mainland China can try Baidu Yun), and put them under the folder ./model. Make sure the layout looks like this:
```
./model/pretrained_model/resnet_v1_101-0000.params
./model/pretrained_model/flownet-0000.params
```
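
Because the models are downloaded manually, it can be worth confirming that both files are present and non-empty. The sketch below assumes the two paths shown above; `PRETRAINED_FILES` and `pretrained_file_sizes` are hypothetical names used only for illustration.

```python
import os

# File names from the layout above.
PRETRAINED_FILES = [
    "resnet_v1_101-0000.params",
    "flownet-0000.params",
]


def pretrained_file_sizes(model_dir="./model/pretrained_model"):
    """Map each expected file name to its size in bytes, or None if it is missing."""
    sizes = {}
    for name in PRETRAINED_FILES:
        path = os.path.join(model_dir, name)
        sizes[name] = os.path.getsize(path) if os.path.isfile(path) else None
    return sizes


if __name__ == "__main__":
    for name, size in sorted(pretrained_file_sizes().items()):
        if size is None:
            print("%s: missing" % name)
        elif size == 0:
            print("%s: empty file, the download may have failed" % name)
        else:
            print("%s: %d bytes" % (name, size))
```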

Usage

All of our experiment settings (number of GPUs, dataset, etc.) are kept in YAML config files in the folder ./experiments/lsts/cfgs.

To run an experiment, run the Python script with the corresponding config file as input:

python experiments/lsts/lsts_end2end_train_test.py --cfg experiments/lsts_rfcn/cfgs/lsts_network_uniform.yaml
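
The config folder named above and the path passed to --cfg in the command use different directory names, so it may help to list the YAML files actually present in your checkout before choosing one. This is a minimal sketch; `find_configs` is a hypothetical helper, and the default root ./experiments is an assumption based on the paths shown above.

```python
import os


def find_configs(root="./experiments"):
    """Return a sorted list of all .yaml files found anywhere under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".yaml"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    configs = find_configs()
    if not configs:
        print("No YAML config files found under ./experiments")
    for path in configs:
        print(path)
```

Any path printed this way can be passed to the training script through --cfg.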

Bibtex

@inproceedings{jiang2020learning,
  title={Learning Where to Focus for Efficient Video Object Detection},
  author={Jiang, Zhengkai and Liu, Yu and Yang, Ceyuan and Liu, Jihao and Gao, Peng and Zhang, Qian and Xiang, Shiming and Pan, Chunhong},
  booktitle={European Conference on Computer Vision},
  year={2020},
}