
[TPAMI-22] Bottom-up, voting-based video object detection method


HoughNet-VID: Spatial and temporal voting for video object detection

Official PyTorch implementation of HoughNet for video object detection. More details can be found in the paper:

HoughNet: Integrating near and long-range evidence for visual detection,
Nermin Samet, Samet Hicsonmez, Emre Akbas,
TPAMI, 2022. (arXiv pre-print)

Summary

The original HoughNet applies voting only in the spatial domain, for object detection in still images. We extended this idea to the temporal domain by developing a new method that takes the difference of features from two frames and applies spatial and temporal voting, using our “temporal voting module”, to detect objects. We showed the effectiveness of our method on the ILSVRC2015 dataset.

Video Object Detection Results on ILSVRC2015

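| Method                           | mAP  | mAP_Fast | mAP_Medium | mAP_Slow |
|----------------------------------|------|----------|------------|----------|
| HoughNet (single frame baseline) | 68.8 | 45.8     | 66.1       | 79.1     |
| HoughNet-VID                     | 73.9 | 50.4     | 71.5       | 82.8     |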

The temporal voting model can be downloaded here.

Installation

Please refer to the installation instructions of HoughNet.

Dataset Preparation

Please download the ILSVRC2015 DET and ILSVRC2015 VID datasets from here. Next, place the data as follows. Alternatively, you can create a symlink to an existing copy of the data.

./data/ILSVRC2015/
./data/ILSVRC2015/Annotations/DET
./data/ILSVRC2015/Annotations/VID
./data/ILSVRC2015/Data/DET
./data/ILSVRC2015/Data/VID
./data/ILSVRC2015/ImageSets
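
If the datasets already live elsewhere on disk, the symlink option and a quick check of the layout above could be sketched as follows. This is only an illustrative helper and not part of the HoughNet-VID code base: the function names `link_dataset` and `missing_subdirs` are hypothetical, and the sketch assumes the downloaded folder already contains the `Annotations`, `Data` and `ImageSets` sub-directories.

```python
import os
from pathlib import Path

# Sub-directories from the layout above, relative to ./data/ILSVRC2015/.
EXPECTED_SUBDIRS = [
    "Annotations/DET",
    "Annotations/VID",
    "Data/DET",
    "Data/VID",
    "ImageSets",
]


def link_dataset(source_root, repo_root="."):
    """Symlink <repo_root>/data/ILSVRC2015 to an existing dataset folder.

    Hypothetical helper: source_root is wherever the data was downloaded.
    """
    source = Path(source_root).resolve()
    target = Path(repo_root) / "data" / "ILSVRC2015"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Leave any existing folder or link untouched.
    if not target.exists() and not target.is_symlink():
        os.symlink(source, target, target_is_directory=True)
    return target


def missing_subdirs(dataset_root):
    """Return the expected sub-directories that are absent under dataset_root."""
    root = Path(dataset_root)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]
```

Calling `link_dataset("/path/to/ILSVRC2015")` from the repository root (the path is a placeholder) and then `missing_subdirs("./data/ILSVRC2015")` should return an empty list once every folder is in place.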

We used the train and test splits from MEGA. You can download the same splits, as ImageSets, here.

Evaluation and Training

For evaluation and training, please refer to the instructions here.

Acknowledgement

This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) through the project titled "Object Detection in Videos with Deep Neural Networks" (grant number 117E054). The numerical calculations reported in this paper were partially performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources).

License

HoughNet-VID is released under the MIT License (refer to the LICENSE file for details).

Citation

If you find HoughNet-VID useful for your research, please cite our paper as follows.

N. Samet, S. Hicsonmez, E. Akbas, "HoughNet: Integrating near and long-range evidence for visual detection", arXiv, 2021.

BibTeX entry:

@misc{HoughNet2021,
      title={HoughNet: Integrating near and long-range evidence for visual detection}, 
      author={Nermin Samet and Samet Hicsonmez and Emre Akbas},
      year={2021}, 
}