taokong/RON

Some questions about paper

DuinoDu opened this issue · 8 comments

Hey, many thanks for your great work. I have a few questions about the paper, if you don't mind.

  1. Each feature-map scale has a separate classifier and regressor that produce the class-specific scores and the bounding-box regression. So for four scales there are four classifiers and four regressors, which might lead to repeated computation. I wonder whether these operations on different scales could be merged in some way.
  2. The objectness prior looks a lot like an RPN (region proposal network). The only difference I can see is that the objectness prior produces just a score, without the bbox regression that an RPN includes. Am I wrong? Please give me some tips about the differences.
  3. For the last classifier and regressor, one uses two convs while the other uses two inception blocks. I wonder why you chose them.
    Thanks again, and sorry if this is a bother.
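
For the first question, a rough count may help separate two things: how many weights the per-scale heads hold, and how much computation they perform. The sketch below is only an illustration. Every number in it (channel count, kernel size, anchors per location, class count, map sizes) is an assumption and is not taken from the paper or from this thread.

```python
# Hypothetical head configuration; none of these numbers come from the thread.
IN_CHANNELS = 512             # channels of each feature map (assumed equal across scales)
KERNEL = 3                    # 3x3 conv predictor (assumed)
ANCHORS = 10                  # anchors per location (assumed)
NUM_CLASSES = 21              # e.g. 20 classes + background (assumed)
MAP_SIZES = [40, 20, 10, 5]   # spatial side length of the four scales (assumed)


def conv_params(c_in, c_out, k):
    """Weights plus biases of one k x k convolution."""
    return c_in * c_out * k * k + c_out


def conv_macs(c_in, c_out, k, side):
    """Multiply-accumulates of one k x k conv over a side x side map (stride 1, same padding)."""
    return c_in * c_out * k * k * side * side


def head_cost(shared):
    """Return (parameter count, multiply-accumulates) for the prediction heads."""
    out_ch = ANCHORS * (NUM_CLASSES + 4)  # class scores + 4 box offsets per anchor
    per_head = conv_params(IN_CHANNELS, out_ch, KERNEL)
    # A shared head stores one set of weights; separate heads store one per scale.
    params = per_head if shared else per_head * len(MAP_SIZES)
    # Either way the convolution still has to run once on every scale.
    macs = sum(conv_macs(IN_CHANNELS, out_ch, KERNEL, s) for s in MAP_SIZES)
    return params, macs


separate = head_cost(shared=False)
shared = head_cost(shared=True)
print("separate heads: params=%d macs=%d" % separate)
print("shared head:    params=%d macs=%d" % shared)
```

Under these assumptions, sharing the head across scales reduces the stored weights by a factor of four but leaves the multiply-accumulate count unchanged, since the head is still evaluated on every feature map.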

@DuinoDu for the second comment, I agree with you. The objectness prior is very close to the RPN in Faster R-CNN. I think the main contribution of this paper is the reverse connection, which combines feature maps of different scales to detect objects of different sizes.
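
As a toy illustration of combining feature maps across scales, here is a minimal sketch. It assumes single-channel maps stored as nested lists, uses nearest-neighbour upsampling as a stand-in for a learned deconvolution, and omits the convolution on the current layer. The function names are hypothetical and this is not the repository's implementation.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling; a stand-in for a learned deconvolution."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def reverse_connect(current, coarser_fused):
    """Fuse a backbone map with the already-fused map one scale above it."""
    up = upsample2x(coarser_fused)
    assert len(up) == len(current) and len(up[0]) == len(current[0])
    # Element-wise sum of the current map and the upsampled coarser map.
    return [[c + u for c, u in zip(crow, urow)] for crow, urow in zip(current, up)]


def build_fused_pyramid(backbone_maps):
    """backbone_maps: finest first, each half the side length of the previous one."""
    fused = [backbone_maps[-1]]  # the coarsest map has nothing above it
    for fmap in reversed(backbone_maps[:-1]):
        fused.insert(0, reverse_connect(fmap, fused[0]))
    return fused


maps = [
    [[1] * 4 for _ in range(4)],    # 4x4 map (finest)
    [[10] * 2 for _ in range(2)],   # 2x2 map
    [[100]],                        # 1x1 map (coarsest)
]
pyramid = build_fused_pyramid(maps)
print(pyramid[0][0][0], pyramid[1][0][0], pyramid[2][0][0])
```

Each finer map ends up carrying information from every coarser map above it, which is the property the comment above refers to.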

@DuinoDu @mattdingmeng

The reverse connection is similar to the idea in these papers: Feature Pyramid Networks for Object Detection, and Deconvolutional SSD.

@DuinoDu
For the first question, we found that not sharing features gives better detection results. You could try sharing weights across the four scales.
The objectness prior is modified from RPN. The original RPN performs bbox regression to get better localization; however, the anchors' locations change after bbox regression, so Faster R-CNN must use ROI pooling to extract features for these shifted boxes. As a result, its detection module introduces repeated computation.
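
A minimal sketch of the distinction described above, with made-up values: the prior only scores fixed anchors and never moves them, so the per-anchor class predictions stay aligned with the same grid and no ROI pooling step is needed. The threshold, the anchor boxes, and the rule of multiplying objectness by class probability are assumptions for illustration, not details confirmed in this thread.

```python
# All values below are made up for illustration.
OBJECTNESS_THRESHOLD = 0.5  # hypothetical cut-off

anchors = [  # (x1, y1, x2, y2): fixed default boxes, never moved by the prior
    (0, 0, 32, 32),
    (16, 16, 48, 48),
    (32, 32, 96, 96),
]
objectness = [0.1, 0.8, 0.6]  # one score per anchor, no box offsets
class_probs = [               # per-anchor class probabilities from the detection head
    {"cat": 0.7, "dog": 0.3},
    {"cat": 0.9, "dog": 0.1},
    {"cat": 0.2, "dog": 0.8},
]


def detect(anchors, objectness, class_probs, threshold):
    """Keep anchors whose prior passes the threshold; boxes stay on the anchor grid."""
    results = []
    for box, obj, probs in zip(anchors, objectness, class_probs):
        if obj < threshold:
            continue  # low-prior anchors are skipped
        label = max(probs, key=probs.get)
        # Assumed scoring rule: objectness times class probability.
        results.append((box, label, obj * probs[label]))
    return results


detections = detect(anchors, objectness, class_probs, OBJECTNESS_THRESHOLD)
for box, label, score in detections:
    print(box, label, round(score, 2))
```

In a Faster R-CNN style pipeline, by contrast, the boxes would be shifted by regression before classification, which is why features have to be re-extracted per box.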
@mattdingmeng @chengshuai
Yes, the idea of the reverse connection is similar to DSSD, FPN and TDM. In fact, the four works were developed at almost the same time. RON and FPN were both accepted to CVPR 2017.

Thanks!

I want to know why it is faster than Faster R-CNN.
Can anyone help me? Thanks a lot.

@kl456123 the author mentioned that using ROI pooling brings extra computation. I also think that discarding the fully connected layers speeds up training and inference.

I want to study small-object detection in large-scale scenes.
But I find that the CNN feature map is very important: if the CNN base model can't find the target, the regression is meaningless.
Could you give me some tips on how to improve the CNN feature model?

twmht commented

@taokong

By the way, I have sent you an email with some questions about HyperNet (https://arxiv.org/abs/1604.00600). Please take a look if you have time :)