Cogito2012/UString

What are all these green bounding boxes?

monjurulkarim opened this issue · 16 comments

In the result video, what are all these green bounding boxes? Shouldn't the bounding boxes only be on detected objects? I saw many boxes in empty space where there are no objects. Are these false detections?

@monjurulkarim They are the top-K region proposals ranked by detection score. In this work, we sample the top-K bounding boxes as region proposals, where K is set to 19 by default. This follows the same protocol as DSA-RNN (ACCV'16). If bounding boxes were placed only on detected objects, the number of boxes would not be fixed, which is not practical for model learning.
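To make "top-K proposals per frame" concrete, here is a minimal NumPy sketch, assuming each frame's detections arrive as an (M, 6) array of [x1, y1, x2, y2, score, class_label] rows. The function name `top_k_proposals` is hypothetical, and zero-padding frames with fewer than K boxes is an assumption for illustration only; the repository fills missing boxes differently (see the `bbox_sampling` discussion further down).

```python
import numpy as np

K = 19  # proposals per frame, the default mentioned above

def top_k_proposals(dets: np.ndarray, k: int = K) -> np.ndarray:
    """Keep the k highest-scoring boxes of one frame (hypothetical helper).

    dets: (M, 6) array of [x1, y1, x2, y2, score, class_label] rows.
    Returns a (k, 6) array. ASSUMPTION: if M < k, the missing rows are
    zero-padded so that every frame yields exactly k proposals.
    """
    order = np.argsort(-dets[:, 4])          # row indices by descending score
    kept = dets[order[:k]]
    out = np.zeros((k, dets.shape[1]), dtype=dets.dtype)
    out[: len(kept)] = kept
    return out
```

Whatever M is, the output shape is always (K, 6), which is the property the reply relies on.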

@Cogito2012 , thank you for your reply. I understood what you did.
Could you please explain a little more what you meant by "If the bounding boxes are only on detected objects, the number of boxes will not be fixed"?
Did you mean that each frame needs a fixed number K of proposals?

@monjurulkarim Right. In my paper, I proposed using a GCN to learn the object relations for accident anticipation. If the number of bounding boxes is not fixed, the graph structure changes dynamically as nodes are added or removed. In that case, graph convolution is not applicable.
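A rough illustration of why a fixed K helps: with K nodes per frame, the adjacency matrix is always K×K and the node-feature matrix is always K×F, so every frame produces arrays of the same shape. The sketch below is a generic graph-convolution step with the common symmetric normalization of A + I; it is an assumption for illustration, not necessarily the exact layer used in UString, and `gcn_layer` is a hypothetical name.

```python
import numpy as np

def gcn_layer(h: np.ndarray, adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Generic graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    h:   (K, F_in) node features, one row per proposal box
    adj: (K, K) non-negative edge weights between the K proposals
    w:   (F_in, F_out) weight matrix shared by all frames
    """
    k = h.shape[0]
    if adj.shape != (k, k):
        raise ValueError("adjacency must be K x K for K nodes")
    a_hat = adj + np.eye(k)                          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # row degrees are > 0
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)           # ReLU
```

With K = 19 for every frame, the per-frame `h` and `adj` arrays can be stacked into fixed-size batch tensors.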

@Cogito2012 I got it. thank you.

@Cogito2012
In the code where can I find the following equation from your paper:

[Image: equation from the paper]

@monjurulkarim It's at line 304 of the DataLoader.py file. Since our graph is fully connected under this equation, I simply implemented it in the data loading step.
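For a sense of scale: a fully connected undirected graph over K = 19 boxes has K(K-1)/2 = 19 × 18 / 2 = 171 edges per frame. A short sketch of enumerating those edges follows; `fully_connected_edges` is a hypothetical helper, not the code in DataLoader.py.

```python
from itertools import combinations

def fully_connected_edges(num_nodes: int) -> list[tuple[int, int]]:
    """All undirected node pairs (i, j) with i < j (hypothetical helper)."""
    return list(combinations(range(num_nodes), 2))

edges = fully_connected_edges(19)
print(len(edges))  # 171 = 19 * 18 / 2
```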

@Cogito2012
Could you please explain what these two numbers indicate?

[Image: screenshot referenced in the question]

@monjurulkarim The det entry of the feature file contains the object detection results for the whole video. Its last dimension has 6 columns: [x1, y1, x2, y2, score, class_label]. You may refer to line 65 of demo.py to see how they are obtained.

To compute the graph edges, we only need the bounding box coordinates (the first 4 columns), so detections[i, :, :4] is used as the input.
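As an illustration of edge weights computed only from `detections[i, :, :4]`, here is a sketch that weights each of the K(K-1)/2 box pairs by the distance between box centers. The exact formula is an assumption (the paper's equation is only available as an image in this thread): it uses exp(-distance / scale), normalized to sum to 1 over all edges, where `scale` and `edge_weights` are hypothetical names. Check line 304 of DataLoader.py for the actual computation.

```python
import numpy as np
from itertools import combinations

def edge_weights(boxes: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Weights for all i < j pairs of one frame's boxes (hypothetical helper).

    boxes: (K, 4) array of [x1, y1, x2, y2], e.g. detections[i, :, :4].
    ASSUMPTION: weight = exp(-center_distance / scale), normalized over edges.
    """
    if len(boxes) < 2:
        raise ValueError("need at least two boxes to form an edge")
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2.0,
                        (boxes[:, 1] + boxes[:, 3]) / 2.0], axis=1)   # (K, 2)
    pairs = np.array(list(combinations(range(len(boxes)), 2)))          # (E, 2)
    dist = np.linalg.norm(centers[pairs[:, 0]] - centers[pairs[:, 1]], axis=1)
    w = np.exp(-dist / scale)
    total = w.sum()
    # If every weight underflows to 0 (boxes very far apart), fall back to uniform.
    return w / total if total > 0 else np.full(len(w), 1.0 / len(w))
```

With a det array of shape (num_frames, 19, 6), `edge_weights(det[i, :, :4])` returns 171 weights for frame i; the score and class_label columns are never read.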

@Cogito2012
Thank you for the clarification!

@Cogito2012
I just checked the feature file. Why is the class score always zero in all frames?

@monjurulkarim It's because the object detection score threshold is set low in order to get enough bounding boxes. This class score is not used in the algorithm, so you can ignore it.

@Cogito2012 Thank you for the reply.
If the scores of all the objects are zero, how did you select 19 objects in each frame?

@monjurulkarim You can refer to our bbox_sampling function at line 46 of demo.py. Given the detected bounding boxes, we do not rely on the score but use a sampling strategy instead: if fewer than 19 boxes are detected, we randomly select the remaining boxes from the top-N entries of the detected boxes.

As for your concern: if you find a frame in which all objects have zero scores, the detection performance is not good enough for that frame. You can, of course, train a better object detector for your dataset.

@Cogito2012
Hello, I hope you are doing well!
I have a question: why did you select exactly 19 candidate objects? Why not some other number (e.g., 10 or 30)?

@monjurulkarim This number simply follows the prior work DSA-RNN, where 19 objects + 1 full frame are used as the feature representation of each frame. I don't think there is any special reason for this number; it is just an empirical value.
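For illustration, the "19 objects + 1 full frame" representation means each frame contributes 20 feature vectors. The sketch below stacks them into a (20, D) array. The ordering (full-frame feature first), the function name `frame_representation`, and the small feature size are assumptions for illustration; real CNN features are much larger.

```python
import numpy as np

NUM_OBJECTS = 19  # object proposals per frame (DSA-RNN convention)

def frame_representation(frame_feat: np.ndarray,
                         object_feats: np.ndarray) -> np.ndarray:
    """Stack one full-frame feature on top of the per-object features.

    frame_feat:   (D,) feature of the whole frame
    object_feats: (NUM_OBJECTS, D) features of the proposal boxes
    Returns a (1 + NUM_OBJECTS, D) array. ASSUMPTION: full frame comes first.
    """
    if object_feats.shape != (NUM_OBJECTS, frame_feat.shape[0]):
        raise ValueError("expected one D-dimensional feature per proposal")
    return np.vstack([frame_feat[None, :], object_feats])
```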