Some questions about reference points and object queries?
rOtking opened this issue · 6 comments
Hi, thank you for your good work!
I have recently been studying DETR-like models, and I have the following questions:
1. There is no implementation of iterative reference point refinement, but many other works use it and it seems to work well. Have you tested it? I also notice that your work (Anchor-DETR), which is mentioned in DAB-DETR, is implemented with refinement there.
2. I am confused by the term "object query"; it seems to have different meanings in different works. In your opinion, what does "object query" refer to: the content part (tgt in code), the positional part, or tgt + positional part?
3. Which part do you think is more important: the content part or the positional part?
Thank you again.
Hi, we have not tried to refine the points. In our paper, Q = Qf + Qp, where Q (content part + positional part) is the 'query', Qf (tgt/content part) is the 'query feature', and Qp (positional part) is the 'query position'; the Qp in the decoder is regarded as the 'object query'. The content part is more important, as the task of the decoder is to output a better content part. But Qp is also important, because it helps the decoder produce a better content part.
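To make the terminology concrete, here is a hypothetical pure-Python sketch of Q = Qf + Qp for a single query. Qp is built from a 2-D anchor point with a sinusoidal encoding similar in spirit to DETR's sine positional encoding; Anchor DETR is understood to further project the encoded point with a small MLP, which is omitted here (an assumption, not something stated in this thread). Qf is set to zeros purely for illustration, and all function names are made up.

```python
import math

def sine_encode(coord, num_feats=4, temperature=10000.0):
    """Encode one normalized coordinate (0..1) as num_feats sin/cos values.

    Each sin/cos pair shares one frequency; frequencies are geometrically
    spaced, similar in spirit to DETR's sine positional encoding.
    """
    out = []
    for i in range(num_feats // 2):
        freq = temperature ** (2 * i / num_feats)
        angle = coord * 2 * math.pi / freq
        out.extend([math.sin(angle), math.cos(angle)])
    return out

def query_position(anchor, num_feats):
    """Qp for one anchor point (x, y): encodings of x and y concatenated.

    Assumption: the MLP that Anchor DETR applies on top is omitted.
    """
    x, y = anchor
    return sine_encode(x, num_feats) + sine_encode(y, num_feats)

def full_query(content, anchor):
    """Q = Qf + Qp: element-wise sum of the content and positional parts."""
    pos = query_position(anchor, num_feats=len(content) // 2)
    if len(pos) != len(content):
        raise ValueError("content dimension must be a multiple of 4")
    return [f + p for f, p in zip(content, pos)]

qf = [0.0] * 8          # hypothetical content part (tgt), zeros for illustration
anchor = (0.25, 0.75)   # hypothetical normalized anchor point
q = full_query(qf, anchor)
```

Because qf is all zeros here, q is exactly Qp; for instance q[0] = sin(0.25 · 2π) = 1.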
@tangjiuqi097 Thank you for your reply! It's very helpful to me.
@tangjiuqi097 Regarding the first question, I inserted the following line after this line for refinement:
reference_points = output_coord[..., :2].clone().detach()
However, the performance is slightly lower (~1% AP) than the original AnchorDETR, which I find somewhat strange. What were your experimental results? Do you have any suggestions about this observation?
@zen-d Hi, we have not explored iterative reference point refinement. But I guess you cannot simply insert that line; you may refer to implementations that include reference refinement. For example, in DAB-DETR (as well as other methods), the final outputs_coord is not the same as the iteratively updated reference.
@tangjiuqi097 Thanks for your quick reply. Although only one line is changed, I arrived at it after careful thought. As far as I understand, the output_coord[..., :2] of the previous decoder layer is exactly the refined reference anchor point for the next decoder layer (after inverse sigmoid, residual addition, and sigmoid). Could you explain in more detail why you think this is not right? Please correct me if I'm wrong.
In addition, DAB-DETR's implementation is more complicated because it inherits from Conditional-DETR, which uses conditional queries, and it further uses the width and height (WH) dimensions for modulation, so it looks quite different.
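A minimal pure-Python sketch of the update described above (inverse sigmoid, residual addition, sigmoid), for a single query and only the (x, y) coordinates. The per-layer offsets are made-up numbers standing in for each layer's box-head output, and `decode`, `refine`, and the helpers are illustrative names rather than Anchor DETR's code. `inverse_sigmoid` clamps with `eps=1e-5`, similar to the helper used in Deformable DETR. Plain floats carry no gradients, so the `.detach()` in the inserted line is only noted in a comment. The `refine=False` branch assumes that, without refinement, every layer is decoded relative to the same initial anchor.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def inverse_sigmoid(p, eps=1e-5):
    """Logit of p, clamped so that p = 0 or p = 1 does not give +/-inf."""
    p = min(max(p, 0.0), 1.0)
    return math.log(max(p, eps) / max(1.0 - p, eps))

def decode(init_ref, deltas, refine=True):
    """Return the predicted (x, y) point of each decoder layer for one query.

    init_ref: initial (x, y) anchor point in [0, 1].
    deltas:   per-layer (dx, dy) offsets in logit space (hypothetical
              stand-ins for what each layer's box head would predict).
    refine:   if True, each layer's prediction becomes the next layer's
              reference; if False, every layer uses the initial anchor.
    """
    ref = init_ref
    outputs = []
    for dx, dy in deltas:
        # inverse sigmoid -> residual addition -> sigmoid
        out = (sigmoid(inverse_sigmoid(ref[0]) + dx),
               sigmoid(inverse_sigmoid(ref[1]) + dy))
        outputs.append(out)
        if refine:
            # In PyTorch this would be out.detach(); floats carry no gradient.
            ref = out
    return outputs

deltas = [(0.5, -0.2), (0.1, 0.1), (-0.05, 0.0)]
fixed = decode((0.5, 0.5), deltas, refine=False)
refined = decode((0.5, 0.5), deltas, refine=True)
```

With `refine=True` the logit-space offsets accumulate across layers, so the last layer's x equals sigmoid(0.5 + 0.1 - 0.05) up to rounding; with `refine=False` it is simply sigmoid(-0.05). The first layer's output is identical in both modes.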
@zen-d The implementation of iterative reference refinement is inherited from Deformable DETR, and in these methods (e.g., Deformable DETR/DAB-DETR/DINO), the outputs_coord is not exactly the same as the reference. You could refer to their code for more details if you are interested.