ubc-vision/COTR

question about positional encoding

Closed this issue · 1 comment

Hi. According to formula (4) in your paper, you add the positional encoding P to obtain the context feature map c. But in your code, you follow the standard transformer design and add the positional encoding only to the key and query, keeping the value clean. Did I miss anything?

Hi. Yes, the positionally encoded feature map is fed to the transformer encoder as the query and key, while the original feature map serves as the value.
We mainly followed the design of DETR for the backbone and transformer.
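For reference, this is the pattern DETR uses in its encoder self-attention: the positional encoding is added to the query and key only, and the value is the raw feature map. A minimal sketch of that pattern (the class and tensor shapes here are illustrative, not COTR's actual code):

```python
import torch
import torch.nn as nn

class PosEncSelfAttention(nn.Module):
    """DETR-style self-attention: positional encoding goes into
    query and key only; the value stays the clean feature map."""

    def __init__(self, d_model=256, nhead=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead)

    def forward(self, src, pos):
        # q and k carry positional information; v does not
        q = k = src + pos
        out, _ = self.attn(q, k, value=src)
        return out

# usage: 100 flattened spatial tokens, batch 2, d_model 256
src = torch.randn(100, 2, 256)   # backbone feature map (flattened)
pos = torch.randn(100, 2, 256)   # positional encoding, same shape
layer = PosEncSelfAttention()
out = layer(src, pos)            # same shape as src
```

So the attention weights are position-aware, but what gets aggregated is the unmodified feature content.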