ubc-vision/COTR

Question

Closed this issue · 4 comments

What does the dense correspondence map in Figure 1 mean, how is it obtained, and how is it represented numerically? I only know that it is the dense correspondence between the two images. What does the color-coded ‘x’ channel mean?

Hi, thanks for your interest in our work!
The correspondence map C means that the pixel of the left image, Left[i, j], is matched to the pixel of the right image at Right[C[i, j]].
It can be expressed in pixel units or as normalized values.
The correspondence map C has 2 channels, and we visualized only 1 of them using a matplotlib color mapping.
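As a rough illustration of the description above (not the repository's actual code), here is a minimal NumPy sketch of a 2-channel correspondence map. The toy image size, the synthetic one-pixel shift, the (x, y) channel order, and the normalization by `W - 1` and `H - 1` are all assumptions made for the example.

```python
import numpy as np

H, W = 4, 6  # toy image size (assumption)

# Hypothetical correspondence map of shape (H, W, 2).
# Assumed convention: C[i, j] = (x, y), the column and row of the
# matching pixel in the right image.
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
# Synthetic match: every pixel maps one column to the right,
# clamped at the image border.
C = np.stack([np.clip(xs + 1, 0, W - 1), ys], axis=-1)

right = np.arange(H * W).reshape(H, W)  # toy right image

# Look up the match for the left pixel at row i, column j.
i, j = 2, 3
x, y = C[i, j]
matched_value = right[y, x]  # images are indexed [row, column]

# The same map in normalized form (assumed range [0, 1]).
C_norm = C / np.array([W - 1, H - 1])

# A single channel is an (H, W) scalar field, so it can be
# rendered with a color map.
x_channel = C_norm[..., 0]
```

Under these assumptions, something like matplotlib's `plt.imshow(x_channel)` would display the 'x' channel as a color-coded image: each pixel's color encodes the horizontal position of its match in the right image.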

Can I understand it this way: the value of C is the pixel value of the matching point in the right image? Or is it a relational matrix?

What does it mean that the correspondence map C has 2 channels?

I still have some questions. Which layer of the network produces C, the transformer or the MLP, and what are its dimensions? Is it a 256-dimensional vector? What do the two channels mentioned in the reply refer to? Are the query points fed into the network 256-dimensional, with position coordinates of only 0 and 1? Or are they 2-dimensional integer coordinate values?