zhenpeiyang/MVS2D

Question about epipolar_fusion algorithm and code.

Generally speaking, aren't the keys and values in a Transformer both derived from image features? The key you use comes from `src_feature`, but why does the value come from `pos_embed` (an `nn.Embedding`)? This seems to contradict the common pattern, and your paper describes it the same way, which leaves me a bit puzzled.
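To make the question concrete, here is a minimal, framework-free sketch of the two patterns. It is not the MVS2D code. `attend`, `src_feats`, and `depth_embedding` are hypothetical names. The sketch assumes a single reference-pixel feature as the query, source features sampled at 3 depth hypotheses along its epipolar line as the keys, and a one-hot table standing in for `pos_embed`.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: List[Vector],
           values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention for a single query.

    The weights depend only on the query and the keys; the values only
    decide which vectors those weights are used to mix.
    """
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


# Hypothetical setup (not MVS2D's code): one reference-pixel feature as the
# query, and source features sampled at 3 depth hypotheses as the keys.
query = [1.0, 0.0]
src_feats = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]

# Common pattern: the values are the same source features as the keys.
standard_out, standard_w = attend(query, src_feats, src_feats)

# Pattern asked about: the values come from a per-hypothesis embedding table
# (a hypothetical one-hot stand-in for pos_embed), not from image features.
depth_embedding = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
depth_out, depth_w = attend(query, src_feats, depth_embedding)
```

In both calls the attention weights are identical, because they come only from the query and the keys. Swapping the values changes what the output represents. `standard_out` is a blend of source features. With one-hot embedding rows, `depth_out` is exactly the attention distribution over the depth hypotheses.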