Question about epipolar_fusion algorithm and code.
NanCheng2001 commented
Generally speaking, aren't the keys and values in a Transformer both derived from image features? The key you are using comes from `src_feature`, but why does the value here come from `nn.pos_embed` (embedding)? This seems to contradict the common pattern. It is also described this way in your paper, which leaves me a bit puzzled.
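
To make the question concrete, here is a minimal sketch of the two patterns I am contrasting. It is not copied from the repo: `attend`, `query`, and the toy numbers are hypothetical, and `pos_embed` here is only a plain list standing in for a learned embedding table, based on my reading of the code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-query scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Output is the attention-weighted average of the value vectors.
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


# Hypothetical toy data: 3 source positions with 2-d image features.
src_feature = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical stand-in for a positional embedding table (one row per position).
pos_embed = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
query = [0.0, 4.0]

# Common pattern: keys and values both come from the image features.
common = attend(query, src_feature, src_feature)
# Pattern I am asking about: keys from features, values from the embedding.
questioned = attend(query, src_feature, pos_embed)
```

In the second call the output is a weighted average of position embeddings rather than of image features, so it seems to encode where the query matched instead of what was there. Is that the intended effect?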