franciszzj/TP-SIS

Regarding the implementation of self and cross-attention

xiaopengguo opened this issue · 2 comments

https://github.com/franciszzj/TP-SIS/blob/89ca8ba44680830a92b8cff8cb3f0ff1e175b661/model/layers.py#L232C9-L243C65

I'm curious about the reasoning behind adding the positional embedding to q and k, but not to v, in both self- and cross-attention. Is the positional embedding added in every attention block, and if so, why? Looking forward to further insights, and thank you in advance!
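
For context, here is a minimal PyTorch sketch of the pattern being asked about, in which the positional embedding is added only to the query and key (a common convention in DETR-style attention blocks, which CRIS also follows). The class and method names are illustrative, not the repository's actual code:

```python
import torch.nn as nn


class AttentionBlockSketch(nn.Module):
    """Illustrative attention block: the positional embedding is added to
    the query and key inputs, while the value stays position-free."""

    def __init__(self, d_model: int, nhead: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead)

    @staticmethod
    def with_pos_embed(x, pos):
        # Add the positional embedding when one is provided.
        return x if pos is None else x + pos

    def forward(self, query, key_value, query_pos=None, key_pos=None):
        q = self.with_pos_embed(query, query_pos)     # q gets positions
        k = self.with_pos_embed(key_value, key_pos)   # k gets positions
        v = key_value                                 # v stays position-free
        out, _ = self.attn(q, k, v)
        return out
```

In this convention the positions only influence where attention weights are placed (via the q·k similarity), while the aggregated content itself (v) carries no positional signal; the pattern is typically repeated in every attention block of the decoder.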

I'm sorry for the late reply. This is a good question.
Initially, we didn't give it much thought and simply followed the settings in CRIS's code; I believe there is a typo here.
After modifying it and experimenting, I found that the results with and without adding the positional embedding to the value are similar. I hope this can serve as a reference for you.
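
For reference, the modification being compared amounts to a one-line change to the forward of the illustrative block sketched above (again an assumption about the exact code, not the repository's actual implementation):

```python
    def forward(self, query, key_value, query_pos=None, key_pos=None):
        q = self.with_pos_embed(query, query_pos)
        k = self.with_pos_embed(key_value, key_pos)
        v = self.with_pos_embed(key_value, key_pos)   # pos now added to v as well
        out, _ = self.attn(q, k, v)
        return out
```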

Thank you for your reply!