Regarding the implementation of self and cross-attention
xiaopengguo opened this issue · 2 comments
xiaopengguo commented
I'm curious about the insight behind adding the positional embedding to q and k, but not to v, in both self- and cross-attention. Also, is the positional embedding added in every attention block, and if so, why? Looking forward to further insights, and thank you in advance!
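For reference, the pattern I'm asking about looks roughly like the sketch below (illustrative only, not the exact repo code; the `with_pos_embed` helper and `AttentionBlock` names are my own): the positional embedding is added to the query and key inputs of each attention block, while the value is passed through unchanged.

```python
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Illustrative attention block: positional embedding is added to q and k only."""
    def __init__(self, d_model=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads)

    @staticmethod
    def with_pos_embed(x, pos):
        # Add the positional embedding if one is provided.
        return x if pos is None else x + pos

    def forward(self, query, key, value, query_pos=None, key_pos=None):
        q = self.with_pos_embed(query, query_pos)  # pos added to q
        k = self.with_pos_embed(key, key_pos)      # pos added to k
        v = value                                  # pos NOT added to v
        out, _ = self.attn(q, k, v)
        return out
```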
franciszzj commented
I'm sorry for the late reply. This is a good question.
Initially, we didn't give it much thought and simply followed the settings in CRIS's code. I believe there is a typo here.
After modifying the code and re-running the experiments, I found that the results with and without adding the positional embedding to the "value" are similar. I hope this is helpful as a reference.
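The variant I compared against is roughly the following: the positional embedding is added to the value as well as to q and k. This is only a sketch under the same illustrative assumptions as the block above, not the exact code we ran.

```python
import torch.nn as nn

def attention_with_pos_on_value(attn: nn.MultiheadAttention,
                                query, key, value,
                                query_pos=None, key_pos=None):
    # Variant used for the comparison: the positional embedding is added
    # to q, k, AND v (instead of q and k only).
    add = lambda x, pos: x if pos is None else x + pos
    out, _ = attn(add(query, query_pos), add(key, key_pos), add(value, key_pos))
    return out
```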
xiaopengguo commented
Thank you for your reply!