Shilin-LU/TF-ICON

A question

Closed this issue · 3 comments

Thank you for your great work! But I am a little confused about Formula 5 in the paper. Why add M_user to M_seg? I think M_user is larger than M_seg, so what is the point of this addition operation? Why not just use M_user?

Thank you. It is the XOR operation rather than addition.
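In case it helps, here is a minimal sketch of that distinction, using placeholder names (`user_mask`, `seg_mask`) rather than the repo's actual variables. XOR keeps only the pixels covered by exactly one of the two binary masks, which is different from addition (union):

```python
import torch

# Hypothetical binary masks (1 = inside region, 0 = outside);
# user_mask and seg_mask are placeholder names, not the repo's variables.
user_mask = torch.tensor([[1, 1, 1, 1],
                          [1, 1, 1, 1],
                          [1, 1, 1, 1]], dtype=torch.bool)
seg_mask  = torch.tensor([[0, 1, 1, 0],
                          [0, 1, 1, 0],
                          [0, 0, 0, 0]], dtype=torch.bool)

# XOR: pixels inside the user-specified region but outside the segmentation.
xor_mask = user_mask ^ seg_mask  # same as torch.logical_xor(user_mask, seg_mask)

# Addition/union, shown for contrast: here it just reproduces user_mask.
sum_mask = user_mask | seg_mask

print(xor_mask.int())
print(sum_mask.int())
```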

Thank you. And in the code, why does register_attention_control only replace the cross-attention's forward function rather than the self-attention's? The paper mentions self-attention.

No, in our code both self-attention and cross-attention are composed and injected.
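For readers following along, here is a minimal sketch of that kind of registration, assuming a Stable-Diffusion-style UNet where attention modules are named `attn1` (self-attention) and `attn2` (cross-attention). It is an illustrative monkey-patch of both attention types, not the repo's actual register_attention_control implementation:

```python
import torch.nn as nn

def register_attention_control(unet: nn.Module, controller):
    """Sketch: wrap the forward of every attention module in the UNet,
    covering both self-attention (attn1) and cross-attention (attn2)."""

    def make_wrapped_forward(module, is_cross):
        original_forward = module.forward

        def wrapped_forward(*args, **kwargs):
            out = original_forward(*args, **kwargs)
            # Hand the result to the controller, tagged by attention type.
            # A real implementation would intercept attention maps instead.
            controller(out, is_cross=is_cross)
            return out

        return wrapped_forward

    for name, module in unet.named_modules():
        # Naming convention assumed from Stable-Diffusion-style UNets:
        # 'attn1' = self-attention, 'attn2' = cross-attention.
        if name.endswith("attn1"):
            module.forward = make_wrapped_forward(module, is_cross=False)
        elif name.endswith("attn2"):
            module.forward = make_wrapped_forward(module, is_cross=True)
```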