Is it equivalent to a specific form of "attention"?
xychenunc opened this issue · 3 comments
Thanks for your interesting idea!
I have not looked into the entire code yet, but from the code for Involution, it looks like a kind of attention (it connects each pixel with only its K*K neighboring pixels). How did you use Involution within a specific framework? That is, what did you use it to replace in, for example, ResNet? Or did you just add the Involution/"attention" on top of the existing framework without removing anything? (Probably not, since you claimed your network uses fewer parameters.)
Thanks!
Involution is more general than self-attention; see Section 4.2 of our paper for a detailed discussion.
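For reference, here is a minimal PyTorch sketch of the involution operation as described in the paper: a K×K kernel is generated per spatial location from the local feature and shared across the channels within each group. The class name, arguments, and defaults here are illustrative, not the official API; see the repo for the actual implementation.

```python
import torch
import torch.nn as nn

class Involution2d(nn.Module):
    """Minimal involution sketch (Li et al., CVPR 2021).

    The kernel is spatially specific (generated from the input at each
    location) and channel-shared within each of `groups` channel groups.
    """
    def __init__(self, channels, kernel_size=7, stride=1, groups=16, reduction=4):
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride
        self.groups = groups
        # Kernel-generation function: a 1x1 bottleneck followed by a 1x1
        # conv that spans K*K weights per group at every output position.
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
        )
        self.span = nn.Conv2d(channels // reduction,
                              kernel_size * kernel_size * groups, 1)
        self.down = nn.AvgPool2d(stride) if stride > 1 else nn.Identity()
        self.unfold = nn.Unfold(kernel_size,
                                padding=(kernel_size - 1) // 2, stride=stride)

    def forward(self, x):
        b, c, h, w = x.shape
        h_out, w_out = h // self.stride, w // self.stride
        # Generate a (K*K) kernel per output position and per group.
        kernel = self.span(self.reduce(self.down(x)))  # B, K*K*G, H', W'
        kernel = kernel.view(b, self.groups, 1,
                             self.kernel_size ** 2, h_out, w_out)
        # Gather the K*K neighbours of every output position.
        patches = self.unfold(x).view(
            b, self.groups, c // self.groups,
            self.kernel_size ** 2, h_out, w_out)
        # Multiply-add over the neighbourhood dimension.
        out = (kernel * patches).sum(dim=3)  # B, G, C/G, H', W'
        return out.view(b, c, h_out, w_out)
```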
I can easily understand your motivation and the way you implement involution. However, I do not think I can easily follow the analysis in the paper. Could you please summarize in a few sentences how to apply involution to an existing framework so that its potential can be effectively utilized (e.g., where to place it, what it can replace, etc.)? Thank you!
Currently, we use it to replace the 3x3 convolutions in the network.
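For illustration, here is a hypothetical ResNet-style bottleneck that swaps the 3x3 convolution for the `Involution2d` sketch above (the residual shortcut and downsampling branch are omitted for brevity, and the layer sizes are illustrative):

```python
import torch.nn as nn

def bottleneck(in_ch, mid_ch, out_ch, stride=1):
    # Standard 1x1 -> 3x3 -> 1x1 bottleneck, with the 3x3 conv replaced.
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, 1, bias=False),
        nn.BatchNorm2d(mid_ch),
        nn.ReLU(inplace=True),
        # Was: nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False)
        Involution2d(mid_ch, kernel_size=7, stride=stride),
        nn.BatchNorm2d(mid_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
    )
```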