lucidrains/performer-pytorch

Replacing Attention module of Vision Transformer with SelfAttention Module of Performer?

PascalHbr opened this issue · 6 comments

Hey, thanks for your great work, I love it! :) A quick question: in your repo for the Vision Transformer (https://github.com/lucidrains/vit-pytorch) there is a module called Attention. Can I simply use the Vision Transformer and replace the Attention module with the SelfAttention module from Performer?

@PascalHbr hey Pascal! indeed you can! There are actually research groups already investigating this type of attention (linear attention) for vision tasks: https://github.com/lucidrains/lambda-networks and https://github.com/lucidrains/global-self-attention-network. That said, I wouldn't try Performer on vision tasks just yet.
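For what it's worth, SelfAttention in this repo takes and returns a (batch, seq_len, dim) tensor, the same interface as the Attention block in vit-pytorch, so the swap itself is mechanical. A minimal sketch based on this repo's README (the dimensions are just illustrative):

```python
import torch
from performer_pytorch import SelfAttention

# FAVOR+ based attention; accepts (batch, seq_len, dim) and returns
# the same shape, so it is shape-compatible with vit-pytorch's Attention
attn = SelfAttention(
    dim = 512,       # token dimension of the surrounding transformer
    heads = 8,
    causal = False   # bidirectional attention over image patches
)

x = torch.randn(1, 65, 512)  # e.g. 64 patches + 1 cls token
out = attn(x)                # (1, 65, 512)
```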

NZ42 commented

Hey lucidrains, I'm also interested in applying the Performer to vision. Can I ask why you wouldn't try it just yet?

@NZ42 actually, I missed the section on ImageNet in the paper. ok, I take it back, maybe it is worth trying!

NZ42 commented

Thank you for the quick reply. In all honesty, I'm interested in substituting the self-attention of vision transformers with FAVOR. I see that in your other repo you use Linformer. Do you have any tips on how best to approach this? I'm also looking into substituting it in pretrained models from timm.

@NZ42 You just need to plug a Performer instance into the efficient wrapper: https://github.com/lucidrains/vit-pytorch#efficient-attention
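For reference, a minimal sketch of that wrapper usage, assuming the vit_pytorch.efficient module and illustrative hyperparameters (see the linked README section for the up-to-date example):

```python
import torch
from vit_pytorch.efficient import ViT
from performer_pytorch import Performer

# a Performer to serve as the transformer backbone
performer = Performer(
    dim = 512,
    depth = 6,
    heads = 8,
    dim_head = 64,
    causal = False   # no causal masking for image classification
)

# the efficient ViT wrapper handles patch embedding, positional
# embeddings and the classification head, and delegates the rest
# to whatever transformer it is given
v = ViT(
    dim = 512,
    image_size = 224,
    patch_size = 16,
    num_classes = 1000,
    transformer = performer
)

img = torch.randn(1, 3, 224, 224)
preds = v(img)  # (1, 1000)
```

For pretrained timm models the situation is different: one would have to replace each block's attention module in place, which discards the pretrained attention weights, so some fine-tuning would be needed afterwards.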

@lucidrains I recently used your implementations of Performer (https://github.com/microsoft/vision-longformer/blob/main/src/models/layers/performer.py) and Linformer (https://github.com/microsoft/vision-longformer/blob/main/src/models/layers/linformer.py) to compare different efficient attention mechanisms on image classification and object detection tasks. See the results reported here: https://github.com/microsoft/vision-longformer. Thank you for your excellent open-source code!

@PascalHbr @NZ42 You may be interested in the results, too.