microsoft/fastseq

Where to read EL-Attention source code for huggingface-transformers

ADaBenxiong opened this issue · 4 comments

We are very interested in your work, and thank you for it. We have read your paper "EL-Attention". Where can more comprehensive examples for huggingface-transformers be found? In huggingface-transformers, self-attention caches the key and value, not only the hidden_states, while EL-Attention shows that caching only the hidden_states can cut the memory in half.
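
To make the memory argument concrete, here is a rough back-of-the-envelope sketch; the sizes, dtype, and variable names below are illustrative assumptions, not values taken from either repository.

```python
# A minimal sketch of why caching only hidden_states is cheaper than a key/value cache.
import torch

batch, src_len, d_model, num_layers = 8, 1024, 1024, 12
hidden_states = torch.randn(batch, src_len, d_model)
bytes_per_tensor = hidden_states.numel() * hidden_states.element_size()

# Self-attention: each layer normally caches its own key and value (2 tensors);
# caching that layer's hidden_states instead needs 1 tensor, i.e. half the memory.
self_attn_kv = 2 * bytes_per_tensor
self_attn_el = 1 * bytes_per_tensor

# Cross-attention: every decoder layer caches key and value projected from the same
# encoder output, while EL-Attention keeps just one shared hidden_states tensor.
cross_attn_kv = num_layers * 2 * bytes_per_tensor
cross_attn_el = 1 * bytes_per_tensor

print(f"self-attention, per layer : {self_attn_kv / 2**20:.0f} MiB -> {self_attn_el / 2**20:.0f} MiB")
print(f"cross-attention, all layers: {cross_attn_kv / 2**20:.0f} MiB -> {cross_attn_el / 2**20:.0f} MiB")
```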

Hello, thanks for releasing the source code. I have read your implementation for fairseq (https://github.com/microsoft/fastseq/blob/main/fastseq/optimizer/fairseq/el_attention_optimizer.py).
I find that EL-Attention is implemented for cross-attention, while self-attention is not changed much; I am not sure whether I understand this correctly. From the paper, I see that GPT-2, which has no cross-attention, can be sped up as well.
Thanks a lot.

The code I pasted here is EL-Attention for cross-attention; a similar change could be applied to self-attention.
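
For readers landing here, this is a self-contained sketch of the EL-Attention trick for cross-attention, written against plain tensors rather than the actual fastseq/fairseq classes; the function name, single-sequence layout, and weight convention (`x @ w`) are assumptions made for illustration.

```python
# Sketch: cross-attention computed from hidden_states only, without building K/V.
import math
import torch

def el_cross_attention(query, hidden_states, w_q, w_k, w_v, w_o, num_heads):
    """query: [tgt_len, d_model]; hidden_states: [src_len, d_model] (the only cached tensor).
    w_q / w_k / w_v / w_o: [d_model, d_model] projections shared with standard attention."""
    tgt_len, d_model = query.shape
    d_head = d_model // num_heads

    # Per-head query and per-head slices of the key/value projection matrices.
    q = (query @ w_q).view(tgt_len, num_heads, d_head).transpose(0, 1)   # [h, tgt, d_head]
    w_k_heads = w_k.view(d_model, num_heads, d_head).permute(1, 0, 2)    # [h, d_model, d_head]
    w_v_heads = w_v.view(d_model, num_heads, d_head).permute(1, 0, 2)    # [h, d_model, d_head]

    # Fold the key projection into the query ("expanded query"):
    #   q_i @ (H @ W_k_i)^T == (q_i @ W_k_i^T) @ H^T, so keys are never materialized.
    q_expanded = torch.einsum("htd,hmd->htm", q, w_k_heads)              # [h, tgt, d_model]
    scores = q_expanded @ hidden_states.t() / math.sqrt(d_head)          # [h, tgt, src]
    probs = scores.softmax(dim=-1)

    # Attend over hidden_states directly, then apply the value projection afterwards:
    #   softmax(...) @ (H @ W_v_i) == (softmax(...) @ H) @ W_v_i, so values are never cached.
    context = probs @ hidden_states                                      # [h, tgt, d_model]
    heads = torch.einsum("htm,hmd->htd", context, w_v_heads)             # [h, tgt, d_head]
    return heads.transpose(0, 1).reshape(tgt_len, d_model) @ w_o         # [tgt, d_model]
```

Because only hidden_states appear on the key/value side, the encoder output can be cached once and reused by every decoder layer and every beam, which is where the cross-attention saving comes from; the same algebra would apply to incremental self-attention by caching a layer's past hidden_states instead of its keys and values, which is how a decoder-only model such as GPT-2 could benefit.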

After reading your code, I now understand the specific operation. Thank you for your work and your careful explanation.