Where to read EL-Attention source code for huggingface-transformers
ADaBenxiong opened this issue · 4 comments
We are very interested in your work, and thank you for it. We have read your paper "EL-Attention". We would like to find a more comprehensive example for huggingface-transformers, where self-attention currently caches both the key and the value rather than only the hidden_states; EL-Attention shows that caching only the hidden_states can halve the memory.
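For a concrete sense of that claim, here is a rough back-of-the-envelope sketch (the batch/sequence/hidden/layer values are hypothetical, chosen only to illustrate the factor of two):

```python
# Rough cache-size comparison; the dimensions below are hypothetical.
batch, seq_len, hidden_dim, num_layers = 8, 1024, 1024, 12

# Standard attention: each layer caches key AND value projections,
# each of shape [batch, seq_len, hidden_dim] (num_heads * head_dim == hidden_dim).
kv_cache_elems = num_layers * 2 * batch * seq_len * hidden_dim

# EL-Attention: each layer only needs the unprojected hidden_states,
# shape [batch, seq_len, hidden_dim]; keys/values are derived from them on the fly.
el_cache_elems = num_layers * 1 * batch * seq_len * hidden_dim

print(kv_cache_elems / el_cache_elems)  # 2.0 -> half the memory
```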
Please see the fairseq implementation at https://github.com/microsoft/fastseq/blob/main/fastseq/optimizer/fairseq/el_attention_optimizer.py
Hello, thanks for sharing your source code. I read the fairseq implementation (https://github.com/microsoft/fastseq/blob/main/fastseq/optimizer/fairseq/el_attention_optimizer.py).
I find that EL-Attention is implemented for cross-attention, while self-attention is not changed much; I am not sure whether I understand this correctly. From the paper, I read that GPT-2, which has no cross-attention, can be sped up too.
Thanks a lot.
The code I pasted here applies EL-Attention to cross-attention; a similar change could be applied to self-attention.
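For reference, here is a minimal sketch of that idea for the cross-attention case, written from the paper rather than copied from fastseq (the function name, the weight-layout convention `k = enc_hidden @ w_k`, and the omission of biases and the output projection are all simplifying assumptions): the key projection is folded into the query and the value projection into the attention output, so only the encoder hidden_states need to be cached.

```python
import math
import torch

def el_cross_attention(query, enc_hidden, w_k, w_v, num_heads):
    """Hypothetical sketch of EL-Attention for cross-attention.

    query:      [batch, tgt_len, hidden]  decoder states for the current step
    enc_hidden: [batch, src_len, hidden]  cached encoder hidden states
    w_k, w_v:   [hidden, hidden]          projection weights (k = enc_hidden @ w_k)
    """
    bsz, tgt_len, hidden = query.shape
    head_dim = hidden // num_heads

    # Split the projection weights per head: [num_heads, hidden, head_dim].
    w_k_heads = w_k.view(hidden, num_heads, head_dim).permute(1, 0, 2)
    w_v_heads = w_v.view(hidden, num_heads, head_dim).permute(1, 0, 2)

    # Split the query per head: [batch, num_heads, tgt_len, head_dim].
    q = query.view(bsz, tgt_len, num_heads, head_dim).transpose(1, 2)

    # Fold the key projection into the query instead of the encoder states:
    # q_el = q @ w_k_h^T -> [batch, num_heads, tgt_len, hidden].
    q_el = torch.einsum("bntd,nhd->bnth", q, w_k_heads)

    # Attention scores against the *unprojected* encoder hidden states.
    scores = torch.einsum("bnth,bsh->bnts", q_el, enc_hidden) / math.sqrt(head_dim)
    attn = scores.softmax(dim=-1)

    # Aggregate hidden states first, then apply the value projection per head.
    ctx = torch.einsum("bnts,bsh->bnth", attn, enc_hidden)
    out = torch.einsum("bnth,nhd->bntd", ctx, w_v_heads)

    # Merge heads back: [batch, tgt_len, hidden] (output projection omitted).
    return out.transpose(1, 2).reshape(bsz, tgt_len, hidden)
```

The same rearrangement carries over to self-attention by caching the previous decoder hidden_states instead of per-head key/value tensors.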
After reading your code, I now understand the specific operation. Thank you for your work and your careful explanation.