Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

Paper

Link: http://proceedings.mlr.press/v119/katharopoulos20a.html
Year: 2020

Summary

  • reformulates the attention mechanism in terms of kernel functions and obtains a linear formulation that removes the quadratic time and memory requirements of softmax attention; surprisingly, this formulation also surfaces a connection between autoregressive transformers and RNNs (see the sketch below)
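
A minimal NumPy sketch (not the authors' implementation) of the kernelized, non-causal formulation: the softmax similarity is replaced by a dot product of feature maps φ(q)·φ(k), with φ(x) = elu(x) + 1 as in the paper, so φ(K)ᵀV is computed once and reused for every query. Shapes and sizes are illustrative only.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, the positive-valued feature map used in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal linear attention in O(N * d * d_v) time.

    Q, K: (N, d) queries and keys; V: (N, d_v) values.
    Replaces softmax(Q K^T) V with phi(Q) (phi(K)^T V), normalized per query.
    """
    Qp = elu_feature_map(Q)                  # (N, d)
    Kp = elu_feature_map(K)                  # (N, d)
    KV = Kp.T @ V                            # (d, d_v), shared by all queries
    Z = Qp @ Kp.sum(axis=0)                  # (N,) normalizer phi(q_i)^T sum_j phi(k_j)
    return (Qp @ KV) / Z[:, None]            # (N, d_v)

# Toy sanity check with made-up sizes
N, d, d_v = 6, 4, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(N, d)), rng.normal(size=(N, d)), rng.normal(size=(N, d_v))
print(linear_attention(Q, K, V).shape)  # (6, 4)
```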

Contributions and Distinctions from Previous Works

  • reduces the time and memory complexity of self-attention from O(N^2) to O(N) in the sequence length N; the causal case can be evaluated as a recurrence (see the sketch below)
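
A minimal NumPy sketch of the causal case written as an RNN: instead of storing all past keys and values, two running summaries are updated per token, which is what yields the linear total cost and constant per-step cost at inference. This is only a sketch under the paper's elu(x) + 1 feature map; variable names are mine.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, as in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention_rnn(Q, K, V):
    """Causal linear attention computed as an RNN.

    Maintains two running summaries instead of the full key/value history:
      S_i = sum_{j<=i} phi(k_j) v_j^T   (d x d_v matrix)
      z_i = sum_{j<=i} phi(k_j)         (d vector)
    so each new token costs O(d * d_v), independent of sequence length.
    """
    N, d = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d, d_v))
    z = np.zeros(d)
    outputs = np.zeros((N, d_v))
    for i in range(N):
        q, k, v = elu_feature_map(Q[i]), elu_feature_map(K[i]), V[i]
        S += np.outer(k, v)              # update the running key-value summary
        z += k                           # update the running normalizer
        outputs[i] = (q @ S) / (q @ z)   # attend using only the summaries
    return outputs

# Toy sanity check with made-up sizes
rng = np.random.default_rng(1)
Q, K, V = rng.normal(size=(5, 4)), rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
print(causal_linear_attention_rnn(Q, K, V).shape)  # (5, 4)
```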

Results

  • in terms of modelling quality, the linear transformer outperforms the vanilla transformer on some tasks but not on others
  • clearly faster than the vanilla transformer and slightly faster than the Reformer
