Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

Paper

Link: http://proceedings.mlr.press/v119/katharopoulos20a.html
Year: 2020

Summary

  • reformulates the attention mechanism in terms of kernel functions and obtains a linear formulation that removes the quadratic time and memory requirements of softmax attention; surprisingly, this formulation also surfaces a connection between autoregressive transformers and RNNs (see the sketch below)
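
A minimal NumPy sketch (not the authors' implementation) of the kernelized, non-causal formulation: the softmax similarity is replaced by a dot product of feature maps φ(q)·φ(k), with φ(x) = elu(x) + 1 as in the paper, so φ(K)ᵀV is computed once and reused for every query. Shapes and sizes are illustrative only.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, the positive-valued feature map used in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Non-causal linear attention in O(N * d * d_v) time.

    Q, K: (N, d) queries and keys; V: (N, d_v) values.
    Replaces softmax(Q K^T) V with phi(Q) (phi(K)^T V), normalized per query.
    """
    Qp = elu_feature_map(Q)                  # (N, d)
    Kp = elu_feature_map(K)                  # (N, d)
    KV = Kp.T @ V                            # (d, d_v), shared by all queries
    Z = Qp @ Kp.sum(axis=0)                  # (N,) normalizer phi(q_i)^T sum_j phi(k_j)
    return (Qp @ KV) / Z[:, None]            # (N, d_v)

# Toy sanity check with made-up sizes
N, d, d_v = 6, 4, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(N, d)), rng.normal(size=(N, d)), rng.normal(size=(N, d_v))
print(linear_attention(Q, K, V).shape)  # (6, 4)
```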

Contributions and Distinctions from Previous Works

  • reduces the time and memory complexity of self-attention from O(N^2) to O(N) in the sequence length N; the causal case can be evaluated as a recurrence (see the sketch below)
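
A minimal NumPy sketch of the causal case written as an RNN: instead of storing all past keys and values, two running summaries are updated per token, which is what yields the linear total cost and constant per-step cost at inference. This is only a sketch under the paper's elu(x) + 1 feature map; variable names are mine.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, as in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention_rnn(Q, K, V):
    """Causal linear attention computed as an RNN.

    Maintains two running summaries instead of the full key/value history:
      S_i = sum_{j<=i} phi(k_j) v_j^T   (d x d_v matrix)
      z_i = sum_{j<=i} phi(k_j)         (d vector)
    so each new token costs O(d * d_v), independent of sequence length.
    """
    N, d = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d, d_v))
    z = np.zeros(d)
    outputs = np.zeros((N, d_v))
    for i in range(N):
        q, k, v = elu_feature_map(Q[i]), elu_feature_map(K[i]), V[i]
        S += np.outer(k, v)              # update the running key-value summary
        z += k                           # update the running normalizer
        outputs[i] = (q @ S) / (q @ z)   # attend using only the summaries
    return outputs

# Toy sanity check with made-up sizes
rng = np.random.default_rng(1)
Q, K, V = rng.normal(size=(5, 4)), rng.normal(size=(5, 4)), rng.normal(size=(5, 4))
print(causal_linear_attention_rnn(Q, K, V).shape)  # (5, 4)
```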

Results

  • in terms of modelling quality, the linear transformer outperforms the vanilla transformer on some tasks but not on others
  • clearly faster than the vanilla transformer and slightly faster than the Reformer
