
Linformer: Self-Attention with Linear Complexity


Paper

Link: https://arxiv.org/abs/2006.04768
Year: 2020

Summary

  • the self-attention mechanism can be approximated by a low-rank matrix, which reduces the overall self-attention complexity from O(n^2) to O(n) in both time and space (formula sketch below)
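
Roughly, the per-head computation replaces the n × n attention map with an n × k one by projecting the keys and values along the length dimension with two learned matrices (written here from the paper's notation; treat it as a sketch rather than a verbatim copy of the paper's equation):

```math
\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{Q W_i^Q \,(E_i K W_i^K)^\top}{\sqrt{d_k}}\right) \cdot F_i V W_i^V,
\qquad E_i, F_i \in \mathbb{R}^{k \times n}
```

With k fixed, the cost is O(nk), i.e. linear in the sequence length n.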

Contributions and Distinctions from Previous Works

  • the standard self-attention mechanism of the Transformer requires O(n^2) time and
    space with respect to the sequence length n (see the sketch below)
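
To make the quadratic cost concrete, here is a minimal sketch of vanilla scaled dot-product attention (PyTorch assumed, tensor shapes illustrative); the scores matrix is n × n, which is where the O(n^2) time and memory comes from:

```python
import torch

def vanilla_attention(q, k, v):
    # q, k, v: (batch, n, d)
    # scores is (batch, n, n) -- quadratic in the sequence length n
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return torch.softmax(scores, dim=-1) @ v
```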

Methods


  • if the attention weights can be approximated by a low-rank matrix, the keys and values can be projected from length n down to a fixed k, so the attention map shrinks from n × n to n × k (see the sketch after this bullet)
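
A minimal sketch of a single Linformer attention head, assuming PyTorch; the module name and initialization are illustrative, not the authors' code, but the length-wise projections E and F follow the idea above: they compress the length-n keys and values down to a fixed length k.

```python
import torch
import torch.nn as nn


class LinformerSelfAttentionHead(nn.Module):
    def __init__(self, seq_len, dim, k):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)
        self.w_k = nn.Linear(dim, dim, bias=False)
        self.w_v = nn.Linear(dim, dim, bias=False)
        # E and F project the length-n key/value sequences down to length k
        self.proj_e = nn.Parameter(torch.randn(k, seq_len))  # E: k x n
        self.proj_f = nn.Parameter(torch.randn(k, seq_len))  # F: k x n
        self.scale = dim ** -0.5

    def forward(self, x):                       # x: (batch, n, dim)
        q = self.w_q(x)                         # (batch, n, dim)
        k = self.proj_e @ self.w_k(x)           # (batch, k, dim)
        v = self.proj_f @ self.w_v(x)           # (batch, k, dim)
        # attention map is (batch, n, k) instead of (batch, n, n)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v                         # (batch, n, dim)


# usage: with k fixed (the paper uses values like 128 or 256),
# cost is O(nk) = O(n) in both time and space
x = torch.randn(2, 512, 64)
head = LinformerSelfAttentionHead(seq_len=512, dim=64, k=128)
out = head(x)  # (2, 512, 64)
```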

Results

  • for the standard Transformer, inference time grows with sequence length; for Linformer it stays roughly constant, and only increasing the projected dimension k increases inference time


  • results show that Linformer performs well on some tasks but not on others, and some Linformer variants do better than other variants
