
Linformer: Self-Attention with Linear Complexity


Paper

Link: https://arxiv.org/abs/2006.04768
Year: 2020

Summary

  • the self-attention mechanism can be approximated by a low-rank matrix, which reduces the overall self-attention complexity from O(n^2) to O(n) in both time and space (formula sketch below)
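
Roughly, the per-head computation replaces the n × n attention map with an n × k one by projecting the keys and values along the length dimension with two learned matrices (written here from the paper's notation; treat it as a sketch rather than a verbatim copy of the paper's equation):

```math
\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{Q W_i^Q \,(E_i K W_i^K)^\top}{\sqrt{d_k}}\right) \cdot F_i V W_i^V,
\qquad E_i, F_i \in \mathbb{R}^{k \times n}
```

With k fixed, the cost is O(nk), i.e. linear in the sequence length n.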

Contributions and Distinctions from Previous Works

  • the standard self-attention mechanism of the Transformer requires O(n^2) time and
    space with respect to the sequence length n (see the sketch below)
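
To make the quadratic cost concrete, here is a minimal sketch of vanilla scaled dot-product attention (PyTorch assumed, tensor shapes illustrative); the scores matrix is n × n, which is where the O(n^2) time and memory comes from:

```python
import torch

def vanilla_attention(q, k, v):
    # q, k, v: (batch, n, d)
    # scores is (batch, n, n) -- quadratic in the sequence length n
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return torch.softmax(scores, dim=-1) @ v
```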

Methods


  • if the attention weights can be approximated by a low-rank matrix, the keys and values can be projected from length n down to a fixed k, so the attention map shrinks from n × n to n × k (see the sketch after this bullet)
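
A minimal sketch of a single Linformer attention head, assuming PyTorch; the module name and initialization are illustrative, not the authors' code, but the length-wise projections E and F follow the idea above: they compress the length-n keys and values down to a fixed length k.

```python
import torch
import torch.nn as nn


class LinformerSelfAttentionHead(nn.Module):
    def __init__(self, seq_len, dim, k):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)
        self.w_k = nn.Linear(dim, dim, bias=False)
        self.w_v = nn.Linear(dim, dim, bias=False)
        # E and F project the length-n key/value sequences down to length k
        self.proj_e = nn.Parameter(torch.randn(k, seq_len))  # E: k x n
        self.proj_f = nn.Parameter(torch.randn(k, seq_len))  # F: k x n
        self.scale = dim ** -0.5

    def forward(self, x):                       # x: (batch, n, dim)
        q = self.w_q(x)                         # (batch, n, dim)
        k = self.proj_e @ self.w_k(x)           # (batch, k, dim)
        v = self.proj_f @ self.w_v(x)           # (batch, k, dim)
        # attention map is (batch, n, k) instead of (batch, n, n)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v                         # (batch, n, dim)


# usage: with k fixed (the paper uses values like 128 or 256),
# cost is O(nk) = O(n) in both time and space
x = torch.randn(2, 512, 64)
head = LinformerSelfAttentionHead(seq_len=512, dim=64, k=128)
out = head(x)  # (2, 512, 64)
```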

Results

  • for the standard Transformer, inference time grows with sequence length; for Linformer it stays roughly constant, and only increasing the projected dimension k increases inference time


  • results show that Linformer performs well on some tasks but not on others, and some Linformer variants do better than other variants
