IAAR-Shanghai/Awesome-Attention-Heads

Add new paper: Transformers on Markov Data: Constant Depth Suffices

Title: Transformers on Markov Data: Constant Depth Suffices
Head: Induction Head
Published: NeurIPS 2024
Summary:

  • Innovation: Proved that a transformer with a single head and three layers can represent the in-context conditional empirical distribution for kth-order Markov sources (see the sketch after this list).
  • Tasks: Analyzed the performance of low-depth Transformers trained on kth-order Markov sources.
  • Significant Result: Proved a conditional lower bound showing that attention-only transformers need Ω(log(k)) layers to represent kth-order induction heads, under an assumption on the realized attention patterns.
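
A minimal sketch (not from the paper) of the statistic the first bullet refers to: the in-context conditional empirical distribution that a kth-order induction head represents, computed here by brute-force counting over the context. The function name, the uniform fallback for unseen contexts, and the toy sequence are illustrative assumptions.

```python
from collections import Counter

def kth_order_empirical_dist(seq, k, alphabet):
    """Empirical distribution of the next symbol given the last k symbols,
    estimated in-context: the quantity a kth-order induction head represents."""
    suffix = tuple(seq[-k:])              # conditioning context: last k symbols
    counts = Counter()
    for t in range(k, len(seq)):          # scan every completed k-gram in the context
        if tuple(seq[t - k:t]) == suffix:
            counts[seq[t]] += 1           # symbol that followed this occurrence
    total = sum(counts.values())
    if total == 0:                        # context never seen before: uniform fallback (assumption)
        return {a: 1.0 / len(alphabet) for a in alphabet}
    return {a: counts[a] / total for a in alphabet}

# Toy example: binary sequence, second-order context (k = 2)
seq = [0, 1, 1, 0, 1, 1, 0, 1]
print(kth_order_empirical_dist(seq, k=2, alphabet=[0, 1]))  # {0: 0.0, 1: 1.0}
```

The paper's result is that representing this statistic does not require depth growing with k: three layers with a single head suffice, even though the brute-force count above makes the kth-order dependence explicit.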
