Add new paper: Transformers on Markov Data: Constant Depth Suffices
wyzh0912 commented
Title: Transformers on Markov Data: Constant Depth Suffices
Head: Induction Head
Published: ICML
Summary:
- Innovation: Proved that a transformer with a single head and three layers can represent the in-context conditional empirical distribution for kth-order Markov sources (see the sketch of this target function below).
- Tasks: Analyzed the performance of low-depth transformers trained on kth-order Markov sources.
- Significant Result: Proved a conditional lower bound showing that attention-only transformers need Ω(log k) layers to represent kth-order induction heads, under an assumption on the realized attention patterns.
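
As a reading aid, here is a minimal Python sketch of the target function the paper's three-layer construction is shown to represent: the in-context conditional empirical distribution of a kth-order Markov source (the kth-order induction head). The function name, the uniform fallback for unseen suffixes, and the toy example are illustrative assumptions, not details from the paper.

```python
from collections import Counter

def kth_order_empirical_dist(seq, k, vocab):
    """In-context conditional empirical distribution for a kth-order source:
    estimate P(next symbol = a | last k symbols) from the counts of the
    current length-k suffix at earlier positions in the sequence.
    (Illustrative sketch; naming and fallback behavior are assumptions.)"""
    suffix = tuple(seq[-k:])
    # Count which symbol follows each earlier occurrence of `suffix`.
    next_counts = Counter(
        seq[i + k]
        for i in range(len(seq) - k)
        if tuple(seq[i:i + k]) == suffix
    )
    total = sum(next_counts.values())
    if total == 0:
        # Suffix unseen so far in-context: fall back to uniform (an assumption).
        return {a: 1.0 / len(vocab) for a in vocab}
    return {a: next_counts[a] / total for a in vocab}

# Toy example: order-2 statistics over a binary sequence.
seq = [0, 1, 1, 0, 1, 1, 0, 1]
print(kth_order_empirical_dist(seq, k=2, vocab=[0, 1]))
# Suffix (0, 1) was followed by 1 both earlier times, so P(1 | 0, 1) = 1.0.
```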