IAAR-Shanghai/Awesome-Attention-Heads

Add new paper: In-Context Language Learning: Architectures and Algorithms


Title: In-Context Language Learning: Architectures and Algorithms
Head: n-gram head (induction head)
Published: ICML
Summary:

  • Innovation: Provided evidence that Transformers' ability to perform in-context language learning (ICLL) tasks relies on specialized “n-gram heads” that compute input-conditional next-token distributions (see the sketch after this list).
  • Tasks: Analyzed the behavior of Transformers trained on ICLL tasks using three complementary strategies: attention visualization, probing of hidden representations, and black-box input–output analysis.
  • Result: Demonstrated that inserting simple induction heads for n-gram modeling into neural architectures significantly improves their ICLL performance, as well as their language-modeling performance on natural data.
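
As a rough intuition for what such a head computes (a minimal black-box sketch, not the paper's implementation; the function name, `n` parameter, and example tokens below are hypothetical), an n-gram head can be thought of as matching the trailing (n−1)-gram against earlier occurrences in the context and returning the empirical distribution over the tokens that followed:

```python
from collections import Counter

def ngram_head_distribution(tokens, n=2):
    """Illustrative sketch of an n-gram (induction) head's output:
    an input-conditional next-token distribution, estimated by matching
    the trailing (n-1)-gram against earlier occurrences in the context."""
    context = tuple(tokens[-(n - 1):]) if n > 1 else ()
    counts = Counter()
    # Scan earlier positions for the same (n-1)-gram and count the token
    # that followed each occurrence.
    for i in range(len(tokens) - (n - 1)):
        if tuple(tokens[i:i + n - 1]) == context:
            counts[tokens[i + n - 1]] += 1
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}

# Example: a bigram head conditions on the last token ("a") and predicts
# what followed "a" earlier in the context.
print(ngram_head_distribution(["a", "b", "a", "c", "a"], n=2))
# -> {'b': 0.5, 'c': 0.5}
```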

Already included