IAAR-Shanghai/Awesome-Attention-Heads

Add new paper: In-Context Language Learning: Architectures and Algorithms


Title: In-Context Language Learning: Architectures and Algorithms
Head: n-gram head (induction head)
Published: ICML
Summary:

  • Innovation: Provided evidence that Transformers' ability to perform in-context language learning (ICLL) tasks relies on specialized “n-gram heads” that compute input-conditional next-token distributions (see the sketch after this list).
  • Tasks: Analyzed the behavior of Transformers trained on ICLL tasks using three complementary strategies: attention visualization, probing of hidden representations, and black-box input–output analysis.
  • Result: Demonstrated that inserting simple induction heads for n-gram modeling into neural architectures significantly improves their ICLL performance, as well as their language-modeling performance on natural data.
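
As a rough intuition for what such a head computes (a minimal black-box sketch, not the paper's implementation; the function name, `n` parameter, and example tokens below are hypothetical), an n-gram head can be thought of as matching the trailing (n−1)-gram against earlier occurrences in the context and returning the empirical distribution over the tokens that followed:

```python
from collections import Counter

def ngram_head_distribution(tokens, n=2):
    """Illustrative sketch of an n-gram (induction) head's output:
    an input-conditional next-token distribution, estimated by matching
    the trailing (n-1)-gram against earlier occurrences in the context."""
    context = tuple(tokens[-(n - 1):]) if n > 1 else ()
    counts = Counter()
    # Scan earlier positions for the same (n-1)-gram and count the token
    # that followed each occurrence.
    for i in range(len(tokens) - (n - 1)):
        if tuple(tokens[i:i + n - 1]) == context:
            counts[tokens[i + n - 1]] += 1
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}

# Example: a bigram head conditions on the last token ("a") and predicts
# what followed "a" earlier in the context.
print(ngram_head_distribution(["a", "b", "a", "c", "a"], n=2))
# -> {'b': 0.5, 'c': 0.5}
```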

Already included