AnswerDotAI/cold-compress

Experiment with Fixed Global Tokens

griff4692 opened this issue · 1 comment

Efficient Streaming Language Models with Attention Sinks finds that the first k=4 tokens have global importance: they receive consistent attention regardless of their semantic relevance to the particular prompt.

Explore this and other candidate "global tokens" --> instructions, queries, etc.
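For reference, the StreamingLLM-style policy boils down to always retaining the first few "sink" positions in the KV cache alongside a sliding window of recent tokens. A minimal sketch (illustrative only; `keep_positions`, `num_sink`, and `window` are hypothetical names, not cold-compress's actual API):

```python
def keep_positions(seq_len: int, num_sink: int = 4, window: int = 8) -> list[int]:
    """Positions retained in the KV cache under an attention-sink policy.

    Keeps the first `num_sink` fixed global tokens plus the most recent
    `window` tokens; everything in between is evicted.
    """
    sinks = range(min(num_sink, seq_len))                 # fixed global tokens
    recent = range(max(num_sink, seq_len - window), seq_len)  # sliding window
    return sorted(set(sinks) | set(recent))

print(keep_positions(16))  # → [0, 1, 2, 3, 8, 9, 10, 11, 12, 13, 14, 15]
```

Extending this to other global tokens (instructions, queries) would mean adding their positions to the always-kept set rather than a fixed prefix.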

Implemented and merged