AnswerDotAI/cold-compress

Experiment with Fixed Global Tokens

griff4692 opened this issue · 1 comment

Efficient Streaming Language Models with Attention Sinks finds that the first k=4 tokens have global importance: they receive consistent attention regardless of their semantic relevance to the particular prompt.

Explore this and other candidate "global tokens" --> instructions, queries, etc.
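For reference, the StreamingLLM-style policy boils down to always retaining the first few "sink" positions in the KV cache alongside a sliding window of recent tokens. A minimal sketch (illustrative only; `keep_positions`, `num_sink`, and `window` are hypothetical names, not cold-compress's actual API):

```python
def keep_positions(seq_len: int, num_sink: int = 4, window: int = 8) -> list[int]:
    """Positions retained in the KV cache under an attention-sink policy.

    Keeps the first `num_sink` fixed global tokens plus the most recent
    `window` tokens; everything in between is evicted.
    """
    sinks = range(min(num_sink, seq_len))                 # fixed global tokens
    recent = range(max(num_sink, seq_len - window), seq_len)  # sliding window
    return sorted(set(sinks) | set(recent))

print(keep_positions(16))  # → [0, 1, 2, 3, 8, 9, 10, 11, 12, 13, 14, 15]
```

Extending this to other global tokens (instructions, queries) would mean adding their positions to the always-kept set rather than a fixed prefix.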

Implemented and merged