Experiment with Fixed Global Tokens
griff4692 opened this issue · 1 comment
griff4692 commented
*Efficient Streaming Language Models with Attention Sinks* finds that the first k=4 tokens have global importance: they receive consistently high attention regardless of their semantic relevance to the particular prompt.

Explore this and other possible "global tokens" --> instructions, queries, etc.
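For reference, a minimal sketch of what a fixed-global-token (sink) attention mask could look like: every query attends causally to a local window plus the first `num_sinks` tokens, which stay visible to all positions. The function name and parameters here are illustrative, not taken from this repo or the paper's code.

```python
import numpy as np

def sink_attention_mask(seq_len: int, num_sinks: int = 4, window: int = 8) -> np.ndarray:
    """Boolean mask where entry [i, j] is True if query i may attend to key j.

    Combines a causal local window with always-visible "sink" tokens
    (hypothetical helper; names are illustrative).
    """
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    causal = j <= i                  # no attending to the future
    local = (i - j) < window         # recent-token window
    sinks = j < num_sinks            # first k tokens are globally visible
    return causal & (local | sinks)

mask = sink_attention_mask(12, num_sinks=4, window=4)
```

Extending `sinks` to also cover instruction or query token positions would be one way to test the other "global token" candidates mentioned above.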
griff4692 commented
Implemented and merged