Pinned Repositories
RAG
attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
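The attention_sinks repo builds on the "attention sink" observation from the StreamingLLM line of work: if the key/value cache keeps the first few tokens (the sinks) plus a sliding window of recent tokens, memory stays constant no matter how far generation runs past the training length. Below is a minimal sketch of that eviction policy; the class and parameter names (SinkKVCache, sink_size, window_size) are illustrative, not the library's actual API.

```python
from collections import deque


class SinkKVCache:
    """Hypothetical sketch of a sink + sliding-window KV cache.

    Keeps the first `sink_size` entries forever (the attention sinks)
    plus at most `window_size` of the most recent entries, so total
    memory is bounded by sink_size + window_size regardless of how
    many tokens have been generated.
    """

    def __init__(self, sink_size: int = 4, window_size: int = 1020):
        self.sink_size = sink_size
        self.sinks: list = []  # first tokens, never evicted
        self.window: deque = deque(maxlen=window_size)  # oldest evicted first

    def append(self, kv_entry) -> None:
        # Fill the sink slots first; afterwards the deque's maxlen
        # evicts the oldest non-sink entry automatically.
        if len(self.sinks) < self.sink_size:
            self.sinks.append(kv_entry)
        else:
            self.window.append(kv_entry)

    def entries(self) -> list:
        # Order matters: sinks come first, then the recent window,
        # matching the positions the model attends over.
        return self.sinks + list(self.window)


cache = SinkKVCache(sink_size=4, window_size=8)
for token_id in range(20):
    cache.append(token_id)
# The first 4 tokens are retained as sinks; only the 8 most recent follow.
print(cache.entries())  # [0, 1, 2, 3, 12, 13, 14, 15, 16, 17, 18, 19]
```

Because the cache size is fixed, attention cost per generated token is constant, which is what lets an existing model stream far past its original context length without retraining.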