sgl-project/sglang

[Help wanted] Does RadixAttention have anything to do with attention?

Closed this issue · 1 comment

Great work! I've been learning SGLang recently, but I'm a bit confused about RadixAttention. As I understand it, the purpose of RadixAttention is just to reuse the KV cache better, and that's all. So does RadixAttention have anything to do with the computation of attention itself? If not, wouldn't it be better to call it RadixPrefixCache?

@Wanglongzhi2001 The name RadixAttention is not misleading in the sense that the KV cache is one of the most important components for computing attention, and attention is the core of modern transformer architectures. That is, by reusing the KV cache of shared prefixes, we avoid recomputing it, which speeds up the whole process.
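To make the idea concrete, here is a toy sketch (not SGLang's actual data structure) of a trie/radix-style index over token ids that tracks which prefixes already have KV cache entries, so a new request only needs to run prefill for the un-cached suffix. The class and method names are made up for illustration; the real RadixAttention manages GPU KV tensors, reference counts, and eviction, as described in the paper.

```python
# Toy illustration only: a trie keyed by token ids that records which
# prefixes already have KV cache entries. A new request can skip attention
# prefill for the matched prefix and only compute the remaining suffix.

class _Node:
    def __init__(self):
        self.children = {}   # token id -> _Node
        self.has_kv = False  # True if KV cache exists for the prefix ending here


class PrefixKVCache:
    def __init__(self):
        self.root = _Node()

    def match_prefix(self, tokens):
        """Return the length of the longest cached prefix of `tokens`."""
        node, matched = self.root, 0
        for i, tok in enumerate(tokens):
            nxt = node.children.get(tok)
            if nxt is None or not nxt.has_kv:
                break
            node, matched = nxt, i + 1
        return matched

    def insert(self, tokens):
        """Record that KV cache now exists for every prefix of `tokens`."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, _Node())
            node.has_kv = True


if __name__ == "__main__":
    cache = PrefixKVCache()
    cache.insert([1, 2, 3, 4])                 # first request: full prefill, cache stored
    reused = cache.match_prefix([1, 2, 3, 9])  # second request shares a 3-token prefix
    print(f"reused {reused} cached tokens")    # -> reused 3 cached tokens
```

So the radix structure decides which part of the attention computation can be skipped; the attention kernel itself is unchanged, but far less of it needs to run for requests with shared prefixes.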

For more details on RadixAttention, you can read the paper: https://arxiv.org/pdf/2312.07104