FasterDecoding/SnapKV

Python

Issues

`expanded_attn_mask = causal_4d_mask.masked_fill(expanded_attn_mask.bool(), torch.finfo(dtype).min)` RuntimeError: The size of tensor a (3509) must match the size of tensor b (7017) at non-singleton dimension 3
#13 opened 6 days ago by seeyourcell · 1 comment
Can't run LongBench!
#12 opened 16 days ago by HarryWu99 · 0 comments
Why is compression only done during decoding?
#11 opened 18 days ago by CSEEduanyu · 0 comments
Only the KV cache is compressed. Are the sizes of Q and K inconsistent when attention is computed?
#10 opened 25 days ago by CSEEduanyu · 1 comment
It seems that SnapKV needs to run "prefill" at least once before the prompt can be compressed.
#9 opened a month ago by 66RING · 1 comment
Questions on paper and code [prompting for Mistral, positional index, minor errors & questions in paper]
#1 opened a month ago by MarsJacobs · 8 comments
Grouped query attention implementation
#4 opened a month ago by guozhiyu · 1 comment
Maybe a bug in the `update_kv` function
#3 opened a month ago by HarryWu99 · 1 comment
The effect of Clustering via Pooling may be greater?
#2 opened a month ago by HarryWu99 · 1 comment