AnswerDotAI/cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods, built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
Python | BSD-3-Clause license
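For orientation, the sketch below shows what a KV cache compression policy looks like in plain PyTorch: a recency-plus-"attention sink" eviction rule that keeps the first few tokens and a recent window. The function name, tensor shapes, and heuristic are illustrative assumptions for this sketch, not Cold Compress's actual API; the methods tracked in the issues below (ThinK, PyramidInfer, InfLLM, MInference) use more sophisticated criteria such as attention statistics.

```python
import torch

def compress_kv_cache(keys, values, window=256, n_sink=4):
    """Hypothetical eviction policy: keep the first `n_sink` tokens plus the
    most recent `window` tokens; drop everything in between.

    keys, values: [batch, n_heads, seq_len, head_dim]
    """
    seq_len = keys.shape[2]
    if seq_len <= n_sink + window:
        # Cache is still small enough; nothing to evict.
        return keys, values
    keep = torch.cat([
        torch.arange(n_sink, device=keys.device),                    # global "sink" tokens
        torch.arange(seq_len - window, seq_len, device=keys.device), # recent local window
    ])
    return keys[:, :, keep, :], values[:, :, keep, :]
```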
Issues
- How to get attention scores (#44, opened by wiluen)
- Question of evaluation (#43, opened by freeSoul-SNU)
- Does the repo support quantization methods? Does the repo support kv merge methods? (#40, opened by foreverpiano)
- torch dependency results in error (#41, opened by maxjeblick)
- Implement ThinK (#36, opened by griff4692)
- Implement PyramidInfer (#35, opened by griff4692)
- Implement InfLLM (#30, opened by griff4692)
- MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention (#29, opened by griff4692)
- Llama3 GIST (#2, opened by griff4692)
- Profile Llama3 Attention Heads (#4, opened by griff4692)
- Record Model Speed in evals (#6, opened by griff4692)
- Record memory consumption (#7, opened by griff4692)
- Experiment with Fixed Global Tokens (#5, opened by griff4692)