tomaarsen/attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
Python · Apache-2.0
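
For context on what the library does, here is a minimal usage sketch. It assumes the package exposes a drop-in replacement for the `transformers` Auto classes (issue #32 below references `AutoModelForCausalLM.from_pretrained`); the `attention_sink_size` and `attention_sink_window_size` keyword arguments and the example checkpoint are illustrative assumptions, not confirmed API.

```python
# Minimal sketch, assuming attention_sinks mirrors the transformers Auto-class
# API and accepts sink/window kwargs; names below are assumptions.
from transformers import AutoTokenizer

from attention_sinks import AutoModelForCausalLM  # assumed drop-in class

model_name = "mistralai/Mistral-7B-v0.1"  # example checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    attention_sink_size=4,            # assumed: initial "sink" tokens always kept
    attention_sink_window_size=1020,  # assumed: sliding window of recent tokens
)

# Generation then proceeds as with any transformers model; memory stays constant
# because the KV cache holds only the sink tokens plus the recent window.
inputs = tokenizer("Attention sinks let models stream past their training length.",
                   return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```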
Issues
- TypeError: bad operand type for unary -: 'NoneType' (#47, opened by wln20, 0 comments)
- KeyError: 'Cache only has 0 layers, attempted to access layer with index 0' (#37, opened by pseudotensor, 1 comment)
- Trying to install via Kaggle (#44, opened by Kuchiriel, 0 comments)
- GPTQ models support (#31, opened by synacktraa, 2 comments)
- Support newer versions of Mistral (e.g. mistralai/Mistral-7B-Instruct-v0.2)? (#41, opened by spring1915, 0 comments)
- chatglm3 support? (#40, opened by ScottishFold007, 1 comment)
- 3.3: Learnable Sink Token (#38, opened by photomz, 1 comment)
- Error when using Qwen 7b chat (#36, opened by Minami-su, 0 comments)
- Error loading Qwen-1_8B (#35, opened by haiphong93, 4 comments)
- ValueError: Attention Sinks does not support Flash Attention in QWen models, please use `use_flash_attn=False` in `AutoModelForCausalLM.from_pretrained` (#32, opened by Essence9999, 0 comments)
- Flash Attention Support (#30, opened by Jiayuanhip, 2 comments)
- Questions Related to the Application and Results of Attention Sinks After the Paper (#28, opened by dsdanielpark, 10 comments)
- ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [31,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed. (#22, opened by pseudotensor, 2 comments)
- Avoid overly strict "transformers==4.34.0" (#26, opened by pseudotensor, 16 comments)
- Error when using Qwen-14B (#24, opened by sun1092469590, 1 comment)
- Error when importing (#12, opened by Caet-pip, 1 comment)
- Bigcode architecture (#21, opened by selimsandal, 4 comments)
- Issue with only adding sink tokens in cache (#17, opened by sam1373, 3 comments)
- The results of sink/transformer/windowed under outputs_*/ folders are all the same (#18, opened by ZiweiHe, 4 comments)
- Experiments with MPT7b with seqlen > 2048 (#14, opened by vchiley, 1 comment)
- Strategy for trust_remote_code? (#19, opened by kmn1024, 16 comments)
- Add support for GPT-J models (#11, opened by versae, 0 comments)
- Add contributing.md (#9, opened by rajveer43, 3 comments)
- Error when using Falcon (#8, opened by helleuch, 2 comments)
- Use with `pipeline` or `generate` (#7, opened by helleuch)