tomaarsen/attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
Python · Apache-2.0
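
For context on what the library does, here is a minimal usage sketch. It assumes the package exposes a drop-in replacement for the `transformers` Auto classes (issue #32 below references `AutoModelForCausalLM.from_pretrained`); the `attention_sink_size` and `attention_sink_window_size` keyword arguments and the example checkpoint are illustrative assumptions, not confirmed API.

```python
# Minimal sketch, assuming attention_sinks mirrors the transformers Auto-class
# API and accepts sink/window kwargs; names below are assumptions.
from transformers import AutoTokenizer

from attention_sinks import AutoModelForCausalLM  # assumed drop-in class

model_name = "mistralai/Mistral-7B-v0.1"  # example checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    attention_sink_size=4,            # assumed: initial "sink" tokens always kept
    attention_sink_window_size=1020,  # assumed: sliding window of recent tokens
)

# Generation then proceeds as with any transformers model; memory stays constant
# because the KV cache holds only the sink tokens plus the recent window.
inputs = tokenizer("Attention sinks let models stream past their training length.",
                   return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```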
Issues
- TypeError: bad operand type for unary -: 'NoneType' (#47, opened by wln20, 0 comments)
- KeyError: 'Cache only has 0 layers, attempted to access layer with index 0' (#37, opened by pseudotensor, 1 comment)
- Trying to install via Kaggle (#44, opened by Kuchiriel, 0 comments)
- GPTQ models support (#31, opened by synacktraa, 2 comments)
- Support newer versions of Mistral (e.g. mistralai/Mistral-7B-Instruct-v0.2)? (#41, opened by spring1915, 0 comments)
- chatglm3 support? (#40, opened by ScottishFold007, 1 comment)
- 3.3: Learnable Sink Token (#38, opened by photomz, 1 comment)
- Error when using Qwen 7b chat (#36, opened by Minami-su, 0 comments)
- Error loading Qwen-1_8B (#35, opened by haiphong93, 4 comments)
- ValueError: Attention Sinks does not support Flash Attention in QWen models, please use `use_flash_attn=False` in `AutoModelForCausalLM.from_pretrained` (#32, opened by Essence9999, 0 comments)
- Flash Attention Support (#30, opened by Jiayuanhip, 2 comments)
- Questions Related to the Application and Results of Attention Sinks After the Paper (#28, opened by dsdanielpark, 10 comments)
- ../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [31,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed. (#22, opened by pseudotensor, 2 comments)
- Avoid overly strict "transformers==4.34.0" (#26, opened by pseudotensor, 16 comments)
- Error when using Qwen-14B (#24, opened by sun1092469590, 1 comment)
- Error when importing (#12, opened by Caet-pip, 1 comment)
- Bigcode architecture (#21, opened by selimsandal, 4 comments)
- Issue with only adding sink tokens in cache (#17, opened by sam1373, 3 comments)
- The results of sink/transformer/windowed under outputs_*/ folders are all the same (#18, opened by ZiweiHe, 4 comments)
- Experiments with MPT7b with seqlen > 2048 (#14, opened by vchiley, 1 comment)
- Strategy for trust_remote_code? (#19, opened by kmn1024, 16 comments)
- Add support for GPT-J models (#11, opened by versae, 0 comments)
- Add contributing.md (#9, opened by rajveer43, 3 comments)
- Error when using Falcon (#8, opened by helleuch, 2 comments)
- Use with `pipeline` or `generate` (#7, opened by helleuch)