ofirpress/attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
Python · MIT License
Issues: 29 open, 1 closed
ALiBi during inference
#20 opened by VarunGumma - 3
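Since the ALiBi bias depends only on the distance between query and key positions, incremental decoding with a KV cache only needs one bias row per generated token. A minimal sketch of that row, assuming per-head `slopes` computed as in the paper (illustrative, not the repo's fairseq code):

```python
import torch

def alibi_bias_row(slopes: torch.Tensor, t: int) -> torch.Tensor:
    """Bias for the single query at position t over cached keys 0..t.

    slopes: (num_heads,) per-head ALiBi slopes.
    Returns (num_heads, t + 1), to be added to the attention scores.
    """
    distances = torch.arange(t, -1, -1, dtype=slopes.dtype)  # t - j for j = 0..t
    return -slopes[:, None] * distances[None, :]
```

Because the row is the same for every sequence, nothing in the cache itself changes: ALiBi decoding works like any other decoder-cache setup.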
The numerical value of ALiBi attn_mask
#9 opened by chijames - 1
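For reference, the per-head slopes in the paper form a geometric sequence starting at 2^(-8/n) for n heads (so 1/2, 1/4, ..., 1/256 for 8 heads), and the additive mask is the slope times the negative query–key distance. A minimal PyTorch sketch of those numerical values for a causal LM (illustrative, assuming a power-of-two head count):

```python
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Geometric sequence from the paper: 2^(-8/n), 2^(-16/n), ...
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_attn_mask(num_heads: int, seq_len: int) -> torch.Tensor:
    """(num_heads, seq_len, seq_len): ALiBi bias on and below the diagonal,
    -inf above it so future positions stay masked in a causal LM."""
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]          # j - i; negative for past keys
    bias = alibi_slopes(num_heads)[:, None, None] * rel[None, :, :]
    return bias.masked_fill(rel[None, :, :] > 0, float("-inf"))

# With 8 heads and seq_len=5, head 0 (slope 1/2) has final row
# [-2.0, -1.5, -1.0, -0.5, 0.0]: the penalty grows linearly with distance.
```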
Implementation about ALiBi
#19 opened by DreamShibei - 0
implementation detail about alibi_mask
#18 opened by bugm - 0
For a value of `12` I'm seeing a jump in the plotting of values, e.g. `0.7` shown below.
#17 opened by razodactyl - 1
Have you initialized the model with other model checkpoints during training?
#14 opened by Victoriaheiheihei - 1
Is there any easy way to get a HF compatible version of your checkpoints?
#12 opened by petroskarypis - 3
How to perform sliding window evaluation?
#10 opened by chijames - 2
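The usual recipe is overlapping windows where each window scores only the tokens not covered by the previous one, so every scored token sees up to `window - 1` tokens of context. A hedged sketch (the function and the `model(ids) -> logits` interface are assumptions, not the repo's evaluation code):

```python
import torch
import torch.nn.functional as F

def sliding_window_nll(model, ids: torch.Tensor, window: int, stride: int) -> float:
    """Average per-token NLL of ids (shape (1, total_len)) under a causal LM."""
    nlls, counted, prev_end = [], 0, 0
    for begin in range(0, ids.size(1) - 1, stride):
        end = min(begin + window, ids.size(1))
        n_scored = end - max(prev_end, begin + 1)   # tokens new to this window
        chunk = ids[:, begin:end]
        logits = model(chunk)                        # (1, len, vocab)
        logp = F.log_softmax(logits[:, :-1], dim=-1)
        token_nll = -logp.gather(-1, chunk[:, 1:, None]).squeeze(-1)
        nlls.append(token_nll[:, -n_scored:].sum())  # score only the new tokens
        counted += n_scored
        prev_end = end
        if end == ids.size(1):
            break
    return (torch.stack(nlls).sum() / counted).item()
```

Setting `stride == window` gives the cheaper non-overlapping evaluation; smaller strides trade compute for more context per scored token.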
ALiBi in Parallel Attention
#8 opened by conceptofmind - 4
Integration with `transformers`
#6 opened by sayakpaul - 1
ALiBi on LongformerEncoderDecoder
#3 opened by beaupranisaa - 2
ALiBi in self-Attention
#4 opened by Ldoun - 3
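Wherever it lives, the ALiBi bias enters self-attention in one place: it is added to the raw query–key scores before the softmax, replacing positional embeddings entirely. A minimal sketch with assumed tensor shapes (not the repo's fairseq module):

```python
import torch

def alibi_self_attention(q, k, v, alibi_bias):
    """q, k, v: (batch, heads, seq, head_dim);
    alibi_bias: (heads, seq, seq) additive bias, -inf above the diagonal."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores + alibi_bias  # broadcasts over the batch dimension
    return torch.softmax(scores, dim=-1) @ v
```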
unrecognized argument --max-lr
#2 opened by cifkao