ofirpress/attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
Python · MIT License
Issues: 29 open, 1 closed
ALiBi during inference
#20 opened by VarunGumma - 3
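Since the ALiBi bias depends only on the distance between query and key positions, incremental decoding with a KV cache only needs one bias row per generated token. A minimal sketch of that row, assuming per-head `slopes` computed as in the paper (illustrative, not the repo's fairseq code):

```python
import torch

def alibi_bias_row(slopes: torch.Tensor, t: int) -> torch.Tensor:
    """Bias for the single query at position t over cached keys 0..t.

    slopes: (num_heads,) per-head ALiBi slopes.
    Returns (num_heads, t + 1), to be added to the attention scores.
    """
    distances = torch.arange(t, -1, -1, dtype=slopes.dtype)  # t - j for j = 0..t
    return -slopes[:, None] * distances[None, :]
```

Because the row is the same for every sequence, nothing in the cache itself changes: ALiBi decoding works like any other decoder-cache setup.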
The numerical value of ALiBi attn_mask
#9 opened by chijames - 1
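For reference, the per-head slopes in the paper form a geometric sequence starting at 2^(-8/n) for n heads (so 1/2, 1/4, ..., 1/256 for 8 heads), and the additive mask is the slope times the negative query–key distance. A minimal PyTorch sketch of those numerical values for a causal LM (illustrative, assuming a power-of-two head count):

```python
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Geometric sequence from the paper: 2^(-8/n), 2^(-16/n), ...
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_attn_mask(num_heads: int, seq_len: int) -> torch.Tensor:
    """(num_heads, seq_len, seq_len): ALiBi bias on and below the diagonal,
    -inf above it so future positions stay masked in a causal LM."""
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]          # j - i; negative for past keys
    bias = alibi_slopes(num_heads)[:, None, None] * rel[None, :, :]
    return bias.masked_fill(rel[None, :, :] > 0, float("-inf"))

# With 8 heads and seq_len=5, head 0 (slope 1/2) has final row
# [-2.0, -1.5, -1.0, -0.5, 0.0]: the penalty grows linearly with distance.
```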
Implementation about ALiBi
#19 opened by DreamShibei - 0
implementation detail about alibi_mask
#18 opened by bugm - 0
For a value of `12` I'm seeing a jump in the plotting of values, e.g. `0.7` shown below.
#17 opened by razodactyl - 1
Have you initialized the model with other model checkpoints during training?
#14 opened by Victoriaheiheihei - 1
Is there any easy way to get a HF compatible version of your checkpoints?
#12 opened by petroskarypis - 3
How to perform sliding window evaluation?
#10 opened by chijames - 2
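The usual recipe is overlapping windows where each window scores only the tokens not covered by the previous one, so every scored token sees up to `window - 1` tokens of context. A hedged sketch (the function and the `model(ids) -> logits` interface are assumptions, not the repo's evaluation code):

```python
import torch
import torch.nn.functional as F

def sliding_window_nll(model, ids: torch.Tensor, window: int, stride: int) -> float:
    """Average per-token NLL of ids (shape (1, total_len)) under a causal LM."""
    nlls, counted, prev_end = [], 0, 0
    for begin in range(0, ids.size(1) - 1, stride):
        end = min(begin + window, ids.size(1))
        n_scored = end - max(prev_end, begin + 1)   # tokens new to this window
        chunk = ids[:, begin:end]
        logits = model(chunk)                        # (1, len, vocab)
        logp = F.log_softmax(logits[:, :-1], dim=-1)
        token_nll = -logp.gather(-1, chunk[:, 1:, None]).squeeze(-1)
        nlls.append(token_nll[:, -n_scored:].sum())  # score only the new tokens
        counted += n_scored
        prev_end = end
        if end == ids.size(1):
            break
    return (torch.stack(nlls).sum() / counted).item()
```

Setting `stride == window` gives the cheaper non-overlapping evaluation; smaller strides trade compute for more context per scored token.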
ALiBi in Parallel Attention
#8 opened by conceptofmind - 4
Integration with `transformers`
#6 opened by sayakpaul - 1
ALiBi on LongformerEncoderDecoder
#3 opened by beaupranisaa - 2
ALiBi in self-Attention
#4 opened by Ldoun - 3
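Wherever it lives, the ALiBi bias enters self-attention in one place: it is added to the raw query–key scores before the softmax, replacing positional embeddings entirely. A minimal sketch with assumed tensor shapes (not the repo's fairseq module):

```python
import torch

def alibi_self_attention(q, k, v, alibi_bias):
    """q, k, v: (batch, heads, seq, head_dim);
    alibi_bias: (heads, seq, seq) additive bias, -inf above the diagonal."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores + alibi_bias  # broadcasts over the batch dimension
    return torch.softmax(scores, dim=-1) @ v
```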
unrecognized argument --max-lr
#2 opened by cifkao