lucidrains/local-attention
An implementation of local windowed attention for language modeling
Python · MIT
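For orientation, a minimal usage sketch of the library this repo provides. The constructor arguments shown (window_size, causal, look_backward, look_forward) are assumptions based on recent versions of the package and may differ across releases:

```python
import torch
from local_attention import LocalAttention

# queries / keys / values: (batch, heads, seq_len, head_dim)
q = torch.randn(2, 8, 2048, 64)
k = torch.randn(2, 8, 2048, 64)
v = torch.randn(2, 8, 2048, 64)

# each position attends within its local window, optionally also
# looking at neighboring windows (argument names are assumed)
attn = LocalAttention(
    dim = 64,            # head dimension
    window_size = 512,   # size of each local attention window
    causal = True,       # autoregressive masking
    look_backward = 1,   # also attend to one preceding window
    look_forward = 0,    # no future windows when causal
    dropout = 0.1
)

mask = torch.ones(2, 2048).bool()  # key padding mask, (batch, seq_len)
out = attn(q, k, v, mask = mask)   # (2, 8, 2048, 64)
```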
Issues
- May I be allowed to delete einops and replace it with the operations provided by torch? (#20, opened by wencan, 3 comments)
- Which is exactly the attention pattern? (#11, opened by beleen23, 0 comments)
- LocalTransformer Encoder Layer (#19, opened by AmitMY, 1 comment)
- The look_around function seems to be incorrect (#18, opened by datvuthanh, 2 comments; see the look_around sketch after this list)
- About the performance (#17, opened by ThyrixYang, 0 comments)
- Attention weight (#16, opened by emanuele-mincato, 1 comment)
- Wrong shape for attention bias vs sim tensor (#15, opened by inspirit, 5 comments)
- xPos Rotary Embeddings (#14, opened by ilya16, 1 comment)
- More control over attention masking (#9, opened by Mindful, 1 comment)
- question about the look around operation (#2, opened by benywon, 0 comments)
- question about the local attention (#1, opened by benywon)
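Two issues above (#18 and #2) ask about the look_around operation. As a rough sketch of the general idea, assuming the common pattern of padding along the window axis and concatenating each window with its neighbors (not necessarily line-for-line what this repo ships):

```python
import torch
import torch.nn.functional as F

def look_around(x, backward = 1, forward = 0, pad_value = -1, dim = 2):
    # x: (batch, num_windows, window_size, ...)
    # pad along the window axis, then concatenate each window with its
    # `backward` predecessors and `forward` successors along `dim`
    n = x.shape[1]
    pad_dims = (len(x.shape) - dim) * (0, 0)
    padded = F.pad(x, (*pad_dims, backward, forward), value = pad_value)
    shifted = [padded[:, i:(i + n), ...] for i in range(backward + forward + 1)]
    return torch.cat(shifted, dim = dim)

# example: 1 batch, 4 windows of size 2, scalar features
x = torch.arange(8).reshape(1, 4, 2, 1)
out = look_around(x, backward = 1, forward = 0)
print(out.shape)  # torch.Size([1, 4, 4, 1]): each window now holds the previous window plus itself
```

With this gather in place, local attention computes scores between each query window and its looked-around key/value windows, with pad positions masked out via pad_value.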