lucidrains/linear-attention-transformer
Transformer based on a variant of attention that is linear in complexity with respect to sequence length
Python · MIT License
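To make the headline claim concrete, here is a minimal PyTorch sketch of the non-causal linear attention trick (softmax over the query feature dimension and over the key sequence dimension, then associating the k^T v product first). This illustrates the general technique, not the library's actual code:

```python
import torch

def linear_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, dim_head)
    # Softmax over the feature dimension for queries and over the
    # sequence dimension for keys, then compute (k^T v) first:
    # cost is O(n * d^2) instead of the O(n^2 * d) of vanilla attention.
    q = q.softmax(dim=-1)
    k = k.softmax(dim=-2)
    context = torch.einsum('bhnd,bhne->bhde', k, v)   # (d, d) summary, independent of n
    out = torch.einsum('bhnd,bhde->bhne', q, context) # back to (n, d)
    return out

q = k = v = torch.randn(1, 8, 1024, 64)
print(linear_attention(q, k, v).shape)  # torch.Size([1, 8, 1024, 64])
```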
Issues
Image linear attention reference
#19 opened by pravn · 0 comments
Why dim != dim_head * heads?
#18 opened by zzczzc20 · 0 comments
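Regarding #18: in most multi-head attention implementations, `dim` (the model width) and `dim_head * heads` (the total attention width) are decoupled by input and output projections, so they need not be equal. A hypothetical illustration of that pattern, not the repo's code:

```python
import torch
from torch import nn

class MultiHeadProjection(nn.Module):
    # Illustrative only: the model width `dim` and the total attention
    # width `heads * dim_head` are bridged by two linear maps.
    def __init__(self, dim=512, heads=8, dim_head=128):
        super().__init__()
        inner = heads * dim_head           # 1024, deliberately != dim
        self.to_qkv = nn.Linear(dim, inner * 3, bias=False)
        self.to_out = nn.Linear(inner, dim)

    def forward(self, x):
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # ... attention happens at width `inner` ...
        return self.to_out(v)              # projected back to `dim`

x = torch.randn(2, 16, 512)
print(MultiHeadProjection()(x).shape)  # torch.Size([2, 16, 512])
```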
How to perform training?
#17 opened by pangshengwei · 2 comments
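Regarding #17: the model is a standard autoregressive LM, so training is ordinary next-token cross-entropy. A sketch assuming the `LinearAttentionTransformerLM` constructor arguments shown in the repo's README (verify the names against the current source):

```python
import torch
import torch.nn.functional as F
# Assumes the package's documented LM class and constructor arguments.
from linear_attention_transformer import LinearAttentionTransformerLM

model = LinearAttentionTransformerLM(
    num_tokens = 256, dim = 512, heads = 8,
    depth = 4, max_seq_len = 1024, causal = True)
opt = torch.optim.Adam(model.parameters(), lr = 1e-4)

for step in range(100):
    tokens = torch.randint(0, 256, (4, 1025))    # stand-in for a real batch
    inp, target = tokens[:, :-1], tokens[:, 1:]  # next-token prediction
    logits = model(inp)                          # (batch, seq, num_tokens)
    loss = F.cross_entropy(logits.transpose(1, 2), target)
    loss.backward()
    opt.step()
    opt.zero_grad()
```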
Does the causal attention really work here?
#16 opened by charlesxu90 · 1 comment
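Regarding #16: causality in linear attention is usually enforced with prefix sums rather than an attention mask, so position i only ever sees keys j <= i. A naive sketch of that trick; it materializes a d x d state per position (real implementations chunk this for memory), and the `elu + 1` feature map from Katharopoulos et al. is an assumption that may differ from this repo's choice:

```python
import torch
import torch.nn.functional as F

def causal_linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, seq, dim)
    q = F.elu(q) + 1   # positive feature map
    k = F.elu(k) + 1
    # Prefix sums over the sequence: position i accumulates only j <= i.
    kv = torch.einsum('bhnd,bhne->bhnde', k, v).cumsum(dim=2)
    z = k.cumsum(dim=2)
    num = torch.einsum('bhnd,bhnde->bhne', q, kv)
    den = torch.einsum('bhnd,bhnd->bhn', q, z).clamp(min=eps)
    return num / den.unsqueeze(-1)

q = k = v = torch.randn(1, 4, 256, 32)
print(causal_linear_attention(q, k, v).shape)  # torch.Size([1, 4, 256, 32])
```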
Scaling factors
#14 opened by radandreicristian · 0 comments
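Regarding #14: the usual `1/sqrt(d)` factor exists because the dot product of two d-dimensional unit-variance vectors has variance d, which would saturate a softmax. A quick numerical check of that claim:

```python
import torch

# Dot products of random unit-variance vectors have variance ~d,
# so logits are rescaled by d**-0.5 before the softmax.
d = 64
q, k = torch.randn(1000, d), torch.randn(1000, d)
logits = (q * k).sum(-1)
print(logits.var())                # ~64
print((logits * d ** -0.5).var())  # ~1
```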
ImageLinearAttention showcase
#11 opened by monajalal · 0 comments
Challenge in replacing SelfAttention with ImageLinearAttention in Vision Transformer
#13 opened by monajalal · 0 comments
Loss returns NaN
#6 opened by terencenwz · 1 comment
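Regarding #6: a frequent source of NaNs in linear attention is the normalizing denominator (q · sum(k)) underflowing to zero, especially in fp16; whether that matches the reporter's case is an assumption. A toy demonstration of the failure and the common clamp-based fix:

```python
import torch

num = torch.randn(4)
den = torch.tensor([1.0, 0.5, 0.0, 2.0])  # one zero denominator...
print(num / den)                 # ...yields inf/nan, which poisons the loss
print(num / den.clamp(min=1e-8)) # clamped: finite everywhere
```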
Where does this constant come from?
#7 opened by aluo-x · 2 comments
causal = True
#5 opened by wajihullahbaig · 1 comment
Positional encoding?
#3 opened by matthew-jurewicz · 40 comments
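Regarding #3: attention over content alone is largely insensitive to token order, so positional information has to be injected somewhere. Below is a sketch of the classic fixed sinusoidal encoding from "Attention Is All You Need"; the repo may use a different scheme (e.g. learned or axial embeddings), so treat this as generic:

```python
import math
import torch

def sinusoidal_positions(seq_len, dim):
    # Standard fixed encoding: each dimension pair oscillates at a
    # different geometric frequency, giving every position a unique code.
    pos = torch.arange(seq_len).unsqueeze(1)
    freq = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
    pe = torch.zeros(seq_len, dim)
    pe[:, 0::2] = torch.sin(pos * freq)
    pe[:, 1::2] = torch.cos(pos * freq)
    return pe

tokens = torch.randn(2, 128, 512)
tokens = tokens + sinusoidal_positions(128, 512)  # added before the first layer
```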
[Question] Merging with Trans-XL?
#2 opened by gaceladri · 4 comments
seq2seq decoder ids
#1 opened