lucidrains/performer-pytorch
An implementation of Performer, a linear attention-based transformer, in PyTorch
Python · MIT license
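For context, a minimal usage sketch of the library, assuming the `Performer` module and the `dim`/`depth`/`heads`/`causal` keyword arguments shown in the repository README:

```python
# Minimal sketch (assumed API: Performer(dim, depth, heads, causal) per the repo README).
import torch
from performer_pytorch import Performer

model = Performer(
    dim = 512,      # token embedding dimension
    depth = 6,      # number of attention + feedforward blocks
    heads = 8,      # attention heads
    causal = False  # set True for autoregressive (decoder-style) attention
)

x = torch.randn(1, 1024, 512)  # (batch, sequence length, dim)
out = model(x)                 # (1, 1024, 512)
```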
Issues
Residual Connection
#77 opened by jiyounglee-0523 - 0
Cross-attention with arbitrary causal mask
#94 opened by BarKetPlace - 2
Question about masking
#89 opened by Microbiods - 0
Pretrained example
#93 opened by jubueche - 1
Performer-PyTorch slower than expected; help with understanding parameter count
#92 opened by michaelweihaosong - 13
Causal linear attention benchmark
#64 opened by caffeinetoomuch - 0
Replicating nn.MultiheadAttention with multiple Performer SelfAttention modules
#91 opened by JGittles - 0
I want to use Performer on MAE
#90 opened by Zhaoyi-Yan - 0
Question: Is Performer order equivariant? (can it transform an unordered set of tensors)
#88 opened by nmakes - 3
Using Performer with GNNs
#87 opened by jah377 - 0
Huge model state dict size?
#86 opened by Liyue1d - 2
Attention map
#85 opened by merouone - 0
Performer Plain
#84 opened by Rachel66666 - 1
Rotary Position Embedding
#81 opened by ahmdtaha - 0
torch_tensorrt compilation fails
#80 opened by FredHaa - 0
Way to make two elements invisible?
#79 opened by 1140310118 - 2
torch.max(data_dash) bug
#76 opened by martinpflaum - 8
SelfAttention layer seems to have large error relative to nn.MultiheadAttention?
#46 opened by jueseph - 7
Recover attention scores
#70 opened by carlomarxdk - 0
Hyperbolic cosine-based estimator
#73 opened by gaganbahga - 0
Input and Context size in CrossAttention
#68 opened by caffeinetoomuch - 0
Performer Benchmark
#67 opened by CavallucciMartina - 6
Replacing Attention module of Vision Transformer with SelfAttention Module of Performer?
#48 opened by PascalHbr - 2
`to_out` bias
#65 opened by JamesDeAntonis - 4
Why is bias true in `to_<q,k,v>`?
#63 opened by JamesDeAntonis - 4
Saving checkpoints during training and loading
#57 opened by ylhsieh - 75
Decoder Mask
#62 opened by Muennighoff - 10
Triangular matrices?
#42 opened by jeremycochoy - 1
Deterministic layers
#58 opened by anklebreaker - 0
Context-specific embeddings from a language model?
#60 opened by rainwala - 8
Extra FF when using cross attention
#56 opened by gulnazaki - 5
Decoder randomly outputs NaN tensor.
#53 opened by y-rokutan - 3
FixNorm alongside ScaleNorm
#55 opened by gulnazaki - 3
Applying decoder input mask?
#51 opened by maxmax1992 - 3