lucidrains/performer-pytorch
An implementation of Performer, a linear attention-based transformer, in PyTorch
Python · MIT license
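For context, a minimal usage sketch of the library, assuming the `Performer` module and the `dim`/`depth`/`heads`/`causal` keyword arguments shown in the repository README:

```python
# Minimal sketch (assumed API: Performer(dim, depth, heads, causal) per the repo README).
import torch
from performer_pytorch import Performer

model = Performer(
    dim = 512,      # token embedding dimension
    depth = 6,      # number of attention + feedforward blocks
    heads = 8,      # attention heads
    causal = False  # set True for autoregressive (decoder-style) attention
)

x = torch.randn(1, 1024, 512)  # (batch, sequence length, dim)
out = model(x)                 # (1, 1024, 512)
```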
Issues
Residual Connection
#77 opened by jiyounglee-0523 - 0
Cross-attention with arbitrary causal mask
#94 opened by BarKetPlace - 2
Question about masking
#89 opened by Microbiods - 0
Pretrained example
#93 opened by jubueche - 1
Performer-PyTorch slower than expected; help with understanding parameter count
#92 opened by michaelweihaosong - 13
Causal linear attention benchmark
#64 opened by caffeinetoomuch - 0
Replicating nn.MultiheadAttention with multiple Performer SelfAttention modules
#91 opened by JGittles - 0
I want to use Performer on MAE
#90 opened by Zhaoyi-Yan - 0
Question: Is Performer order equivariant? (can it transform an unordered set of tensors)
#88 opened by nmakes - 3
Using Performer with GNNs
#87 opened by jah377 - 0
Huge model state dict size?
#86 opened by Liyue1d - 2
Attention map
#85 opened by merouone - 0
Performer Plain
#84 opened by Rachel66666 - 1
Rotary Position Embedding
#81 opened by ahmdtaha - 0
torch_tensorrt compilation fails
#80 opened by FredHaa - 0
Way to make two elements invisible?
#79 opened by 1140310118 - 2
torch.max(data_dash) bug
#76 opened by martinpflaum - 8
SelfAttention layer seems to have large error relative to nn.MultiheadAttention?
#46 opened by jueseph - 7
Recover attention scores
#70 opened by carlomarxdk - 0
Hyperbolic cosine-based estimator
#73 opened by gaganbahga - 0
Input and Context size in CrossAttention
#68 opened by caffeinetoomuch - 0
Performer Benchmark
#67 opened by CavallucciMartina - 6
Replacing Attention module of Vision Transformer with SelfAttention Module of Performer?
#48 opened by PascalHbr - 2
`to_out` bias
#65 opened by JamesDeAntonis - 4
Why is bias true in `to_<q,k,v>`?
#63 opened by JamesDeAntonis - 4
Saving checkpoints during training and loading
#57 opened by ylhsieh - 75
Decoder Mask
#62 opened by Muennighoff - 10
Triangular matrices?
#42 opened by jeremycochoy - 1
Deterministic layers
#58 opened by anklebreaker - 0
Context-specific embeddings from a language model?
#60 opened by rainwala - 8
Extra FF when using cross attention
#56 opened by gulnazaki - 5
Decoder randomly outputs NaN tensor.
#53 opened by y-rokutan - 3
FixNorm alongside ScaleNorm
#55 opened by gulnazaki - 3
Applying decoder input mask?
#51 opened by maxmax1992 - 3