lucidrains/x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
Python · MIT License
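For orientation, a minimal usage sketch of the library follows, based on its standard `TransformerWrapper` / `Decoder` interface; the hyperparameter values here are illustrative, not prescribed by the repo.

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# autoregressive decoder-only transformer over a toy vocabulary
model = TransformerWrapper(
    num_tokens = 20000,        # vocabulary size (illustrative)
    max_seq_len = 1024,        # maximum sequence length
    attn_layers = Decoder(
        dim = 512,             # model dimension
        depth = 6,             # number of layers
        heads = 8              # attention heads
    )
)

x = torch.randint(0, 20000, (1, 1024))  # dummy token ids
logits = model(x)                        # shape: (1, 1024, 20000)
```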
Issues
Conda Forge Package
#293 opened by iamthebot - 10
Pad value not being used in `align_right` function
#290 opened by shuishida - 18
[Feature request] Alibi for arbitrary positions
#280 opened by Aceticia - 3
PyTorch 2.5 CuDNN backend for SDPA
#289 opened by maxall41 - 12
qk_norm and kv_heads conflict
#282 opened by alexdemartos - 2
[new paper] TokenFormer
#287 opened by guillaumeguy - 2
Tokenformer
#286 opened by iiLaurens - 38
Forgetting Transformers
#281 opened by faresobeid - 3
Paper for GLU Mult Bias?
#275 opened by TimS-ml - 2
Is ContinuousTransformerWrapper needed?
#284 opened by mpeven - 12
Selective Attention
#278 opened by kazunator - 3
Can't pickle <class 'torch.nn.attention._SDPBackend'>: attribute lookup _SDPBackend on torch.nn.attention failed
#272 opened by guillaumeguy - 11
return_only_embed
#277 opened by JLenzy - 6
Small paper ideas to be added
#262 opened by RyanKim17920 - 1
Differential Transformer
#276 opened by lamm-mit - 6
Adding a latent vector at each layer
#274 opened by JLenzy - 3
encoder.to_logits.weight doesn't update
#273 opened by guillaumeguy - 5
Potential bug in `model.generate`
#271 opened by guillaumeguy - 2
[Question] Embedding Inputs to Transformer [batch, seq_len, embedding_dim]
#270 opened by francotheengineer - 1
PyTorch warning with autocast
#265 opened by pfeatherstone - 1
[Feature Request] CoPE from Meta
#268 opened by JLenzy - 0
How to use "src_key_padding_mask"
#253 opened by LutherLin - 1
Feature request: Multi-Head Latent Attention Support
#259 opened by nanowell - 1
Question: problem with xval implementation
#248 opened by HarshaSatyavardhan - 5
RotaryEmbedding XPOS doesn't work with mems
#233 opened by pfeatherstone - 1
Enabling flash attention does not support BFloat16?
#254 opened by Kaimary - 1
Is this the same "X-transformer" that is used in the paper "X-Transformer: A Machine Translation Model Enhanced by the Self-Attention Mechanism"?
#257 opened by argadewanata - 1
Random lack of gradients
#256 opened by Baran-phys - 0
Problem with cache and memory
#255 opened by Baran-phys - 1
[Question] very small attention scores
#245 opened by pfeatherstone - 0
RoPE inconsistency (2-dim subspaces choice)
#250 opened by gordicaleksa - 5
Correct interaction between CLS token and RoPE
#249 opened by oasidorshin - 17
[Bug] XL-recurrence with AlibiPositionalBias and mems not working correctly
#242 opened by pfeatherstone - 1
Confusion about image->caption example
#239 opened by mtran14 - 3
Was it a clerical error? ScaleNorm.g is initialized from dim ** -0.5; I think it should be dim ** 0.5
#246 opened by junphine - 3
Generation for PaLI?
#238 opened by BurgerAndreas - 6
How to build optimizer
#230 opened by pfeatherstone