lucidrains/x-transformers
A concise but complete full-attention transformer with a set of promising experimental features from various papers
Python · MIT License
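For orientation, a minimal usage sketch of the library follows, based on its standard `TransformerWrapper` / `Decoder` interface; the hyperparameter values here are illustrative, not prescribed by the repo.

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# autoregressive decoder-only transformer over a toy vocabulary
model = TransformerWrapper(
    num_tokens = 20000,        # vocabulary size (illustrative)
    max_seq_len = 1024,        # maximum sequence length
    attn_layers = Decoder(
        dim = 512,             # model dimension
        depth = 6,             # number of layers
        heads = 8              # attention heads
    )
)

x = torch.randint(0, 20000, (1, 1024))  # dummy token ids
logits = model(x)                        # shape: (1, 1024, 20000)
```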
Issues
Conda Forge Package
#293 opened by iamthebot - 10
Pad value not being used in `align_right` function
#290 opened by shuishida - 18
[Feature request] Alibi for arbitrary positions
#280 opened by Aceticia - 3
PyTorch 2.5 CuDNN backend for SDPA
#289 opened by maxall41 - 12
qk_norm and kv_heads conflict
#282 opened by alexdemartos - 2
[new paper] TokenFormer
#287 opened by guillaumeguy - 2
Tokenformer
#286 opened by iiLaurens - 38
Forgetting Transformers
#281 opened by faresobeid - 3
Paper for GLU Mult Bias?
#275 opened by TimS-ml - 2
Is ContinuousTransformerWrapper needed?
#284 opened by mpeven - 12
Selective Attention
#278 opened by kazunator - 3
Can't pickle <class 'torch.nn.attention._SDPBackend'>: attribute lookup _SDPBackend on torch.nn.attention failed
#272 opened by guillaumeguy - 11
return_only_embed
#277 opened by JLenzy - 6
Small paper ideas to be added
#262 opened by RyanKim17920 - 1
Differential Transformer
#276 opened by lamm-mit - 6
Adding a latent vector at each layer
#274 opened by JLenzy - 3
encoder.to_logits.weight doesn't update
#273 opened by guillaumeguy - 5
Potential bug in `model.generate`
#271 opened by guillaumeguy - 2
[Question] Embedding Inputs to Transformer [batch, seq_len, embedding_dim]
#270 opened by francotheengineer - 1
PyTorch warning with autocast
#265 opened by pfeatherstone - 1
[Feature Request] CoPE from Meta
#268 opened by JLenzy - 0
How to use "src_key_padding_mask"
#253 opened by LutherLin - 1
Feature request: Multi-Head Latent Attention Support
#259 opened by nanowell - 1
Question: problem with xval implementation
#248 opened by HarshaSatyavardhan - 5
RotaryEmbedding XPOS doesn't work with mems
#233 opened by pfeatherstone - 1
Enabling flash attention does not support BFloat16?
#254 opened by Kaimary - 1
Is this the same "X-transformer" that is used in the paper "X-Transformer: A Machine Translation Model Enhanced by the Self-Attention Mechanism"?
#257 opened by argadewanata - 1
Random lack of gradients
#256 opened by Baran-phys - 0
Problem with cache and memory
#255 opened by Baran-phys - 1
[Question] very small attention scores
#245 opened by pfeatherstone - 0
RoPE inconsistency (2-dim subspaces choice)
#250 opened by gordicaleksa - 5
Correct interaction between CLS token and RoPE
#249 opened by oasidorshin - 17
[Bug] XL-recurrence with AlibiPositionalBias and mems not working correctly
#242 opened by pfeatherstone - 1
Confusion about image->caption example
#239 opened by mtran14 - 3
Was it a clerical error? ScaleNorm.g is initialized from dim ** -0.5; I think it should be dim ** 0.5
#246 opened by junphine - 3
Generation for PaLI?
#238 opened by BurgerAndreas - 6
How to build optimizer
#230 opened by pfeatherstone