lucidrains/x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
Python · MIT License
Issues
Correct interaction between CLS token and RoPE
#249 opened by oasidorshin - 5
Question: problem with xval implementation
#248 opened by HarshaSatyavardhan - 17
[Bug] XL-recurrence with AlibiPositionalBias and mems not working correctly
#242 opened by pfeatherstone - 1
Confusion about image->caption example
#239 opened by mtran14 - 3
Was it a clerical error? ScaleNorm.g init from dim ** -0.5; I think it should be dim ** 0.5 (see the ScaleNorm sketch after this list)
#246 opened by junphine - 7
[Question] very small attention scores
#245 opened by pfeatherstone - 3
Generation for PaLI?
#238 opened by BurgerAndreas - 6
How to set inputs to the right shape
#218 opened by emadkavousi - 5
RotaryEmbedding XPOS doesn't work with mems
#233 opened by pfeatherstone - 11
How to build optimizer
#230 opened by pfeatherstone - 9
Adding memmask to ContinuousTransformerWrapper
#227 opened by pfeatherstone - 34
Removed biases breaks pre-trained models
#225 opened by zqevans - 3
Seq len missing in rotary embedding
#226 opened by raganato - 1
"Stabilizing Transformer Training by Preventing Attention Entropy Collapse" improvement to ViT
#217 opened by catid - 1
Enhancement: Multi Input/Output transformers
#222 opened by RyanKim17920 - 3
kv cache breaks generation
#219 opened by ad8e - 3
Support for NormSoftmax
#215 opened by catid - 2
Bert token type embedding
#213 opened by eyalmazuz - 14
ONNX export failed
#212 opened by pfeatherstone - 2
Masking for prepend_embeds
#211 opened by zqevans - 1
Question: masking in token shifting
#208 opened by pfeatherstone - 1
Do you consider adding RWKV?
#207 opened by chaodreaming - 5
Question: return_mems and rotary_pos_emb
#205 opened by pfeatherstone - 1
pre_norm_has_final_norm kwarg not used
#202 opened by sashakunitsyn - 1
Transformer Goat
#201 opened by darkman111a - 1
Question: attn_head_scale with use_scalenorm
#200 opened by pfeatherstone - 0
Lack of deep_norm variants of the transformer
#198 opened by ZegangC - 7
ContinuousTransformer num_memory_tokens bug
#194 opened by pfeatherstone - 7
[Feature request] support num_memory_tokens in ContinuousTransformerWrapper
#192 opened by pfeatherstone - 4
Issue with torch.compile
#189 opened by scopello - 2
Issue with qk scale between flash attention and normal attention when qk_norm is True
#190 opened by mistycube - 1
Feature request: add local and reformer
#188 opened by samvanstroud - 8
Bugs in generation with cache and seq_start_pos
#187 opened by LouChao98
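For context on the ScaleNorm question in #246 above: in the paper that introduced ScaleNorm ("Transformers without Tears", Nguyen & Salazar 2019), the output is g · x / ‖x‖₂ with the learnable scale g initialized to sqrt(dim), which is where the dim ** 0.5 vs dim ** -0.5 confusion usually comes from. The sketch below shows the paper's formulation; it is a minimal illustration, not the repository's implementation, and the module name and defaults are placeholders.

```python
# Minimal ScaleNorm sketch (paper formulation): out = g * x / ||x||_2,
# with the learnable scale g initialized to sqrt(dim).
import torch
from torch import nn

class ScaleNorm(nn.Module):
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        # paper initialization: g = dim ** 0.5
        self.g = nn.Parameter(torch.tensor(dim ** 0.5))

    def forward(self, x):
        # L2 norm over the feature dimension, clamped for numerical stability
        norm = x.norm(dim=-1, keepdim=True).clamp(min=self.eps)
        return self.g * x / norm
```

An algebraically equivalent factorization keeps a unit-initialized g and folds a fixed dim ** -0.5 into the denominator: x / (‖x‖ · dim ** -0.5) · g = g · sqrt(dim) · x / ‖x‖. Under that parameterization, seeing dim ** -0.5 in the code is not necessarily an error.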