lucidrains/x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
Python · MIT license
Issues
Support for flash attention 2
#186 opened · 21 comments
Sampling questions/(issues?)
#184 opened · 2 comments
Masks in cross attention
#183 opened · 1 comment
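For background on the cross-attention mask questions (#183, and the dimension mismatch in #174), here is a minimal sketch of how cross attention with a context mask can be wired up, following the `Encoder`/`CrossAttender` pattern from the project README; the tensor shapes and module sizes are illustrative only, not a definitive recipe:

```python
import torch
from x_transformers import Encoder, CrossAttender

enc = Encoder(dim = 512, depth = 6)            # encodes the context sequence
model = CrossAttender(dim = 512, depth = 6)    # attends from queries to the encoded context

nodes = torch.randn(1, 1, 512)                 # query embeddings
node_mask = torch.ones(1, 1).bool()            # True = keep this query position

neighbors = torch.randn(1, 5, 512)             # context embeddings
neighbor_mask = torch.ones(1, 5).bool()        # True = context position may be attended to

encoded = enc(neighbors, mask = neighbor_mask)

# context_mask hides padded context positions during cross attention
out = model(nodes, context = encoded, mask = node_mask, context_mask = neighbor_mask)  # (1, 1, 512)
```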
Questions on rotary position embedding
#182 opened · 2 comments
Idea: hyperparameter searching
#181 opened · 0 comments
Torchsummary not working
#180 opened · 3 comments
How to use this package correctly?
#178 opened · 9 comments
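Several of the open issues (e.g. #178) are basic usage questions. A minimal decoder-only usage sketch, based on my reading of the project README; the hyperparameters are illustrative, not recommendations:

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# GPT-style decoder-only language model
model = TransformerWrapper(
    num_tokens = 20000,        # vocabulary size
    max_seq_len = 1024,        # maximum sequence length
    attn_layers = Decoder(
        dim = 512,
        depth = 6,
        heads = 8
    )
)

x = torch.randint(0, 20000, (1, 1024))   # batch of token ids
logits = model(x)                         # (1, 1024, 20000)
```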
Return hidden states of all layers
#177 opened · 15 comments
Would you consider adding RetNet?
#176 opened · 1 comment
Unused Dropout Parameter
#175 opened · 4 comments
Dimension mismatch with cross attention
#174 opened · 2 comments
Question: normalizing mask shape
#172 opened · 41 comments
NTK-aware Scaled RoPE
#171 opened · 9 comments
ContinuousTransformerWrapper returning list of tensors as opposed to stack of tensors in 1.16.20
#167 opened · 16 comments
Attention mask: does True mean attend?
#164 opened · 6 comments
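On the mask-convention questions (#164, and the flash-attention mask report in #160): my understanding is that boolean masks in this library mark positions to keep, i.e. True means the position is attended to and False marks padding. A small sketch under that assumption:

```python
import torch
from x_transformers import TransformerWrapper, Encoder

model = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 128,
    attn_layers = Encoder(dim = 512, depth = 4)
)

x = torch.randint(0, 20000, (2, 128))

# assumption: True = attend to this position, False = padding to ignore
mask = torch.ones(2, 128).bool()
mask[:, 100:] = False   # treat the trailing positions as padding

out = model(x, mask = mask)  # (2, 128, 20000)
```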
RuntimeError: output with shape [1, 1, 16, 16] doesn't match the broadcast shape [1, 8, 16, 16]
#163 opened · 0 comments
GPT training problem
#162 opened · 7 comments
Incorrect boolean mask in flash attention
#160 opened · 21 comments
LoRA fine-tuning?
#158 opened · 1 comment
Small syntax error in LayerIntermediates
#156 opened · 5 comments
AlibiPositionalBias: slicing buffered bias
#152 opened · 2 comments
Question about the over-smoothing problem
#151 opened · 2 comments
Dimension mismatch in attention (v1.16.1+)
#150 opened · 3 comments
Question about ViTransformerWrapper
#146 opened · 6 comments
Flash is not flash
#144 opened · 12 comments
Enhanced recurrence question
#142 opened · 0 comments
Feature Request: Hyena Attention
#140 opened · 0 comments
Feature Request: Hyena Attention
#139 opened · 0 comments
Feature Request: Hyena Attention
#138 opened · 6 comments
Feature Request: Hyena Attention
#137 opened · 6 comments
Feature request: generate top-k sequences
#133 opened · 0 comments
Implementing a small ViT-VQGAN
#132 opened · 1 comment
Typo?
#131 opened · 1 comment
BERT Training and Word-Level Tokenization
#129 opened