lucidrains/x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers

PythonMIT

Issues

Support for flash attention 2
#186 opened 8 months ago
8
autoregressive wrapper generate should accommodate variable lengthed prefixes
#185 opened 8 months ago
21
Sampling questions/(issues?)
#184 opened 8 months ago
4
Masks in cross attention
#183 opened 9 months ago
2
Questions on rotary position embedding
#182 opened 9 months ago
1
Idea: hyperparameter searching
#181 opened 9 months ago
2
Torchsummary not working
#180 opened 9 months ago
0
TransformerWrapper wants inputs x of type long
#179 opened 9 months ago
3
How to use this package correctly?
#178 opened 9 months ago
0
return hidden states of all layers
#177 opened 9 months ago
9
Do you consider adding retnet
#176 opened 9 months ago
15
Unused Dropout Parameter
#175 opened 10 months ago
1
Dimension mismatch with cross attention
#174 opened 10 months ago
4
Feature request: don't specify attn_flash. Select when possible
#173 opened 10 months ago
2
Question: normalizing mask shape
#172 opened 10 months ago
4
NTK-aware Scaled RoPE
#171 opened 9 months ago
41
Is there a plan to handle the inference slowness？ eg. KV Cache
#170 opened 9 months ago
9
Back-propagation on Mask for attention layers
#168 opened 10 months ago
1
ContinuousTransformerWrapper returning list of tensors as opposed to stack of tensors in 1.16.20
#167 opened 10 months ago
1
Feature request: support return_mems in ContinuousTransformerWrapper
#166 opened a year ago
16
[Bug] attn_sparse_topk : NameError: name 'dots' is not defined
#165 opened a year ago
2
Attention mask, is True True?
#164 opened a year ago
7
RuntimeError: output with shape [1, 1, 16, 16] doesn't match the broadcast shape [1, 8, 16, 16]
#163 opened a year ago
6
GPT training problem
#162 opened a year ago
0
Incorrect boolean mask in flash attention
#160 opened a year ago
7
How to use the rope scaling with x_transformers ?
#159 opened a year ago
21
LoRA fine-tuning?
#158 opened a year ago
2
Any plans to make a jax iteration of this repository? We really need it
#157 opened a year ago
1
Small syntax error in LayerIntermediates
#156 opened a year ago
1
Any plans to implement the new flash sparsed attention?
#155 opened a year ago
5
ALiBi: buffered bias slicing gets confusing when `i != j`
#153 opened a year ago
0
AlibiPositionalBias: slicing buffered bias
#152 opened a year ago
1
Question about Over-Smoothing problem?
#151 opened 5 days ago
2
Dimension mismatch in attention (v1.16.1+)
#150 opened a year ago
2
Question about ViTransformerWrapper
#146 opened a year ago
3
Question: clarification of ResiDual implementation
#145 opened a year ago
6
Flash is not flash
#144 opened a year ago
1
RuntimeError: No available kernel. Aborting execution.
#143 opened a year ago
12
Enhanced recurrence question
#142 opened a year ago
4
Feature Request: Hyena Attention
#140 opened a year ago
0
Feature Request: Hyena Attention
#139 opened a year ago
0
Feature Request: Hyena Attention
#138 opened a year ago
0
Feature Request: Hyena Attention
#137 opened a year ago
6
Feature request: use scaled_dot_product_attention()
#136 opened a year ago
6
Suggestion for OOD extrapolation power of Transformers
#135 opened a year ago
5
Cross attention between different shape tensors
#134 opened a year ago
4
Feature request: generate top k sequences
#133 opened a year ago
2
Implementing a small ViT-VQGAN
#132 opened a year ago
0
typo?
#131 opened a year ago
1
BERT Training and Word-Level Tokenization
#129 opened a year ago
1