Issues
CUDA version
#117 opened by jiaji-huang - 12
ERROR: Could not build wheels for pytorch-fast-transformers, which is required to install pyproject.toml-based projects
#128 opened by ouusan - 0
TypeError: canonicalize_version() got an unexpected keyword argument 'strip_trailing_zero'
#132 opened by luispintoc - 1
Full Attention does not sum to 1
#131 opened by yourj4m - 2
Can't officially save Linear Attention model
#114 opened by maulberto3 - 0
Installation error
#92 opened by davidliujiafeng - 1
ImportError
#127 opened by PaulaTeeuwen - 1
causal-linear does not use attn_mask?
#105 opened by davidliujiafeng - 4
Provenance of algorithms
#126 opened by taibai123abc - 1
Got different results for the same batch
#123 opened by gaoshan2006 - 4
Installation error on Linux
#112 opened by xxmlala - 27
Windows installation - building wheel error
#121 opened by BenoitDalFerro - 3
Windows installation - Building wheel
#106 opened by MaximeHoude - 3
pip install and C++ compilation error, then name 'compute_hashes_cuda' is not defined
#89 opened by nikjetchev - 0
Understanding how to define key, query and value for the cross attention calculation
#119 opened by neuronphysics - 0
Example for NLP
#118 opened by Bachstelze - 2
Training Language Model
#107 opened by lucasnfe - 2
Speed of recurrent model
#116 opened by mads-oestergaard - 0
Can you offer pre-built code for Linux?
#115 opened by li-car-fei - 3
Runtime error on causal_product_cpu on GCC/G++ 11
#110 opened by lsisoft - 1
Any decoder example?
#113 opened by ahmedraza1996 - 2
Installation failed on Windows
#97 opened by WRKULOL - 1
Parallel complexity of Linear Attention is O(N)?
#108 opened by haozheji - 1
Causal attention is cheating by looking into the future
#111 opened by jogardi - 0
How is the causal mask constructed when training a batched model with linear causal attention?
#109 opened by Howuhh - 9
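Several of the issues above concern how causality is enforced during batched training. As a rough illustration only (a toy NumPy sketch, not the library's actual implementation), the standard causal mask is just a lower-triangular boolean matrix so that position i can attend only to positions j <= i:

```python
import numpy as np

def causal_mask(seq_len):
    # Lower-triangular boolean mask: position i may attend to j <= i only.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

mask = causal_mask(4)
# Row i has exactly i + 1 True entries, so no position sees the future.
assert mask[0].sum() == 1 and mask[3].sum() == 4
assert not mask[0, 1]  # position 0 cannot attend to position 1
```

In a batched setting the same (seq_len, seq_len) mask is typically broadcast across the batch and head dimensions.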
Huggingface Bert vs. Fast Transformer full attention
#100 opened by lipmem - 5
Quick start raises a ModuleNotFoundError
#99 opened by CaoYiqingT - 1
CUDA error: CUBLAS_STATUS_INVALID_VALUE
#104 opened by huu4ontocord - 2
Mask and QK not of the same shape?
#101 opened by Baldwin-disso - 0
Can't import causal_product_cuda
#96 opened by 15805383399 - 1
Support for clustered attention
#95 opened by TianhaoFu - 1
TypeError: forward() missing 3 required positional arguments: 'attn_mask', 'query_lengths', and 'key_lengths'
#94 opened by TianhaoFu - 2
Make fast-transformers JIT Compilable
#88 opened by AndriyMulyar - 3
CUDA version and CausalDotProduct time
#83 opened by caffeinetoomuch - 7
Tips and tricks for training linear_att
#84 opened by gaceladri - 3
Where is the sum operation of KV?
#82 opened by Yogurt928 - 3
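The last question (and the earlier ones on O(N) complexity and causal-linear masking) can be illustrated with a toy NumPy sketch. Assumptions: a single head, and the feature map phi(x) = elu(x) + 1 from the linear-attention paper; this is not the library's CUDA kernel. The "sum of KV" is the running outer-product state S_i = sum over j <= i of phi(k_j) v_j^T, and accumulating it sequentially is what makes causal linear attention O(N) instead of O(N^2):

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, a positive feature map used for linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V):
    # Q, K, V: (seq_len, dim). Runs in O(N) by accumulating the KV
    # outer-product sum instead of materializing an N x N attention matrix.
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)
    S = np.zeros((Kf.shape[1], V.shape[1]))  # running sum of phi(k_j) v_j^T
    z = np.zeros(Kf.shape[1])                # running sum of phi(k_j), for normalization
    out = np.empty_like(V)
    for i in range(Q.shape[0]):
        S += np.outer(Kf[i], V[i])           # the "sum operation of KV"
        z += Kf[i]
        out[i] = Qf[i] @ S / (Qf[i] @ z)
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 4)) for _ in range(3))
out = causal_linear_attention(Q, K, V)

# Sanity check: the recurrence matches the quadratic masked formulation.
A = np.tril(elu_feature_map(Q) @ elu_feature_map(K).T)
assert np.allclose(out, (A @ V) / A.sum(axis=1, keepdims=True))
```

Because only S and z are carried between steps, the same recurrence is what enables the recurrent (token-by-token) inference mode discussed in the issues above.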