lucidrains/FLASH-pytorch
Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"
Python · MIT license
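As a quick illustration of the API, below is a minimal usage sketch of the gated attention unit (GAU) building block. The `flash_pytorch` module path, the `GAU` class, and argument names such as `query_key_dim` and `expansion_factor` follow the repo's published README, but treat them as assumptions to verify against the installed version:

```python
import torch
from flash_pytorch import GAU  # Gated Attention Unit from the paper

# Argument names below are assumed from the repo's README; verify locally.
gau = GAU(
    dim = 512,              # model / feature dimension
    query_key_dim = 128,    # small shared query/key dimension used by GAU
    causal = True,          # apply autoregressive (causal) masking
    expansion_factor = 2,   # expansion of the gating branch's hidden size
)

x = torch.randn(1, 1024, 512)  # (batch, sequence length, dim)
out = gau(x)                   # same shape as the input: (1, 1024, 512)
```

The paper replaces the usual multi-head attention plus feedforward pair with this single gated unit, which is why one small shared query/key dimension suffices.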
Issues
- I would like to ask if your model can be applied to other text classification tasks? (#14, opened by ZoeLct, 0 comments)
- About the "/n" (#13, opened by kj01239876, 2 comments)
- The speed. (#11, opened by wangyuxin87, 1 comment)
- Speed on TPU (#6, opened by magicknight, 6 comments)
- mask error (#1, opened by keyunluo, 1 comment)
- Is it a typo in FLASH module? (#10, opened by marsggbo, 2 comments)
- About the "shift_tokens" (#5, opened by kangzhao2, 1 comment)
- rel_pos_bias in GAU (#9, opened by SunderlandAJ-1130, 1 comment)
- Cross-Attention? (#4, opened by amorehead, 5 comments)
- einsum operation in Linear Attention Part (#2, opened by ShomyLiu)