Jamie-Stirling/RetNet
An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
Python · MIT
Issues
How to predict use this net?
#30 opened by GodPCWANG - 2
a question about xpos and D of decay mat
#35 opened by DavideHe - 5
Dimensions of forward_recurrent
#36 opened by Qiu30 - 0
NO LM HEAD
#32 opened by shnuhw - 0
The complex theta should cancel out
#28 opened by albertbuchard - 2
/src/retnet.py GPU
#27 opened by Qiu30 - 0
Passing Attention Masks
#24 opened by leffff - 1
Error when the model is running on GPU
#16 opened by SSamDav - 2
Q, k and D device difference
#22 opened by leffff - 0
can retnet be applied in point cloud tasks?
#19 opened by huiyang0613 - 2
Changelog of official implementation
#18 opened by donglixp - 4
what about cross-attention
#17 opened by aki819 - 4
RetNet Officially Released
#11 opened by tiendung - 6
Training is slow and some errors (perhaps)
#4 opened by Zth9730 - 1
Real-valued implementation using xPos
#5 opened by Jamie-Stirling - 4
About the complex
#1 opened by KohakuBlueleaf - 3
Some Questions about Attention Mask
#2 opened by tang-ed