Jamie-Stirling/RetNet

An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"

Python · MIT

Issues

Is Retnet equivalent to ordinary GPT when the decay is set to 1?
#37 opened a year ago by xuanyaoming
3
How to predict using this net?
#30 opened a year ago by GodPCWANG
0
a question about xpos and D of decay mat
#35 opened a year ago by DavideHe
2
Dimensions of forward_recurrent
#36 opened a year ago by Qiu30
5
Confusion about "the chunkwise recurrent representation of retention"
#34 opened a year ago by CHENHUI-X
0
Can this mechanism be applied to PointCloud data?
#33 opened a year ago by madjid-dx
0
NO LM HEAD
#32 opened a year ago by shnuhw
2
The complex theta should cancel out
#28 opened a year ago by albertbuchard
0
/src/retnet.py GPU
#27 opened a year ago by Qiu30
2
Assistance on training a new retention network model?
#25 opened a year ago by risedangel
0
Passing Attention Masks
#24 opened a year ago by leffff
3
Error when the model is running on GPU
#16 opened a year ago by SSamDav
1
Proposed improvement/collaboration: removing the O(T^2) training cost
#21 opened a year ago by jackd
2
Q, k and D device difference
#22 opened a year ago by leffff
1
can retnet be applied in point cloud tasks?
#19 opened a year ago by huiyang0613
0
Changelog of official implementation
#18 opened a year ago by donglixp
2
Chunkwise retention giving different output
#10 opened a year ago by Jamie-Stirling
4
what about cross-attention
#17 opened a year ago by aki819
0
demo example / number of parameter control vs original code
#14 opened a year ago by thegodone
4
Can you make this repo available for package installers (pip)?
#12 opened a year ago by gaasher
0
RetNet Officially Released
#11 opened a year ago by tiendung
1
Training is slow and some errors (perhaps)
#4 opened a year ago by Zth9730
6
_get_D function very slow for long sequence
#7 opened a year ago by ZuowenWang0000
1
Real-valued implementation using xPos
#5 opened a year ago by Jamie-Stirling
0
About the complex
#1 opened a year ago by KohakuBlueleaf
4
Some Questions about Attention Mask
#2 opened 2 years ago by tang-ed
3