lucidrains/MEGABYTE-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch

Python · MIT

Issues

Allow usage as single-stage transformer
#20 opened a month ago by eegli
1
Regression in padding value and loss calculation
#19 opened a month ago by eegli
3
Evaluation metric bits-per-byte
#14 opened a year ago by jxiw
1
the patch embedder implementations are different from the original paper
#11 opened 5 months ago by mikegreen7892003
4
Why does your Attention impl use kv dimension dim_head instead of inner_dim?
#13 opened 6 months ago by Earthson
1
Some questions about MEGABYTE
#4 opened 8 months ago by relic-yuexi
4
Why does it expect tokens?
#16 opened 9 months ago by tonydavis629
1
GPU used for original paper experiments
#15 opened 10 months ago by itsnamgyu
1
Training Results and Scaling
#12 opened a year ago by MiscellaneousStuff
1
No available kernel error
#6 opened a year ago by missflash
1
Minor shape error
#10 opened a year ago by anruigu
1
some implementations are different from the original paper
#7 opened a year ago by ZihaoH
2
translation of model sizes from paper to model definition
#5 opened a year ago by winglian
0
the string is still divided into pieces
#3 opened a year ago by wac81
1
What are the implications of this model?
#2 opened a year ago by kyegomez
4