lucidrains/MEGABYTE-pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
Python · MIT
Issues
- 1
Allow usage as single-stage transformer
#20 opened by eegli - 3
Regression in padding value and loss calculation
#19 opened by eegli - 1
Evaluation metric bits-per-byte
#14 opened by jxiw - 4
the patch embedder implementations are different from the original paper
#11 opened by mikegreen7892003 - 1
- 4
Some question about the MEGABYTE
#4 opened by relic-yuexi - 1
Why does it expect tokens?
#16 opened by tonydavis629 - 1
GPU used for original paper experiments
#15 opened by itsnamgyu - 1
Training Results and Scaling
#12 opened by MiscellaneousStuff - 1
No available kernel error
#6 opened by missflash - 1
Minor shape error
#10 opened by anruigu
- 1
the string is still divided into pieces
#3 opened by wac81 - 4
What are the implications of this model?
#2 opened by kyegomez