Performer Pytorch Slower than Expected and Please Help with Understanding Parameter Count
michaelweihaosong opened this issue · 1 comments
Hi,
First of all, this is a great package from lucidrains and I find it very helpful in my research.
A quick question: I noticed that ViT-performer is slower than the regular ViT from lucidrains. For example, training on MNIST from PyTorch takes 15 sec/epoch for the regular ViT with the configuration below, while ViT-performer takes 23 sec/epoch.
Checking the parameter count also shows that ViT-performer has twice as many parameters as the regular ViT.
I am hoping someone has some intuition about the speed of ViT-performer vs. the regular ViT, and about their parameter counts.
Thank you very much in advance!
Just found out why the model is twice as big.
The feed-forward layer has a dimension multiplier of 4. After setting ff_mult=1, the two models are the same size.
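For anyone else checking this, here is a rough back-of-the-envelope sketch of why the multiplier matters so much. It counts the weights in one transformer layer by hand rather than inspecting the actual models. The helper names, `dim = 128`, and the assumption that attention uses four bias-free `dim x dim` projections are all mine for illustration, not taken from either library.

```python
def linear_params(d_in, d_out, bias=True):
    """Parameters in one dense layer: weight matrix plus optional bias."""
    return d_in * d_out + (d_out if bias else 0)


def feedforward_params(dim, ff_mult):
    """Two dense layers: dim -> dim * ff_mult -> dim."""
    hidden = dim * ff_mult
    return linear_params(dim, hidden) + linear_params(hidden, dim)


def attention_params(dim):
    # Assumption: q, k, v and output projections, each dim x dim, no bias.
    return 4 * linear_params(dim, dim, bias=False)


def layer_params(dim, ff_mult):
    """Attention plus feed-forward; layer norms are ignored as negligible."""
    return attention_params(dim) + feedforward_params(dim, ff_mult)


dim = 128  # hypothetical embedding size, not from the issue
wide = layer_params(dim, ff_mult=4)
narrow = layer_params(dim, ff_mult=1)
print(f"ff_mult=4: {wide} params per layer")
print(f"ff_mult=1: {narrow} params per layer")
print(f"ratio: {wide / narrow:.2f}")
```

Under these assumptions the feed-forward block goes from about 2·dim² to about 8·dim² weights while attention stays at 4·dim², so the per-layer total roughly doubles, which matches what I saw.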
However, the Performer is still slower than the regular ViT when training on the torchvision.datasets.MNIST training set on an RTX 3090:
Regular ViT:
Average seconds for training 1 epoch: 15.101385951042175
Average seconds for testing: 0.6326647281646729
Performer ViT:
Average seconds for training 1 epoch: 28.795904541015624
Average seconds for testing: 0.9286866903305053
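One possible explanation, sketched below as a rough estimate rather than a measurement: linear attention only pays off when the sequence is long relative to the number of random features. The snippet counts multiply-accumulates per head for both variants. The token count (28x28 image, 7x7 patches, plus a class token), `dim_head = 64`, and the number of random features are assumptions for illustration, since the exact configuration is not shown here.

```python
import math


def softmax_attention_cost(n, d):
    """Multiply-accumulates per head: Q K^T (n*n*d) plus weights @ V (n*n*d)."""
    return 2 * n * n * d


def linear_attention_cost(n, d, m):
    """Multiply-accumulates per head for random-feature (FAVOR+-style) attention.

    Projecting Q and K onto m features costs 2*n*d*m, then K'^T V and
    Q' (K'^T V) cost n*m*d each. Normalisation terms are ignored.
    """
    return 4 * n * m * d


dim_head = 64                               # assumed head dimension
m = int(dim_head * math.log(dim_head))      # assumed number of random features
n_mnist = (28 // 7) ** 2 + 1                # assumed: 16 patches + 1 class token

for n in (n_mnist, 1024):
    soft = softmax_attention_cost(n, dim_head)
    lin = linear_attention_cost(n, dim_head, m)
    print(f"n={n}: softmax={soft}, linear={lin}, linear/softmax={lin / soft:.2f}")
```

With these assumed numbers, the linear variant does far more arithmetic than softmax attention at MNIST-sized sequence lengths and only comes out ahead once the sequence length exceeds about twice the number of random features. This ignores kernel launch overhead and memory traffic, so it is only an intuition for the timings above, not a full account of them.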