lucidrains/performer-pytorch

Performer Pytorch Slower than Expected and Please Help with Understanding Parameter Count

michaelweihaosong opened this issue · 1 comments

Hi,

First of all, this is a great package from lucidrains and I find it very helpful in my research.

A quick question: I noticed that ViT-performer is slower than the regular ViT from lucidrains. For example, training on MNIST from PyTorch takes 15 sec/epoch for the regular ViT with the configuration below, while ViT-performer takes 23 sec/epoch.

Checking the parameter count also shows that ViT-performer has twice as many parameters as the regular ViT.

[Screenshot: Screen Shot 2022-12-12 at 11 32 41 PM]

[Screenshot: Screen Shot 2022-12-12 at 11 28 50 PM]
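
For anyone wanting to reproduce the parameter comparison: a count like this is usually obtained by summing `numel()` over a model's parameters. Below is a minimal sketch, assuming `model` is any `torch.nn.Module`. The helper name `count_parameters` is hypothetical and is not part of performer-pytorch or vit-pytorch.

```python
def count_parameters(model, trainable_only=True):
    """Sum the element counts of a model's parameters.

    `model` is assumed to be a torch.nn.Module (or anything exposing a
    `parameters()` iterator whose items have `numel()` and `requires_grad`).
    This helper is illustrative and not part of either library.
    """
    total = 0
    for p in model.parameters():
        # Skip frozen parameters when only trainable ones are requested.
        if trainable_only and not p.requires_grad:
            continue
        total += p.numel()
    return total
```

Calling it on both models with the same settings should make a factor-of-two difference easy to spot.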

I am hoping that someone has intuition about the speed of ViT performer vs regular ViT and their parameter counts.

Thank you very much in advance!

Just found out why the model is twice as big.

The feed-forward layer multiplies the dimension by 4. After setting ff_mult=1, the two models are the same size.

[Screenshot: Screen Shot 2022-12-13 at 12 15 29 AM]
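
To see why the multiplier matters so much, here is a rough sketch of the arithmetic. It assumes the feed-forward block is a plain two-layer MLP with biases (`dim -> dim * mult -> dim`) and no gating. The function name `ff_param_count` and the value `dim = 128` are hypothetical, chosen only for illustration, and are not taken from the configuration in the screenshots.

```python
def ff_param_count(dim, mult):
    """Parameters in an assumed two-layer MLP: dim -> dim*mult -> dim, with biases."""
    hidden = dim * mult
    first = dim * hidden + hidden   # weights + bias of the expanding layer
    second = hidden * dim + dim     # weights + bias of the projecting layer
    return first + second

dim = 128  # hypothetical model width, for illustration only
for mult in (4, 1):
    print(f"ff_mult={mult}: {ff_param_count(dim, mult)} parameters per feed-forward block")
```

Under these assumptions the feed-forward block is roughly four times larger with a multiplier of 4 than with a multiplier of 1, which can dominate the total parameter count of a small model.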

However, the Performer is still slower than the regular ViT when training on the torchvision.datasets.MNIST training set on an RTX 3090:

Regular ViT:
Average seconds for training 1 epoch: 15.101385951042175
Average seconds for testing: 0.6326647281646729

Performer ViT:
Average seconds for training 1 epoch: 28.795904541015624
Average seconds for testing: 0.9286866903305053
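
For reference, averages like these can be collected with a small timing helper. This is a minimal sketch and not the script used for the numbers above. The names `average_seconds`, `run_once`, and `sync` are hypothetical. On a GPU, CUDA kernels are launched asynchronously, so passing `sync=torch.cuda.synchronize` is assumed here to make the wall-clock numbers reflect the work actually done.

```python
import time

def average_seconds(run_once, repeats=5, sync=None):
    """Return the mean wall-clock seconds of `run_once()` over `repeats` calls.

    `sync` is an optional callable invoked before starting and before stopping
    the clock, e.g. torch.cuda.synchronize when timing GPU work.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    total = 0.0
    for _ in range(repeats):
        if sync is not None:
            sync()  # make sure earlier queued work is finished
        start = time.perf_counter()
        run_once()
        if sync is not None:
            sync()  # wait for the work launched by run_once()
        total += time.perf_counter() - start
    return total / repeats
```

Here `run_once` would be a function that trains (or evaluates) for one epoch. Using the same helper for both models keeps the comparison consistent.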