Performer Pytorch Slower than Expected and Please Help with Understanding Parameter Count

Question

michaelweihaosong opened this issue 2 years ago · 1 comments

Hi,

First of all, this is a great package from lucidrains and I find it very helpful in my research.

A quick question: I noticed that ViT-performer is slower than the regular ViT from lucidrains. For example, training on MNIST from PyTorch takes 15 sec/epoch for the regular ViT with the configuration below, while ViT-performer takes 23 sec/epoch.

Checking the parameter count also shows that ViT-performer is double the size of the regular ViT.

I am hoping that someone has intuition about the speed of ViT performer vs regular ViT and their parameter counts.

Thank you very much in advance!

Answer 1 · 2022-12-13T00:20:07.000Z

Just found out why the model is twice as big.

The Performer's feed-forward layer has a multiplier of 4 on the dimension. After adding ff_mult=1, the model is the same size as the regular ViT.
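To see why a feed-forward multiplier of 4 versus 1 can double the total size, here is a rough per-layer weight count. It is only a sketch under assumptions that are not from the thread: dim = 128 is a made-up value, the attention block is assumed to have four dim x dim projections (query, key, value, output), and biases, layer norms and embeddings are ignored. The function names are hypothetical and not part of either library.

```python
def attention_weights(dim: int) -> int:
    # Assumption: query, key, value and output projections are each dim x dim.
    return 4 * dim * dim


def feed_forward_weights(dim: int, ff_mult: int) -> int:
    # Two linear layers: dim -> dim * ff_mult -> dim (biases ignored).
    hidden = dim * ff_mult
    return dim * hidden + hidden * dim


def layer_weights(dim: int, ff_mult: int) -> int:
    # One transformer layer = attention block + feed-forward block.
    return attention_weights(dim) + feed_forward_weights(dim, ff_mult)


dim = 128  # hypothetical model width, not taken from the issue
small = layer_weights(dim, ff_mult=1)
large = layer_weights(dim, ff_mult=4)
print(f"ff_mult=1: {small} weights per layer")
print(f"ff_mult=4: {large} weights per layer")
print(f"ratio: {large / small}")
```

Under these assumptions a layer has 6 * dim^2 weights with ff_mult=1 and 12 * dim^2 with ff_mult=4, which matches the factor of two observed here.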

However, the Performer is still slower than the regular ViT when training on the torchvision.datasets.MNIST training set on an RTX 3090:

Regular ViT:
Average seconds for training 1 epoch: 15.101385951042175
Average seconds for testing: 0.6326647281646729

Performer ViT:
Average seconds for training 1 epoch: 28.795904541015624
Average seconds for testing: 0.9286866903305053
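One possible intuition for the timings above, offered as a rough estimate and not a diagnosis of this exact setup: linear attention only pays off when the sequence is long compared with the number of random features. The sketch below counts approximate multiply-add FLOPs per head for both schemes. The values n = 50 tokens, dim_head = 64 and m = 266 random features are illustrative assumptions (substitute your own patch count and nb_features), the function names are hypothetical, and the count ignores softmax, exponentials and normalization. Wall-clock time on a GPU also depends on kernel launch overhead and memory traffic, which this does not model.

```python
def softmax_attention_flops(n: int, dim_head: int) -> int:
    # Q @ K^T is (n x d)(d x n); attn @ V is (n x n)(n x d).
    # Each matmul costs about 2 * rows * inner * cols FLOPs.
    return 2 * n * dim_head * n + 2 * n * n * dim_head


def performer_attention_flops(n: int, dim_head: int, m: int) -> int:
    # Random-feature maps for Q and K: two (n x d)(d x m) matmuls.
    feature_maps = 2 * (2 * n * dim_head * m)
    # K'^T @ V is (m x n)(n x d); Q' @ (K'^T V) is (n x m)(m x d).
    linear_attention = 2 * m * n * dim_head + 2 * n * m * dim_head
    return feature_maps + linear_attention


dim_head = 64  # assumed head dimension
m = 266        # assumed number of random features
for n in (50, 1024):  # short (MNIST-like patch count) vs long sequence
    s = softmax_attention_flops(n, dim_head)
    p = performer_attention_flops(n, dim_head, m)
    cheaper = "performer" if p < s else "softmax"
    print(f"n={n}: softmax={s}, performer={p}, cheaper={cheaper}")
```

With this count, softmax attention costs about 4 * n^2 * d and the Performer about 8 * n * m * d, so the Performer only does less work once n exceeds roughly 2 * m. For a small image split into a few dozen patches, n is far below that point.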