vgg16 benchmark?

Question

Closed this issue 8 years ago · 2 comments

I was comparing the speed I get on a few networks with TensorFlow and PyTorch (on a Maxwell Titan GPU).

I only timed the forward pass, and I get similar results (TF usually being slightly slower) for the resnet-* architectures. But if I try VGG16, I get something way worse.

Model	your benchmark	pytorch (me)	tf (me)	Reported MatConvNet from here (interpolated, probably Pascal)
resnet-50	55.75	48.3	57.6	~40
vgg16	62.30	113	169	~80

I am a bit surprised that all the others show a sharp increase from resnet-50 to vgg16 (more than twice as slow on average), but your benchmark does not.
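To make that comparison concrete, here is a small sketch that computes the vgg16 / resnet-50 ratio for each column. The numbers are copied from the table above; the dictionary keys and variable names are only illustrative, and the MatConvNet entries are the approximate "~" values. The unit does not matter here since only ratios are used.

```python
# Values copied from the table above (unit not stated in the post).
table = {
    "your benchmark": {"resnet-50": 55.75, "vgg16": 62.30},
    "pytorch (me)": {"resnet-50": 48.3, "vgg16": 113.0},
    "tf (me)": {"resnet-50": 57.6, "vgg16": 169.0},
    # Approximate values ("~40", "~80") reported for MatConvNet.
    "MatConvNet (reported)": {"resnet-50": 40.0, "vgg16": 80.0},
}

# Slowdown factor going from resnet-50 to vgg16, per column.
ratios = {name: row["vgg16"] / row["resnet-50"] for name, row in table.items()}

for name, ratio in ratios.items():
    print(f"{name:24s} vgg16 / resnet-50 = {ratio:.2f}x")

# Average slowdown over every column except "your benchmark".
others = [r for name, r in ratios.items() if name != "your benchmark"]
mean_others = sum(others) / len(others)
print(f"mean over the other three columns: {mean_others:.2f}x")
```

With these numbers the other three columns slow down by roughly 2.4x on average, while "your benchmark" slows down by only about 1.1x.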

Answer 1 · 2017-01-25T17:38:35.000Z

My guess is that you don't have the cuDNN autotuner enabled in the other frameworks. I'm not too familiar with TensorFlow or MatConvNet, but here is a quick PyTorch benchmarking script for VGG16:

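import time
import torch
import torchvision

# Let cuDNN try several convolution algorithms and keep the fastest one
torch.backends.cudnn.benchmark = True

dtype = torch.cuda.FloatTensor
N, C, H, W = 16, 3, 224, 224  # batch size, channels, height, width

model = torchvision.models.vgg16()
print(model)
model.type(dtype)  # move the model to the GPU as float32

times = []
for t in range(10):
  x = torch.randn(N, C, H, W).type(dtype)
  # CUDA calls are asynchronous, so synchronize before reading the clock
  torch.cuda.synchronize()
  t0 = time.time()
  y = model(torch.autograd.Variable(x))  # forward pass only
  torch.cuda.synchronize()
  t1 = time.time()
  times.append(t1 - t0)

print(times)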

When I run this on my Maxwell Titan X I get:

1.2714393138885498
0.0651240348815918
0.06461572647094727
0.0646982192993164
0.0645449161529541
0.06469154357910156
0.06457901000976562
0.06456184387207031
0.06459999084472656
0.06464290618896484

This matches my Lua Torch benchmark. However, if I disable the cuDNN autotuner by deleting the line

torch.backends.cudnn.benchmark = True

Then I get times that match your results:

0.4282658100128174
0.11162209510803223
0.11102771759033203
0.11126351356506348
0.11090779304504395
0.11112117767333984
0.11123847961425781
0.11114788055419922
0.11153745651245117
0.11166501045227051
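To put a single number on the difference, here is a rough sketch that averages the two runs above. It assumes the first iteration of each run should be dropped as warm-up, since it includes one-time setup cost (and, with the autotuner on, the algorithm search). The variable and function names are just for illustration.

```python
# Per-iteration forward times in seconds, copied from the two runs above.
with_autotuner = [
    1.2714393138885498, 0.0651240348815918, 0.06461572647094727,
    0.0646982192993164, 0.0645449161529541, 0.06469154357910156,
    0.06457901000976562, 0.06456184387207031, 0.06459999084472656,
    0.06464290618896484,
]
without_autotuner = [
    0.4282658100128174, 0.11162209510803223, 0.11102771759033203,
    0.11126351356506348, 0.11090779304504395, 0.11112117767333984,
    0.11123847961425781, 0.11114788055419922, 0.11153745651245117,
    0.11166501045227051,
]

def steady_state_ms(times, warmup=1):
    """Mean time in milliseconds, ignoring the first `warmup` iterations."""
    kept = times[warmup:]
    return 1000.0 * sum(kept) / len(kept)

tuned_ms = steady_state_ms(with_autotuner)
untuned_ms = steady_state_ms(without_autotuner)
speedup = untuned_ms / tuned_ms

print(f"autotuner on:  {tuned_ms:.1f} ms per batch of 16")
print(f"autotuner off: {untuned_ms:.1f} ms per batch of 16")
print(f"speedup from the autotuner: {speedup:.2f}x")
```

That works out to about 64.7 ms with the autotuner versus about 111.3 ms without it, a speedup of roughly 1.7x for this forward pass.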

Answer 2 · 2017-01-25T17:59:47.000Z

Aaaahhh! Thanks a lot, that was indeed the culprit. I get exactly your results with my script as well.

Though that does not explain the TensorFlow slowness, since auto-tuning is active there by default... But that is unrelated to this repo. Are the Torch kernels so much better optimized for 3x3 convolutions? I would have expected very similar performance, since cuDNN is doing most of the work.

Anyway, thanks for the answer :-)