jcjohnson/cnn-benchmarks

Titan X Pascal on VGG16 much slower on my machine than in benchmark

mbcel opened this issue · 1 comment

mbcel commented

I have a Titan X Pascal, an Intel i5-6600 and 16 GB of RAM, and I am running torch7 on Ubuntu 14.04 with NVIDIA driver 375.20, CUDA Toolkit 8.0 and cuDNN v5.1.

I ran the same test as you, with the same VGG16 network from Caffe (imported via loadcaffe). However, a forward pass takes about 80 ms on my setup, roughly twice the time reported in your benchmark.

As in your benchmark, I generated a random input batch of 16 images with 3 channels at 224x224. The relevant code is:

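```lua
require 'torch'
require 'cutorch'
require 'cudnn'
require 'loadcaffe'

-- Load the Caffe VGG16 model using the cuDNN backend
local model = loadcaffe.load("/home/.../Models/VGG16/VGG_ILSVRC_16_layers_deploy.prototxt",
                              "/home/.../Models/VGG16/VGG_ILSVRC_16_layers.caffemodel",
                              "cudnn")

for i = 1, 50 do
  -- Random batch of 16 RGB images at 224x224, moved to the GPU
  local input = torch.randn(16, 3, 224, 224):type("torch.CudaTensor")

  -- Wait for pending GPU work so that only the forward pass is timed
  cutorch.synchronize()
  local timer = torch.Timer()

  model:forward(input)
  -- CUDA kernels run asynchronously; wait for them before reading the timer
  cutorch.synchronize()

  local deltaT = timer:time().real
  print("Forward time: " .. deltaT)
end
```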

The output is:

```
Forward time: 0.96536016464233
Forward time: 0.10063600540161
Forward time: 0.096444129943848
Forward time: 0.089151859283447
Forward time: 0.082037925720215
Forward time: 0.082045078277588
Forward time: 0.079913139343262
Forward time: 0.080273866653442
Forward time: 0.080694913864136
Forward time: 0.082727193832397
Forward time: 0.082070827484131
Forward time: 0.079407930374146
Forward time: 0.080456018447876
Forward time: 0.083559989929199
Forward time: 0.082060098648071
Forward time: 0.081624984741211
Forward time: 0.080413103103638
Forward time: 0.083755016326904
Forward time: 0.083209037780762
...
```
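The first iteration (about 0.97 s) is far slower than the rest, most likely because it pays one-off CUDA/cuDNN start-up costs, and the times only settle around 80 ms after a few iterations. To compare against a per-forward benchmark figure, it helps to average only the steady-state iterations. A minimal sketch in plain Lua; the `steady_state_ms` helper and the choice of 4 warm-up iterations are illustrative assumptions, not part of torch7 or cnn-benchmarks:

```lua
-- Hypothetical helper (not part of torch7 or cnn-benchmarks): mean
-- per-iteration time in milliseconds, ignoring the first `warmup`
-- iterations, which pay one-off start-up costs.
local function steady_state_ms(times, warmup)
  warmup = warmup or 1
  assert(#times > warmup, "need more samples than warm-up iterations")
  local sum = 0
  for i = warmup + 1, #times do
    sum = sum + times[i]
  end
  return 1000 * sum / (#times - warmup)
end

-- First nine timings (in seconds) from the output above; the first four
-- are treated as warm-up (assumed count).
local times = {
  0.96536016464233, 0.10063600540161, 0.096444129943848,
  0.089151859283447, 0.082037925720215, 0.082045078277588,
  0.079913139343262, 0.080273866653442, 0.080694913864136,
}
print(string.format("steady-state forward time: %.1f ms", steady_state_ms(times, 4)))
```

Averaging this way gives a number that is not inflated by the first call, so it is closer to what a per-forward benchmark figure measures.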

Did you do anything extra to get that speed, or am I doing something wrong here?
Or could it be because I am using Ubuntu 14.04? (Although your GTX 1080, also on Ubuntu 14.04, needs only 60 ms.)

mbcel commented

Okay, I solved it!
I had to enable cuDNN's benchmark mode by setting this flag:

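```lua
-- Use cuDNN's built-in auto-tuner to pick the fastest convolution algorithms
-- instead of the default heuristics; set it once, before the timing loop
cudnn.benchmark = true
```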

Now the forward time is about 39 ms.
However, I couldn't find where your code sets this flag...