jcjohnson/cnn-benchmarks

OpenCL branch of Caffe reports much higher speeds

Motherboard opened this issue · 3 comments

On OpenCL-caffe, there are performance metrics claiming speeds of about 4 ms per image for training AlexNet on a Radeon R290X. Considering this GPU is much weaker than a GTX 1080, these figures seem very weird compared with the 20 ms in your tests.

What's your take on this?

My speeds are forward / backward times for an entire minibatch of 16 images; they divide by the minibatch size to try and compute a per-image time. You need to divide my times by 16 to be comparable to theirs, at which point the GTX 1080 is significantly faster than their R290X times.
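A minimal sketch of that conversion, assuming the approximate figures quoted in the question (about 20 ms per minibatch for the GTX 1080 and about 4 ms per image for the R290X) rather than exact measurements; the function and variable names are made up for illustration:

```python
# Figures are the approximate ones quoted in this thread, not exact measurements.
MINIBATCH_SIZE = 16            # minibatch size used for these benchmarks
gtx1080_minibatch_ms = 20.0    # forward + backward time for the whole minibatch
r290x_per_image_ms = 4.0       # per-image time claimed for OpenCL-caffe


def per_image_ms(minibatch_ms, batch_size):
    """Convert a whole-minibatch time into a per-image time."""
    return minibatch_ms / batch_size


gtx1080_per_image_ms = per_image_ms(gtx1080_minibatch_ms, MINIBATCH_SIZE)
print(f"GTX 1080: {gtx1080_per_image_ms:.2f} ms/image")  # GTX 1080: 1.25 ms/image
print(f"R290X:    {r290x_per_image_ms:.2f} ms/image")    # R290X:    4.00 ms/image
```

On these rough numbers the GTX 1080 comes out at about 1.25 ms per image against the claimed 4 ms.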

Another subtle issue is that they use a minibatch size of 128, while I used a minibatch size of 16 for a fair comparison across all models. Since AlexNet is a small model and GPUs are massively parallel, I'd expect the per-image time to decrease as the batch size increases, which gives their benchmark a slight advantage.
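To illustrate why a larger minibatch can lower the per-image time, here is a toy cost model, not a measurement. It assumes each minibatch pays a fixed overhead plus a constant cost per image; both constants are hypothetical and chosen only to show the trend:

```python
# Toy cost model, NOT a measurement: both constants below are hypothetical.
FIXED_OVERHEAD_MS = 2.0   # assumed per-minibatch cost (kernel launches, etc.)
PER_IMAGE_COST_MS = 0.1   # assumed marginal cost of each extra image


def per_image_time_ms(batch_size):
    """Per-image time when a fixed overhead is spread over the minibatch."""
    minibatch_ms = FIXED_OVERHEAD_MS + PER_IMAGE_COST_MS * batch_size
    return minibatch_ms / batch_size


for batch_size in (16, 128):
    print(batch_size, per_image_time_ms(batch_size))
```

Under this assumption the overhead is amortized over more images at batch size 128 than at 16, so the per-image figure drops even though the hardware is unchanged.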

Thanks, this clears it up :) On an unrelated note, I'd really love to see benchmarks of SqueezeNet 1.1, which should be much faster than all of these networks.

Hello,
First of all, this is great work; I am now able to understand the Torch framework and the benchmarks to some extent.
I'm trying to see the difference between running on GPU and on CPU.
I built Torch, and my desktop has an Intel i7 with a Pascal Titan X.
I was able to run on the GPU, but when I try to run on the CPU by issuing the command
python run_cnn_benchmarks.py --gpus -1 --models Torch_Ref/distro/cnn_bench/cnn-benchmarks/models/alexnet/alexnet.t7 --batch_sizes 1 --use_cudnn 1

I'm getting the following error:
"re/lua/5.2/torch/File.lua:343: unknown Torch class <cudnn.SpatialConvolution>"

Am I missing something here when running on the CPU only?