jcjohnson/cnn-benchmarks

Pascal architecture is much slower than Maxwell...

Opened this issue · 3 comments

Hi Justin,

In fact, the Pascal architecture is much slower than Maxwell. Could you please look at my benchmarks below? The point is to measure older-architecture GPUs with the DNN libraries optimized for that specific architecture: CUDA 7.5 and cuDNN v4.0.

Measured with CNTK and Caffe under Windows 7 and Ubuntu.

GTX 980 Ti - CUDA 7.5 with cuDNN 4.0
Caffe Performance: 5242 imgs/s

GTX 980 Ti - CUDA 8.0 RC with cuDNN 5.1
Caffe Performance: 4183 imgs/s

GTX 1080 - CUDA 7.5 with cuDNN 4.0
Caffe Performance: Not Applicable

GTX 1080 - CUDA 8.0 RC with cuDNN 5.1
Caffe Performance: 4628 imgs/s
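
For reference, the relative differences between the figures above can be worked out with a short script. This is only a sketch: the throughput values are copied from the measurements listed here, while the dictionary layout and the `relative_change` helper are illustrative names, not part of Caffe or CNTK.

```python
# Throughput figures (imgs/s) copied from the measurements listed above.
results = {
    ("GTX 980 Ti", "CUDA 7.5 + cuDNN 4.0"): 5242,
    ("GTX 980 Ti", "CUDA 8.0 RC + cuDNN 5.1"): 4183,
    ("GTX 1080", "CUDA 8.0 RC + cuDNN 5.1"): 4628,
}


def relative_change(new, baseline):
    """Return the percentage change of `new` relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Use the 980 Ti on the older CUDA 7.5 / cuDNN 4.0 stack as the baseline.
baseline = results[("GTX 980 Ti", "CUDA 7.5 + cuDNN 4.0")]

for (gpu, stack), rate in results.items():
    change = relative_change(rate, baseline)
    print(f"{gpu:<10} {stack:<24} {rate:>5} imgs/s  {change:+6.1f}% vs baseline")
```

With these numbers, the 980 Ti loses about 20.2% when moved to CUDA 8.0 RC and cuDNN 5.1, and the GTX 1080 is about 11.7% below the 980 Ti baseline, while still being about 10.6% ahead of the 980 Ti on the same CUDA 8.0 RC stack.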

Best Regards,
Ondrej

There is something wrong with these numbers - the 980 Ti should be roughly the same speed as a Maxwell Titan X, and should thus be quite a bit slower than a 1080. How did you generate these numbers?

@justin: These numbers are calculated from the log files that the Caffe and CNTK frameworks generate during training. For benchmarking I used the standard AlexNet architecture with modified input image dimensions (dim: 3 dim: 101 dim: 101). You are right that the GTX 1080 should be faster than the 980 Ti. But before I bought the GTX 1080, I had used a GTX 980 Ti with Caffe and CNTK for a while, and I'm pretty familiar with its performance (Caffe was compiled with CUDA 7.5 and cuDNN 4.0). Once I bought the GTX 1080, I was forced to move to CUDA 8.0 RC and cuDNN 5.x and recompile Caffe, because only these libraries support the Pascal architecture. To my big surprise, the GTX 1080 was slower than the GTX 980 Ti (with CUDA 7.5 and cuDNN 4.0). BUT when I benchmarked Caffe compiled with CUDA 8.0 RC and cuDNN 5.x on the GTX 980 Ti, it was actually slower than the GTX 1080. I obtained similar results with CNTK.
So my understanding is that the CUDA 8.0 RC and cuDNN 5.x libraries are currently just not properly optimized for either architecture, Maxwell or Pascal.
Would you be so kind as to run your benchmarks again on the GeForce GTX Titan X (Maxwell architecture), but compiled with CUDA 7.5 and cuDNN 4.0? If my benchmarks are correct, you should get much better results. My guess is that the old Titan X will perform better than the GTX 1080.
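
To make the comparison reproducible, here is a rough sketch of how an imgs/s figure can be derived from training timings. The batch size, iteration count, and elapsed time below are hypothetical placeholders, not values from my logs, and `images_per_second` is an illustrative helper rather than a Caffe or CNTK function.

```python
def images_per_second(batch_size, iterations, elapsed_seconds):
    """Throughput = total images processed / wall-clock time.

    All three inputs would be read from the training log; the values
    used in the example below are hypothetical placeholders.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return batch_size * iterations / elapsed_seconds


# Hypothetical example: 100 iterations of batch size 256 in 5 seconds.
rate = images_per_second(batch_size=256, iterations=100, elapsed_seconds=5.0)
print(f"{rate:.0f} imgs/s")
```

When comparing two setups this way, the batch size, input dimensions, and model should be identical; only the GPU or the CUDA/cuDNN versions should differ.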

Thank you very much.

Best regards,
Ondrej

Check out my neural-style speed benchmarks:

https://github.com/jcjohnson/neural-style#speed

The Maxwell Titan X benchmarks used CUDA 7.0 and cuDNN 4 and the Pascal Titan X benchmarks used CUDA 8.0 RC and cuDNN 5; the Pascal Titan X is almost 2x faster.