Comparison of training and inference speed of different GPUs across various CNN models in PyTorch
- 1080TI
- TITAN V
- 2080TI
Graphics Card Name | NVIDIA GeForce GTX 1080 Ti | NVIDIA GeForce RTX 2080 Ti | NVIDIA TITAN V |
---|---|---|---|
Process | 16nm | 12nm | 12nm |
Die Size | 471mm² | 754mm² | 815mm² |
Transistors | 11,800 million | 18,600 million | 21,100 million |
CUDA Cores | 3584 Cores | 4352 Cores | 5120 Cores |
Tensor Cores | None | 544 Cores | 640 Cores |
Clock(base) | 1481 MHz | 1350 MHz | 1200 MHz |
FP16 (half) performance | 177.2 GFLOPS | 26,895 GFLOPS | 29,798 GFLOPS |
FP32 (float) performance | 11,340 GFLOPS | 13,448 GFLOPS | 14,899 GFLOPS |
FP64 (double) performance | 354.4 GFLOPS | 420.2 GFLOPS | 7,450 GFLOPS |
Memory | 11GB GDDR5X | 11 GB GDDR6 | 12 GB HBM2 |
Memory Speed | 11Gbps | 14.00 Gbps | 1.7Gbps HBM2 |
Memory Interface | 352-bit | 352-bit | 3072-bit |
Memory Bandwidth | 484 GB/s | 616 GB/s | 653GB/s |
Price | $699 US | $1,199 US | $2,999 US |
Release Date | Mar 10th, 2017 | Sep 20th, 2018 | Dec 7th, 2017 |
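
The Memory Bandwidth row can be cross-checked against the Memory Speed and Memory Interface rows: peak bandwidth is the per-pin data rate times the bus width, divided by 8 bits per byte. The short sketch below redoes that arithmetic with the table's values; the function name `memory_bandwidth_gb_s` is made up for illustration and is not part of this repository.

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s = per-pin data rate (Gbit/s) * bus width (bits) / 8."""
    return data_rate_gbps * bus_width_bits / 8


# (memory speed in Gbps, memory interface in bits), copied from the table above
cards = {
    "GTX 1080 Ti": (11.0, 352),
    "RTX 2080 Ti": (14.0, 352),
    "TITAN V": (1.7, 3072),
}

for name, (rate, width) in cards.items():
    print(f"{name}: {memory_bandwidth_gb_s(rate, width):.1f} GB/s")
```

The results match the table once rounded to whole GB/s (the TITAN V figure rounds up to 653 GB/s).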
- Single and multi-GPU runs with batch size 12: compare training and inference speed of **SqueezeNet, VGG-16, VGG-19, ResNet18, ResNet34, ResNet50, ResNet101, ResNet152, DenseNet121, DenseNet169, DenseNet201, DenseNet161, MobileNet, MNASNet, ...**
- Experiments are run with three data types: single precision, double precision, and half precision.
- Plots are generated with plotly.
./test.sh
- torchvision
- torch>=1.0.0
- pandas
- psutil
- plotly(for plot)
- cufflinks(for plot)
- PyTorch version: 1.4
- Number of GPUs on current device: 4
- CUDA version: 10.0
- cuDNN version: 7601
- 2020/09/01
- Add results on Windows 10
- Edit README.md
- 2020/01/17
- Clean up coding style and fix some bugs
- Change plot method
- Add results of various model experiments(only 2080ti)
- 2019/01/09
- Fix typos via PR (thanks to johmathe)
- Add requirements.txt
- Add result figures
- Add ('TkAgg') for cli
- Add multi-GPU results (DGX Station)
- thanks to olixu
- based on 2020/01/17 update
Each network is fed a batch of 12 images of size 224x224x3. For training, the durations of 20 forward-and-backward passes are averaged. For inference, the durations of 20 forward passes are averaged. Five warm-up steps are run first and do not count toward the final result.
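
The measurement procedure described above (5 discarded warm-up steps, then the mean of 20 timed passes) can be sketched as a small framework-agnostic harness. This is a minimal illustration, not the repository's actual code: `benchmark`, `step`, and `sync` are hypothetical names. On a GPU you would typically pass something like `torch.cuda.synchronize` as `sync`, so that asynchronous kernels have finished before the clock is read.

```python
import time
from statistics import mean


def benchmark(step, warmup=5, iters=20, sync=None):
    """Return the mean duration of `step()` in milliseconds.

    `step` is any callable running one pass (forward, or forward + backward).
    `sync` is an optional callable that waits for pending device work.
    """
    def timed_call():
        if sync is not None:
            sync()  # make sure earlier work is not billed to this pass
        start = time.perf_counter()
        step()
        if sync is not None:
            sync()  # wait for this pass to actually finish
        return (time.perf_counter() - start) * 1000.0

    for _ in range(warmup):
        timed_call()  # warm-up passes: timed but discarded

    return mean(timed_call() for _ in range(iters))


if __name__ == "__main__":
    # Stand-in CPU workload so the sketch runs without a GPU.
    ms = benchmark(lambda: sum(i * i for i in range(100_000)))
    print(f"{ms:.3f} ms per pass")
```

Warm-up passes matter because the first iterations often include one-time costs (memory allocation, algorithm selection) that would otherwise skew the average.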
I ran the experiment on two RTX 2080 Ti cards.
Mode | gpu | precision | densenet121 | densenet161 | densenet169 | densenet201 | resnet101 | resnet152 | resnet18 | resnet34 | resnet50 | squeezenet1_0 | squeezenet1_1 | vgg16 | vgg16_bn | vgg19 | vgg19_bn |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Training | TITAN V | single | 56.17 ms | 120.7 ms | 72.59 ms | 93.35 ms | 84.59 ms | 119.5 ms | 16.69 ms | 28.27 ms | 50.54 ms | 15.30 ms | 9.857 ms | 72.85 ms | 80.95 ms | 85.55 ms | 94.42 ms |
Inference | TITAN V | single | 17.49 ms | 39.33 ms | 23.63 ms | 30.93 ms | 23.96 ms | 34.22 ms | 4.827 ms | 8.428 ms | 14.27 ms | 4.565 ms | 2.765 ms | 22.94 ms | 25.41 ms | 27.55 ms | 30.28 ms |
Training | TITAN V | double | 139.8 ms | 387.4 ms | 175.9 ms | 224.5 ms | 509.9 ms | 720.0 ms | 94.21 ms | 194.6 ms | 271.7 ms | 68.38 ms | 31.18 ms | 1463. ms | 1484. ms | 1993. ms | 2016. ms |
Inference | TITAN V | double | 47.68 ms | 170.5 ms | 60.73 ms | 78.43 ms | 317.7 ms | 448.6 ms | 60.26 ms | 129.9 ms | 159.8 ms | 42.37 ms | 11.95 ms | 1261. ms | 1266. ms | 1745. ms | 1751. ms |
Training | TITAN V | half | 43.79 ms | 75.16 ms | 57.53 ms | 70.88 ms | 47.82 ms | 67.43 ms | 10.48 ms | 17.19 ms | 29.08 ms | 13.15 ms | 9.390 ms | 36.03 ms | 46.84 ms | 41.16 ms | 52.65 ms |
Inference | TITAN V | half | 11.87 ms | 22.88 ms | 16.04 ms | 20.70 ms | 12.80 ms | 18.11 ms | 3.085 ms | 5.116 ms | 7.608 ms | 3.694 ms | 2.329 ms | 10.96 ms | 13.26 ms | 12.72 ms | 15.17 ms |
Training | 1080ti | single | 77.18 ms | 164.0 ms | 99.66 ms | 127.6 ms | 112.8 ms | 158.7 ms | 22.48 ms | 36.80 ms | 68.87 ms | 20.56 ms | 13.29 ms | 101.8 ms | 114.1 ms | 119.9 ms | 133.2 ms |
Inference | 1080ti | single | 23.53 ms | 51.53 ms | 31.82 ms | 41.73 ms | 33.02 ms | 47.02 ms | 6.426 ms | 10.97 ms | 20.17 ms | 7.174 ms | 4.370 ms | 33.73 ms | 37.25 ms | 39.95 ms | 44.12 ms |
Training | 1080ti | double | 779.5 ms | 2522. ms | 940.4 ms | 1196. ms | 2410. ms | 3546. ms | 463.3 ms | 969.9 ms | 1216. ms | 259.9 ms | 131.5 ms | 4227. ms | 4271. ms | 5475. ms | 5522. ms |
Inference | 1080ti | double | 47.68 ms | 275.2 ms | 1157. ms | 328.6 ms | 414.9 ms | 1080. ms | 1589. ms | 181.1 ms | 390.8 ms | 529.6 ms | 110.9 ms | 49.96 ms | 2094. ms | 2103. ms | 2775. ms |
Training | 1080ti | half | 43.79 ms | 70.00 ms | 148.4 ms | 89.43 ms | 113.6 ms | 151.0 ms | 219.5 ms | 21.00 ms | 34.84 ms | 76.24 ms | 19.60 ms | 13.18 ms | 91.60 ms | 105.9 ms | 108.1 ms |
Inference | 1080ti | half | 18.62 ms | 42.26 ms | 25.27 ms | 33.01 ms | 27.49 ms | 38.88 ms | 5.645 ms | 9.765 ms | 16.26 ms | 5.869 ms | 3.576 ms | 30.69 ms | 33.22 ms | 36.71 ms | 39.51 ms |
Mode | gpu | precision | resnet18 | resnet34 | resnet50 | resnet101 | resnet152 | densenet121 | densenet169 | densenet201 | densenet161 | squeezenet1_0 | squeezenet1_1 | vgg16 | vgg16_bn | vgg19_bn | vgg19 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Training | RTX 2080ti(1) | single | 16.36 ms | 28.44 ms | 49.63 ms | 81.40 ms | 115.1 ms | 57.69 ms | 75.18 ms | 91.69 ms | 112.7 ms | 14.49 ms | 9.108 ms | 75.86 ms | 85.42 ms | 98.43 ms | 88.05 ms |
Inference | RTX 2080ti(1) | single | 4.894 ms | 8.624 ms | 14.65 ms | 24.57 ms | 35.15 ms | 16.70 ms | 21.94 ms | 28.89 ms | 34.64 ms | 4.704 ms | 2.765 ms | 23.70 ms | 26.25 ms | 30.82 ms | 28.03 ms |
Training | RTX 2080ti(1) | double | 367.9 ms | 755.4 ms | 939.9 ms | 1844. ms | 2702. ms | 593.5 ms | 724.3 ms | 921.3 ms | 1916. ms | 187.8 ms | 94.99 ms | 3251. ms | 3277. ms | 4265. ms | 4238. ms |
Inference | RTX 2080ti(1) | double | 165.0 ms | 328.5 ms | 436.4 ms | 831.0 ms | 1196. ms | 213.8 ms | 266.0 ms | 339.5 ms | 910.7 ms | 82.71 ms | 35.79 ms | 1702. ms | 1708. ms | 2280. ms | 2274. ms |
Training | RTX 2080ti(1) | half | 13.17 ms | 22.25 ms | 35.46 ms | 57.50 ms | 81.38 ms | 51.11 ms | 66.88 ms | 80.20 ms | 88.37 ms | 17.87 ms | 35.75 ms | 53.16 ms | 63.06 ms | 72.75 ms | 61.95 ms |
Inference | RTX 2080ti(1) | half | 3.423 ms | 5.662 ms | 9.035 ms | 14.51 ms | 20.52 ms | 13.47 ms | 17.54 ms | 22.51 ms | 27.10 ms | 4.280 ms | 2.397 ms | 16.14 ms | 18.14 ms | 19.76 ms | 17.89 ms |
Training | RTX 2080ti(2) | single | 16.92 ms | 29.51 ms | 51.46 ms | 84.90 ms | 120.0 ms | 58.13 ms | 75.96 ms | 92.47 ms | 117.6 ms | 14.95 ms | 9.255 ms | 78.95 ms | 88.71 ms | 102.3 ms | 91.67 ms |
Inference | RTX 2080ti(2) | single | 5.107 ms | 8.976 ms | 15.18 ms | 25.60 ms | 36.60 ms | 17.02 ms | 22.40 ms | 29.46 ms | 36.72 ms | 4.852 ms | 2.786 ms | 24.76 ms | 27.25 ms | 32.05 ms | 29.27 ms |
Training | RTX 2080ti(2) | double | 381.9 ms | 781.5 ms | 971.6 ms | 1900. ms | 2777. ms | 610.6 ms | 744.7 ms | 948.1 ms | 1974. ms | 191.9 ms | 97.27 ms | 3317. ms | 3350. ms | 4357. ms | 4329. ms |
Inference | RTX 2080ti(2) | double | 171.8 ms | 341.7 ms | 449.5 ms | 849.5 ms | 1231. ms | 221.1 ms | 275.2 ms | 352.5 ms | 938.9 ms | 83.66 ms | 36.48 ms | 1715. ms | 1721. ms | 2294. ms | 2289. ms |
Training | RTX 2080ti(2) | half | 13.57 ms | 22.97 ms | 36.55 ms | 59.10 ms | 83.81 ms | 51.74 ms | 68.35 ms | 81.21 ms | 89.46 ms | 15.75 ms | 35.46 ms | 55.28 ms | 65.43 ms | 75.75 ms | 64.62 ms |
Inference | RTX 2080ti(2) | half | 3.520 ms | 5.837 ms | 9.272 ms | 14.93 ms | 21.13 ms | 13.38 ms | 18.71 ms | 22.40 ms | 26.82 ms | 4.446 ms | 2.406 ms | 16.29 ms | 17.91 ms | 20.90 ms | 19.14 ms |
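
Because every entry is the time for one batch of 12 images, the tables can be converted into throughput and relative speedups. The sketch below does this for one pair of values taken from the table above (RTX 2080ti(1), resnet50, training). The helper names `images_per_second` and `speedup` are made up for illustration.

```python
BATCH_SIZE = 12  # images per pass, as stated in the benchmark description


def images_per_second(ms_per_batch: float, batch_size: int = BATCH_SIZE) -> float:
    """Convert a per-batch time in milliseconds to images per second."""
    return batch_size / (ms_per_batch / 1000.0)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# RTX 2080ti(1), resnet50, training (values copied from the table)
single_ms = 49.63
half_ms = 35.46

print(f"single: {images_per_second(single_ms):.1f} img/s")
print(f"half:   {images_per_second(half_ms):.1f} img/s")
print(f"half vs single speedup: {speedup(single_ms, half_ms):.2f}x")
```

The same two helpers apply to any cell pair, for example comparing two GPUs at the same precision.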
- Results using code prior to 2020/01/17
If you want to contribute results from an additional environment, please add them as a subfolder in fig.