
pytorch-apex-experiment

A simple experiment with Apex, a PyTorch extension with tools to realize the power of Tensor Cores.

Usage

1. Install the Apex package

Apex: A PyTorch Extension (https://github.com/NVIDIA/apex)
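
Apex is installed from source, for example as follows (check the NVIDIA Apex README for the current build options, which have changed over time):

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir ./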

2. Train

python CIFAR.py --GPU gpu_name --mode 'FP16' --batch_size 128 --iteration 100
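
Inside the training script, the three --mode options roughly correspond to the setup sketched below. This is only a minimal sketch rather than the actual CIFAR.py: the batch is a random stand-in for CIFAR10 data, it assumes a torchvision whose VGG accepts 32x32 inputs via adaptive pooling, and it uses the amp.initialize / amp.scale_loss interface of later Apex releases (older releases built around PyTorch 0.4 exposed amp.init() instead).

import torch
import torchvision.models as models
from apex import amp

mode = 'amp'  # one of 'FP32', 'FP16', 'amp'

model = models.vgg16(num_classes=10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

if mode == 'FP16':
    model = model.half()  # pure half-precision weights
elif mode == 'amp':
    # Apex patches the model/optimizer for automatic mixed precision
    model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

images = torch.randn(128, 3, 32, 32).cuda()   # random stand-in for a CIFAR10 batch
labels = torch.randint(0, 10, (128,)).cuda()
if mode == 'FP16':
    images = images.half()  # inputs must match the model dtype

optimizer.zero_grad()
loss = criterion(model(images), labels)
if mode == 'amp':
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()  # backward on the dynamically scaled loss
else:
    loss.backward()
optimizer.step()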

3. Plot (optional)

python make_plot.py --GPU 'gpu_name1' 'gpu_name2' 'gpu_name3' --method 'FP32' 'FP16' 'amp' --batch 128 256 512 1024 2048
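
make_plot.py aggregates the saved results into comparison plots. The sketch below only shows the general idea with matplotlib; the loading step is omitted and the timing numbers are copied from the 1080 Ti rows of the results table below, since the on-disk format under results/ is defined by the training script.

import numpy as np
import matplotlib.pyplot as plt

batch_sizes = [128, 256, 512, 1024, 2048]
# mean training times (sec) copied from the 1080 Ti results below;
# the real script would load them from results/gpu_name instead
times = {
    'FP32': [5.16, 9.12, 16.75, 32.23, 63.42],
    'FP16': [5.42, 9.11, 16.54, 31.49, 61.79],
    'amp':  [6.32, 10.70, 18.95, 36.15, 72.64],
}

x = np.arange(len(batch_sizes))
width = 0.25
fig, ax = plt.subplots()
for i, (method, vals) in enumerate(times.items()):
    ax.bar(x + (i - 1) * width, vals, width, label=method)  # grouped bars per method
ax.set_xticks(x)
ax.set_xticklabels(batch_sizes)
ax.set_xlabel('Batch size')
ax.set_ylabel('Time (sec)')
ax.set_title('GTX 1080 Ti: time per 100 iterations')
ax.legend()
fig.savefig('time_comparison.png')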

Folder structure

The following shows the basic folder structure.

├── cifar
├── CIFAR.py  # training code
├── utils.py
├── make_plot.py
└── results
    └── gpu_name  # results to be saved here

Experiment settings

  • Network: VGG16
  • Dataset: CIFAR10
  • Method: FP32 (float32), FP16 (float16; half tensor), AMP (Automatic Mixed Precision)
  • GPU: GTX 1080 Ti, GTX TITAN X, Tesla V100
  • Batch size: 128, 256, 512, 1024, 2048
  • All random seeds are fixed (see the seed-fixing sketch after this list)
  • Result: mean and standard deviation over 5 runs (100 iterations each)
  • Ubuntu 16.04
  • Python 3
  • CUDA 9.0
  • PyTorch 0.4.1
  • torchvision 0.2.1
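
The seeds can be fixed with something like the following (a sketch; the exact calls used in CIFAR.py / utils.py may differ):

import random
import numpy as np
import torch

def set_seed(seed=0):
    # make runs repeatable across Python, NumPy, and PyTorch (CPU and GPU)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True  # use deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable autotuning for reproducibility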

Results

| GPU - Method | Metric | Batch 128 | Batch 256 | Batch 512 | Batch 1024 | Batch 2048 |
| --- | --- | --- | --- | --- | --- | --- |
| 1080 Ti - FP32 | Accuracy (%) | 40.92 ± 2.08 | 50.74 ± 3.64 | 61.32 ± 2.43 | 64.79 ± 1.56 | 63.44 ± 1.76 |
| | Time (sec) | 5.16 ± 0.73 | 9.12 ± 1.20 | 16.75 ± 2.05 | 32.23 ± 3.23 | 63.42 ± 4.89 |
| | Memory (MB) | 1557.00 ± 0.00 | 2053.00 ± 0.00 | 2999.00 ± 0.00 | 4995.00 ± 0.00 | 8763.00 ± 0.00 |
| 1080 Ti - FP16 | Accuracy (%) | 43.35 ± 2.04 | 51.00 ± 3.75 | 57.70 ± 1.58 | 63.79 ± 3.95 | 62.64 ± 1.91 |
| | Time (sec) | 5.42 ± 0.71 | 9.11 ± 1.14 | 16.54 ± 1.78 | 31.49 ± 3.01 | 61.79 ± 5.15 |
| | Memory (MB) | 1405.00 ± 0.00 | 1745.00 ± 0.00 | 2661.00 ± 0.00 | 4013.00 ± 0.00 | 6931.00 ± 0.00 |
| 1080 Ti - AMP | Accuracy (%) | 41.11 ± 1.19 | 47.59 ± 1.79 | 60.37 ± 2.48 | 63.31 ± 1.92 | 63.41 ± 3.75 |
| | Time (sec) | 6.32 ± 0.70 | 10.70 ± 1.11 | 18.95 ± 1.80 | 36.15 ± 3.01 | 72.64 ± 5.11 |
| | Memory (MB) | 1941.00 ± 317.97 | 1907.00 ± 179.63 | 2371.00 ± 0.00 | 4073.00 ± 0.00 | 7087.00 ± 0.00 |
| TITAN X - FP32 | Accuracy (%) | 42.90 ± 2.42 | 45.78 ± 1.22 | 60.88 ± 1.78 | 64.22 ± 2.62 | 63.79 ± 1.62 |
| | Time (sec) | 5.86 ± 0.80 | 9.59 ± 1.29 | 18.19 ± 1.84 | 35.62 ± 4.07 | 66.56 ± 4.62 |
| | Memory (MB) | 1445.00 ± 0.00 | 1879.00 ± 0.00 | 2683.00 ± 0.00 | 4439.00 ± 0.00 | 7695.00 ± 0.00 |
| TITAN X - FP16 | Accuracy (%) | 39.13 ± 3.56 | 49.87 ± 2.42 | 59.77 ± 1.77 | 65.57 ± 2.82 | 64.08 ± 1.80 |
| | Time (sec) | 5.66 ± 0.97 | 9.72 ± 1.23 | 17.14 ± 1.82 | 33.23 ± 3.50 | 65.86 ± 4.94 |
| | Memory (MB) | 1361.00 ± 0.00 | 1807.00 ± 0.00 | 2233.00 ± 0.00 | 3171.00 ± 0.00 | 5535.00 ± 0.00 |
| TITAN X - AMP | Accuracy (%) | 42.57 ± 1.25 | 49.59 ± 2.14 | 59.76 ± 1.60 | 63.76 ± 4.24 | 65.14 ± 2.93 |
| | Time (sec) | 7.55 ± 1.03 | 11.82 ± 1.07 | 20.96 ± 1.83 | 38.82 ± 3.17 | 76.54 ± 6.60 |
| | Memory (MB) | 1729.00 ± 219.51 | 1999.00 ± 146.97 | 2327.00 ± 0.00 | 3453.00 ± 0.00 | 5917.00 ± 0.00 |
| V100 - FP32 | Accuracy (%) | 42.56 ± 1.37 | 49.50 ± 1.81 | 60.91 ± 0.88 | 65.26 ± 1.76 | 63.93 ± 3.69 |
| | Time (sec) | 3.93 ± 0.54 | 6.90 ± 0.82 | 12.97 ± 1.27 | 25.11 ± 1.83 | 49.43 ± 3.46 |
| | Memory (MB) | 1834.00 ± 0.00 | 2214.00 ± 0.00 | 2983.60 ± 116.80 | 4674.00 ± 304.00 | 8534.80 ± 826.40 |
| V100 - FP16 | Accuracy (%) | 43.37 ± 2.13 | 51.78 ± 2.48 | 58.46 ± 1.81 | 64.72 ± 2.37 | 63.21 ± 1.60 |
| | Time (sec) | 3.28 ± 0.52 | 5.95 ± 1.03 | 10.50 ± 1.27 | 19.65 ± 1.95 | 37.32 ± 3.73 |
| | Memory (MB) | 1777.20 ± 25.60 | 2040.00 ± 0.00 | 2464.00 ± 0.00 | 3394.00 ± 0.00 | 4748.00 ± 0.00 |
| V100 - AMP | Accuracy (%) | 42.39 ± 2.35 | 51.33 ± 1.84 | 61.41 ± 2.10 | 65.05 ± 3.29 | 61.67 ± 3.13 |
| | Time (sec) | 4.27 ± 0.54 | 7.18 ± 0.90 | 13.31 ± 1.26 | 23.99 ± 2.29 | 45.68 ± 3.77 |
| | Memory (MB) | 2174.80 ± 211.74 | 2274.00 ± 172.15 | 2775.20 ± 77.60 | 3790.80 ± 154.40 | 5424.00 ± 0.00 |

Visualization

Plots of training time and memory usage per GPU, method, and batch size, shown both without and with standard-deviation bars.