Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below.
Work in progress! I am still working through each convolution module in each library, so THIS IS NOT AN EXHAUSTIVE LIST!
- After getting an initial baseline with the single module below (and getting initial benchmark scripts), I will benchmark a full AlexNet/MattNet/Overfeat
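Since all the numbers below are wall-clock times for a `:forward()` call, here is a minimal sketch of the kind of timing harness involved. This is hypothetical plain Python, not the actual benchmark scripts; the `forward_fn` parameter is an assumption standing in for whichever library's forward call is being measured.

```python
import time

def benchmark_forward(forward_fn, n_warmup=3, n_runs=10):
    """Return the mean wall-clock time of forward_fn in milliseconds.

    Hypothetical harness, not the real benchmark scripts. For GPU code,
    forward_fn must block until the kernel finishes (e.g. by synchronizing
    the device), otherwise only the kernel-launch overhead is measured.
    """
    for _ in range(n_warmup):
        forward_fn()  # warm-up runs: trigger JIT compilation / allocations
    start = time.perf_counter()
    for _ in range(n_runs):
        forward_fn()
    elapsed = time.perf_counter() - start
    return elapsed / n_runs * 1000.0  # mean time per run, in ms
```

Averaging over several runs after a warm-up avoids counting one-time setup costs (kernel compilation, memory allocation) in the reported number.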
Machine: 6-core Intel i7-3930K @ 3.20GHz + NVIDIA Titan Black + Ubuntu 14.04 x86_64
### Spatial Convolution layer (3D input, 3D output)
##### :forward()

Columns L1, L2, L3, L4, L5, and Total are times in milliseconds.
Original Library | Class/Function Benchmarked | Device | L1 | L2 | L3 | L4 | L5 | Total |
---|---|---|---|---|---|---|---|---|
Theano (experimental)*** | pylearn2.mlp.ConvElemwise | GPU | 205 | 75 | 28 | 9 | 5 | 322 |
cuda-convnet2 * | ConvLayer | GPU | 69 | 242 | 87 | 9 | 17 | 424 |
Caffe | ConvolutionLayer<Dtype> | GPU | 102 | 203 | 158 | 39 | 52 | 554 |
Torch-7 | nn.SpatialConvolutionMM | GPU | 105 | 240 | 168 | 41 | 55 | 609 |
cuda-convnet** | pylearn2.cuda_convnet | GPU | 98 | 404 | 149 | 16 | 38 | 705 |
ccv | ccv_convnet_layer | GPU | 121 | 437 | 182 | 23 | 44 | 809 |
Theano (legacy)** | pylearn2.mlp.ConvElemwise | GPU | 418 | 2299 | 672 | 88 | 272 | 3749 |
- \* indicates that the library was tested through Torch bindings to its specific kernels.
- \*\* indicates that the library was tested through Pylearn2 bindings.
- \*\*\* This is an experimental module that uses FFTs to compute the convolutions. According to @benanne, it uses a lot of memory.
- L1 - Input: 128x128, Batch-size: 128, Feature maps: 3->96, Kernel size: 11x11, Stride: 1x1
- L2 - Input: 64x64, Batch-size: 128, Feature maps: 64->128, Kernel size: 9x9, Stride: 1x1
- L3 - Input: 32x32, Batch-size: 128, Feature maps: 128->128, Kernel size: 9x9, Stride: 1x1
- L4 - Input: 16x16, Batch-size: 128, Feature maps: 128->128, Kernel size: 7x7, Stride: 1x1
- L5 - Input: 13x13, Batch-size: 128, Feature maps: 384->384, Kernel size: 3x3, Stride: 1x1
- The table is ranked by total time (L1 + L2 + L3 + L4 + L5).
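For reference, the five layer configurations above can be written down programmatically. The sketch below is plain Python (not part of the benchmark scripts) and assumes unpadded "valid" convolutions, an assumption the table does not state explicitly; it derives each layer's output spatial size from the input size, kernel size, and stride.

```python
# Hypothetical summary of the five benchmarked configs (not from the repo):
# (input_size, batch, in_maps, out_maps, kernel, stride) per layer
CONFIGS = {
    "L1": (128, 128, 3, 96, 11, 1),
    "L2": (64, 128, 64, 128, 9, 1),
    "L3": (32, 128, 128, 128, 9, 1),
    "L4": (16, 128, 128, 128, 7, 1),
    "L5": (13, 128, 384, 384, 3, 1),
}

def output_size(input_size, kernel, stride):
    """Spatial output size of an unpadded ('valid') convolution."""
    return (input_size - kernel) // stride + 1

for name, (inp, batch, fin, fout, k, s) in CONFIGS.items():
    out = output_size(inp, k, s)
    print(f"{name}: {batch}x{fin}x{inp}x{inp} -> {batch}x{fout}x{out}x{out}")
```

For example, L1 maps a 128x3x128x128 batch to 128x96x118x118 under these assumptions, since (128 - 11) / 1 + 1 = 118.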