Investigate SWISH as activation function in cuDNN
kblomdahl commented
cuDNN 8.2 introduced the swish activation function, which has been applied very successfully in networks such as MobileNetV3 and EfficientNet. It is worth investigating locally as well, to see whether it performs better than rectified linear units.
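For reference, swish is usually defined as swish(x) = x · σ(βx), where σ is the logistic sigmoid (with β = 1 it is also known as SiLU). The sketch below is a plain CPU implementation of swish next to relu, which could serve as a reference when checking GPU output element by element. It assumes that cuDNN's swish follows this definition with a configurable β (cuDNN 8.2 appears to expose β through `cudnnSetActivationDescriptorSwishBeta`); the function names are hypothetical and not taken from the repository.

```rust
/// Logistic sigmoid: 1 / (1 + e^(-x)).
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Rectified linear unit: max(0, x).
fn relu(x: f32) -> f32 {
    x.max(0.0)
}

/// Swish with a configurable beta: x * sigmoid(beta * x).
/// beta = 1.0 gives SiLU; as beta grows, swish approaches relu.
fn swish(x: f32, beta: f32) -> f32 {
    x * sigmoid(beta * x)
}

fn main() {
    let inputs = [-4.0f32, -1.0, 0.0, 1.0, 4.0];
    for &x in &inputs {
        println!(
            "x = {:5.1}  relu = {:.4}  swish = {:.4}",
            x,
            relu(x),
            swish(x, 1.0)
        );
    }
}
```

Unlike relu, swish is smooth and slightly negative for small negative inputs, which is one reason it costs an extra exponential per element.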
Performance
Plugging the swish activation function into the cudnn_types.rs benchmark (in place of relu) gives some unexpected results: every configuration runs in about 20,000 ns/iter regardless of data type or layout, which probably indicates that something is broken :)
Further investigation using nvprof indicates that no kernels were launched in the swish benchmarks, so swish is not yet a viable candidate unless we switch to the backend API.
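Since nvprof showed no kernel launches, the swish benchmarks were presumably timing calls that returned without computing anything. One cheap way to catch this in a benchmark or test is to fill the output buffer with a sentinel (NaN) before the call and verify afterwards that every element was overwritten. The sketch below shows the idea on the CPU; the closures passed to the hypothetical helper `check_activation` stand in for the real cuDNN call, and in the actual benchmark the output would first have to be copied back from device memory.

```rust
/// Fill `buf` with NaN so that any element left untouched by the
/// operation under test is easy to detect afterwards.
fn poison(buf: &mut [f32]) {
    for v in buf.iter_mut() {
        *v = f32::NAN;
    }
}

/// Returns true only if every element was overwritten with a non-NaN value.
fn output_was_written(buf: &[f32]) -> bool {
    buf.iter().all(|v| !v.is_nan())
}

/// Runs `op` on `input`, writing into a poisoned output buffer, and reports
/// whether the output was actually produced. `op` is a hypothetical stand-in
/// for the real activation call plus the copy back to the host.
fn check_activation<F>(input: &[f32], op: F) -> bool
where
    F: Fn(&[f32], &mut [f32]),
{
    let mut output = vec![0.0f32; input.len()];
    poison(&mut output);
    op(input, &mut output);
    output_was_written(&output)
}

fn main() {
    let input = [-2.0f32, -0.5, 0.0, 0.5, 2.0];

    // An operation that really computes swish (beta = 1) for every element.
    let working = |x: &[f32], y: &mut [f32]| {
        for (xi, yi) in x.iter().zip(y.iter_mut()) {
            *yi = xi / (1.0 + (-xi).exp());
        }
    };
    // An operation that returns without doing any work.
    let silent_no_op = |_x: &[f32], _y: &mut [f32]| {};

    println!("working:      written = {}", check_activation(&input, working));
    println!("silent no-op: written = {}", check_activation(&input, silent_no_op));
}
```

A check like this only needs to run once per configuration, outside the timed loop, so it does not distort the benchmark numbers.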
Relu
test f16_nchw_compute_type_f16 ... bench: 81,782 ns/iter (+/- 17,856)
test f16_nchw_compute_type_f32 ... bench: 153,531 ns/iter (+/- 47,599)
test f16_nhwc_compute_type_f16 ... bench: 97,342 ns/iter (+/- 1,885)
test f16_nhwc_compute_type_f32 ... bench: 120,626 ns/iter (+/- 665)
test i8x32_nhwcvectc ... bench: 84,497 ns/iter (+/- 1,287)
test i8x32_nhwcvectc_noreorder ... bench: 81,282 ns/iter (+/- 502)
Swish
running 2 tests
test f16_nchw_compute_type_f16 ... bench: 20,640 ns/iter (+/- 711)
test f16_nchw_compute_type_f32 ... bench: 20,627 ns/iter (+/- 181)
test f16_nhwc_compute_type_f16 ... bench: 20,473 ns/iter (+/- 319)
test f16_nhwc_compute_type_f32 ... bench: 20,703 ns/iter (+/- 1,207)
test i8x32_nhwcvectc ... bench: 20,280 ns/iter (+/- 77)
test i8x32_nhwcvectc_noreorder ... bench: 20,274 ns/iter (+/- 124)
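The two runs can be compared directly. The sketch below copies the mean ns/iter values from the logs above, computes the relu-to-swish ratio per configuration and the spread of each set of timings. Swish coming out roughly 4 to 7.4 times faster than relu, with only about a 2% spread across data types and layouts (versus a much wider spread for relu), is consistent with the nvprof finding that no kernels were launched. The helper names are illustrative only.

```rust
/// (configuration, relu mean ns/iter, swish mean ns/iter), copied from the runs above.
const RESULTS: [(&str, f64, f64); 6] = [
    ("f16_nchw_compute_type_f16", 81_782.0, 20_640.0),
    ("f16_nchw_compute_type_f32", 153_531.0, 20_627.0),
    ("f16_nhwc_compute_type_f16", 97_342.0, 20_473.0),
    ("f16_nhwc_compute_type_f32", 120_626.0, 20_703.0),
    ("i8x32_nhwcvectc", 84_497.0, 20_280.0),
    ("i8x32_nhwcvectc_noreorder", 81_282.0, 20_274.0),
];

/// Ratio of relu time to swish time; > 1 means swish was reported faster.
fn speedup(relu_ns: f64, swish_ns: f64) -> f64 {
    relu_ns / swish_ns
}

/// Relative spread (max - min) / min of a set of timings, in percent.
fn spread_percent(times: &[f64]) -> f64 {
    let min = times.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = times.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    100.0 * (max - min) / min
}

fn main() {
    for &(name, relu, swish) in RESULTS.iter() {
        println!("{:<28} relu/swish = {:.2}x", name, speedup(relu, swish));
    }
    let relu: Vec<f64> = RESULTS.iter().map(|r| r.1).collect();
    let swish: Vec<f64> = RESULTS.iter().map(|r| r.2).collect();
    println!("relu spread:  {:.1}%", spread_percent(&relu));
    println!("swish spread: {:.1}%", spread_percent(&swish));
}
```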