/CMSIS_NN_Fast

A faster cmsis nn implementation optimized for ds-cnn architecture.

Primary LanguageCApache License 2.0Apache-2.0

Introduction

This implementation of CMSIS-NN solved the painfully slow memory copy problem in normal convolution layers and especially depthwise convolution layers. This optimization achieves 2X inference speed for DS-CNN models running on cortex m4 processors.
Please replace the functions in the original CMSIS-NN model and please notice the changes made to the parameters.
More is on the way. A fully automatic Tensorflow to cortex-m4 toolchain for DS-CNN models will be released if my company allows me to do so (apparently not).

Notice

  • Due to different rounding methods, the fucntions provided here may not yield the exact results as the original ones. (e.g. 0xCD and 0xCC)