Tengine, developed by OPEN AI LAB, is a lightweight, high-performance, modular inference engine for embedded devices.
Tengine is composed of six modules: core, operator, serializer, executor, driver, and wrapper.
- core provides the basic components and functionalities of the system.
- operator defines the schema of operators, such as convolution, relu, pooling, etc. See the list of currently supported operators.
- serializer loads saved models. The serializer framework is extensible to support different formats, including custom ones. Caffe, ONNX, TensorFlow, MXNet, and Tengine models can be loaded directly by Tengine.
- executor implements the code to run graphs and operators. The current version provides a highly optimized implementation for multiple A72 cores.
- driver is the adapter for real hardware and provides services to the device executor through the HAL API. A single driver can create multiple devices.
- wrapper provides API wrappers for different frameworks. Both the Caffe API wrapper and the TensorFlow API wrapper work now.
This version can load and run Caffe models of mobilenet and squeezenet directly. For more details, please see install.
NOTE: Old Caffe models have to be upgraded using upgrade_net_proto_text/upgrade_net_proto_binary from Caffe's package.
The data was collected on the RK3399 chip with its A72 cores running at 1.8 GHz, by repeatedly calling the forward interface and taking the average time cost (ms) per run.
- Single A72 core (1xA72)
NN | Caffe (OpenBLAS), ms | Tengine, ms |
---|---|---|
squeezenet | 147 | 91 |
mobilenet | 306 | 122 |
- Two A72 cores (2xA72)
NN | Caffe (OpenBLAS), ms | Tengine, ms |
---|---|---|
squeezenet | 102 | 51 |
mobilenet | 232 | 65 |
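The speedup implied by the tables above is the Caffe time divided by the Tengine time. A minimal sketch of that arithmetic follows; `speedup` is a hypothetical helper introduced here for illustration and is not part of Tengine.

```cpp
#include <cassert>

// Hypothetical helper (not part of Tengine): ratio of the baseline time to
// the Tengine time. A value above 1 means Tengine is faster.
double speedup(double baseline_ms, double tengine_ms)
{
    assert(tengine_ms > 0.0);
    return baseline_ms / tengine_ms;
}
```

With the figures from the tables, this gives roughly 1.62x (squeezenet) and 2.51x (mobilenet) on one A72 core, and 2.00x (squeezenet) and 3.57x (mobilenet) on two A72 cores.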
For details on running the benchmark, please visit the benchmark page.
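The measurement method described above (call the forward interface repeatedly and average the time per run) can be sketched as follows. This is only an illustration under stated assumptions: `average_ms` is a hypothetical helper, and the `forward` callable stands in for whatever inference call is being measured. It is not the code used to produce the numbers above.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of Tengine): calls `forward` `repeats` times
// and returns the average wall-clock time per call in milliseconds.
double average_ms(const std::function<void()>& forward, int repeats)
{
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (int i = 0; i < repeats; ++i)
        forward();
    const std::chrono::duration<double, std::milli> total = clock::now() - start;
    return total.count() / repeats;
}
```

A caller would wrap its own inference call in a lambda, for example `average_ms([&] { /* run one forward pass */ }, 100)`.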
Please refer to the Linux build and Android build guides.
Please visit examples for demos on classification/detection, and download models from the Tengine model zoo (password: hhgc).
tengine applications is a project for sharing Android/Linux applications powered by Tengine.
It is easy to add a new operator to Tengine. See the guide on adding a new operator.
Tengine can be extended to support a new serialization format by building a new serializer module.
How to build a new serializer module
- GitHub issues
- QQ group: 829565581 (Question: Tengine, Answer: openailab)
tengine model 2.0
New APIs
get_graph_node_number(), get_graph_node_by_idx()
New features
Separate the CPU operators into an independent shared library: hclcpu.so
Add Reference Operator
Update test cases and update permute for MXNet
Update LSTM/GRU in the MXNet serializer
Support MXNet serializer in CMakeLists.txt
Support TFLite serializer in CMakeLists.txt
Support eltwise in TFLite serializer
More operator support
RNN operator definition and blas implementation
LSTM operator definition and blas implementation
GRU operator definition and blas implementation
tengine API 2.0
New API set for NN inference
Simplify the graph creation process: just call create_graph() instead of load_model() and create_runtime_graph()
Support perf stat and tensor dump
Support log redirect
Support to build Android NN Driver with new Tengine API
By setting CONFIG_LEGACY_API=y in makefile.config, tengine API 1.0 still works
More TensorFlow models supported
Support inceptionv3/v4, resnet_v2_101, mobilenet v1/v2 models from tensorflow
Support GPU/CPU Heterogeneous Computing
By calling set_graph_device(graph,"acl_opencl"), operators that the GPU supports will be scheduled to the GPU, while the remaining operators will be scheduled on the CPU automatically.
See the guide to running an MSSD example with GPU FP16.
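The placement rule described above (GPU-supported operators go to the GPU, everything else falls back to the CPU) can be illustrated with a small standalone sketch. This is not Tengine's scheduler. `Placement`, `assign_devices`, the operator names, and the `"cpu"` device label are all assumptions made for illustration; only the `"acl_opencl"` device name comes from the text above.

```cpp
#include <set>
#include <string>
#include <vector>

// Illustrative only: Tengine's real scheduler is not shown in this document.
struct Placement {
    std::string op;      // operator name
    std::string device;  // device the operator is assigned to
};

// Assigns each operator to "acl_opencl" when the GPU supports it,
// otherwise to a hypothetical "cpu" device.
std::vector<Placement> assign_devices(const std::vector<std::string>& ops,
                                      const std::set<std::string>& gpu_supported)
{
    std::vector<Placement> plan;
    plan.reserve(ops.size());
    for (const auto& op : ops) {
        const bool on_gpu = gpu_supported.count(op) > 0;
        plan.push_back({op, on_gpu ? "acl_opencl" : "cpu"});
    }
    return plan;
}
```

For example, with `ops = {"Convolution", "Softmax"}` and only `"Convolution"` in the GPU-supported set, the first operator is placed on `acl_opencl` and the second on `cpu`.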
Using c++_shared for Android build
As NDK toolchains will eventually drop gnustl, this version switches to c++_shared.
Please download the pre-built libraries with c++_shared from Tengine Android Build Libraries (password: wtcz).
Support ACL in Android
Update the CMake system to support ACL in the Android build. Please refer to the Android build guide.
Bugfix
Fix the failure to load a Tengine model converted from MXNet
Serializer:
update ONNX module with new onnx proto version
New features
Serializer: support saving model as c files
ACL GPU: add FP16 support
NN: mobilenet v2 support in examples
Accuracy tools: yolov2 accuracy test
Build:
support cross-building arm32 library
support building on raspberry pi 3b
automatically clean the build directory when makefile.config changes
Bug fix
A few memory leak issues in the library and examples
A race condition issue between front thread and the background working thread
Tensorflow serializer issue: fail to load inception_v3 model
Support Tengine model file. Protobuf is optional now.
Please refer to the tengine_model examples
New features
Support GPU: using ACL (Arm Compute Library) as a backend graph device
Support blas operator implementation: Tengine can run on x86 without caffe now
Support new NN: Inception-v3/vgg16/faster-rcnn/ssd/yolo-v2
Support Android build: includes 32bit and 64bit
Support cross-compile on x86 (experimental): debian example contributed by mcharleb and Mani-Sadhasivam @ Linaro
Support Tensorflow serializer: load inception-v3 and mobilenet TF model directly
Support Tensorflow wrapper: label_image.cpp from tensorflow repo
Others
There is a single .so file now, and etc/config has been removed based on feedback from the field.
Tengine will automatically probe the CPU arch/part settings, and there is just one CPU driver now.
To assign CPUs manually when necessary:
export TENGINE_CPU_LIST=1,2
Besides probing the CPU, a few CPUs are predefined in cpu_predefined.cpp, including rk3399/a63/kirin960/apq8096. To use a predefined CPU, refer to the following:
const struct cpu_info * p_info=get_predefined_cpu("rk3399");
create_cpu_device("rk3399",p_info);
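The `TENGINE_CPU_LIST=1,2` setting shown above is a comma-separated list of CPU ids. The sketch below shows how an application might read and parse a value in that format, for example to log which cores were requested. `parse_cpu_list` and `cpu_list_from_env` are hypothetical helpers; this does not reproduce how Tengine itself parses the variable.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not Tengine code): splits a string such as "1,2"
// into the CPU ids {1, 2}. Empty tokens are skipped.
std::vector<int> parse_cpu_list(const std::string& text)
{
    std::vector<int> cpus;
    std::stringstream stream(text);
    std::string token;
    while (std::getline(stream, token, ',')) {
        if (!token.empty())
            cpus.push_back(std::stoi(token));  // throws on non-numeric input
    }
    return cpus;
}

// Reads TENGINE_CPU_LIST from the environment; returns an empty list when
// the variable is unset.
std::vector<int> cpu_list_from_env()
{
    const char* value = std::getenv("TENGINE_CPU_LIST");
    return value ? parse_cpu_list(value) : std::vector<int>{};
}
```

An empty result can be read as "no manual assignment", in which case the automatic probing described above applies.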
Introduce the driver/device model to support MT(Multi-Thread)
Support new NN: Inception-v4
Caffe Wrapper examples: squeezenet/mobilenet/mtcnn
MXNet model load examples: squeezenet/mobilenet
Support new operator: Eltwise, PReLU, Slice
Support new NN: mtcnn, resnet and lighten_cnn
Experimental caffe API wrapper: caffe based application just needs to recompile to use Tengine
Update documents, as well as a few fixes.
Initial release of single A72 support