TNN

TNN is a lightweight, high-performance deep learning framework for mobile inference, developed by Tencent Youtu Lab and Guangying Lab. Its main features are cross-platform support, high performance, model compression, and code pruning. Building on ncnn and Rapidnet, TNN further strengthens support and performance optimization for mobile devices, and it borrows the good extensibility and high performance of existing open-source frameworks. TNN has been deployed in multiple Tencent apps, such as Mobile QQ, Weishi, and Pitu. Contributions are welcome: work with us to make TNN a better framework.


Chinese version

Introduction

TNN is a high-performance, lightweight inference framework for mobile devices. It provides advanced features such as cross-platform support, model compression, and code pruning. Inspired by mainstream open-source frameworks, TNN builds on Youtu Lab's Rapidnet and ncnn frameworks. It also combines the efforts of the deep-learning framework Oteam across departments (PCG, TEG, IEG) to create an enterprise-level mobile inference engine. At present, TNN supports various products in Youtu Lab and Guangying Studio.

Effect Example

  • Face Detection (blazeface): iOS ✅ Android ✅, model link
  • Object Detection (yolov5s): iOS ✅ Android ✅, model link
  • Face Alignment (from Tencent Youtu Lab): iOS ✅ Android ✅, model link
  • Hair Segmentation (from Tencent Guangying Lab): iOS ✅ Android ✅, model link
  • Pose Estimation (from Tencent Guangliu): iOS ✅ Android ✅, model link
  • Pose Estimation (blazepose): iOS ✅ Android ✅, model link

Quick Start

TNN is simple to use. If you have a trained model, you can deploy it on the target platform in three steps.

  1. Convert the trained model into a TNN model. Conversion tools are provided for TensorFlow, PyTorch, and Caffe models. A detailed hands-on tutorial is available in How to Create a TNN Model.

  2. Compile the TNN engine for the target platform. You can choose among acceleration solutions such as ARM, OpenCL, Metal, and NPU, depending on hardware support. For these platforms, TNN provides one-click build scripts. For detailed steps, refer to How to Compile TNN.

  3. Use the compiled TNN engine for inference by calling TNN from inside your application. A detailed demo is provided as a reference.

Technical Solutions

At present, TNN has been launched in various major businesses, and the following characteristics have been well received.

  • Computation optimization

    • The backend operators are carefully optimized to make the best use of the computing power of each architecture, taking into account instruction issue, throughput, latency, cache bandwidth, cache latency, registers, and so on.
    • Performance on mainstream hardware platforms (CPU: ARMv7, ARMv8; GPU: Mali, Adreno, Apple) has been extensively tuned.
    • Convolution is implemented with several algorithms, such as Winograd, Tile-GEMM, and Direct Conv, to ensure efficiency across different parameters and sizes.
    • Op fusion: TNN analyzes the network graph offline and fuses multiple simple operations, reducing overhead such as redundant memory access and kernel launch cost.
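The op fusion idea can be illustrated with a minimal sketch. It assumes a network represented as a linear chain of operators and fuses only one pattern (a convolution immediately followed by a ReLU). The `Op` struct, the `FuseConvRelu` function, and the operator names are hypothetical and are not taken from TNN's source; a real fusion pass works on a full graph and handles many more patterns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified operator node. A real graph would also carry
// tensors, attributes, and edges; here a network is just a linear chain.
struct Op {
    std::string type;
};

// Peephole pass: merge each "Conv" that is immediately followed by "ReLU"
// into a single "ConvReLU" node, so the activation no longer needs its own
// kernel launch or an extra pass over memory.
std::vector<Op> FuseConvRelu(const std::vector<Op>& ops) {
    std::vector<Op> fused;
    fused.reserve(ops.size());
    for (std::size_t i = 0; i < ops.size(); ++i) {
        const bool can_fuse = ops[i].type == "Conv" &&
                              i + 1 < ops.size() &&
                              ops[i + 1].type == "ReLU";
        if (can_fuse) {
            fused.push_back(Op{"ConvReLU"});
            ++i;  // skip the ReLU that was absorbed into the fused node
        } else {
            fused.push_back(ops[i]);
        }
    }
    return fused;
}
```

Because the pass runs offline on the graph, its cost is paid once at model-loading or conversion time rather than on every inference.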
  • Low precision computation acceleration

    • TNN supports INT8 and FP16 modes, which reduce model size and memory consumption and use hardware-specific low-precision instructions to accelerate computation.
    • TNN supports an INT8 Winograd algorithm (6-bit input), which further reduces computational complexity without sacrificing accuracy.
    • TNN supports mixed-precision data within one model, which speeds up inference while preserving accuracy.
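To make the INT8 idea concrete, here is a minimal sketch of symmetric per-tensor quantization, a common generic scheme. It is an assumption for illustration only and is not claimed to be the quantization method TNN uses; `QuantizedTensor`, `QuantizeSymmetric`, and `Dequantize` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: real_value is approximately scale * values[i].
struct QuantizedTensor {
    std::vector<std::int8_t> values;
    float scale;
};

// Symmetric per-tensor quantization: map [-max_abs, max_abs] onto [-127, 127].
QuantizedTensor QuantizeSymmetric(const std::vector<float>& input) {
    float max_abs = 0.0f;
    for (float v : input) {
        max_abs = std::max(max_abs, std::fabs(v));
    }
    // Avoid dividing by zero when the tensor is all zeros.
    const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;

    QuantizedTensor out;
    out.scale = scale;
    out.values.reserve(input.size());
    for (float v : input) {
        const float q = std::round(v / scale);
        const float clamped = std::min(127.0f, std::max(-127.0f, q));
        out.values.push_back(static_cast<std::int8_t>(clamped));
    }
    return out;
}

// Recover an approximate float value from the stored 8-bit integer.
float Dequantize(const QuantizedTensor& t, std::size_t i) {
    assert(i < t.values.size());
    return t.scale * static_cast<float>(t.values[i]);
}
```

Each value is stored in one byte instead of four, which is where the reduction in model size and memory traffic comes from; the rounding error per element is at most half of `scale`.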
  • Memory optimization

    • Efficient "memory pool" implementation: based on a full-network DAG analysis, the implementation reuses memory between non-dependent nodes, which reduces memory cost by 90%.
    • Cross-model memory reuse: network memory can be specified externally at run time, so that multiple models can share the same memory.
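The memory-pool idea can be sketched as a lifetime-based buffer planner. The sketch assumes each intermediate tensor is described by its size and by the first and last execution step at which it is live, and it uses a simple greedy first-fit strategy. The types `TensorLifetime` and `Buffer` and the function `PlanPoolBytes` are hypothetical; TNN's actual allocator may work differently, and the saving achieved depends on the network.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one intermediate tensor.
struct TensorLifetime {
    std::size_t bytes;
    int first_step;  // step of the op that produces the tensor
    int last_step;   // step of the last op that reads it
};

// One slot in the shared pool.
struct Buffer {
    std::size_t bytes;
    int busy_until;  // last step at which the current occupant is live
};

// Greedy first-fit planning: a tensor may take over a buffer only if the
// previous occupant's lifetime ended strictly before this tensor is produced.
// Returns the total pool size in bytes.
std::size_t PlanPoolBytes(std::vector<TensorLifetime> tensors) {
    std::sort(tensors.begin(), tensors.end(),
              [](const TensorLifetime& a, const TensorLifetime& b) {
                  return a.first_step < b.first_step;
              });

    std::vector<Buffer> pool;
    for (const TensorLifetime& t : tensors) {
        assert(t.first_step <= t.last_step);
        Buffer* reusable = nullptr;
        for (Buffer& b : pool) {
            if (b.busy_until < t.first_step) {
                reusable = &b;
                break;
            }
        }
        if (reusable != nullptr) {
            // Grow the slot if the new occupant is larger.
            reusable->bytes = std::max(reusable->bytes, t.bytes);
            reusable->busy_until = t.last_step;
        } else {
            pool.push_back(Buffer{t.bytes, t.last_step});
        }
    }

    std::size_t total = 0;
    for (const Buffer& b : pool) {
        total += b.bytes;
    }
    return total;
}
```

For a plain chain of layers, where each tensor is read only by the next op, this plan needs just two buffers regardless of depth, since the output of one layer and the input of the next are the only tensors live at the same time.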
  • Performance of mainstream models on TNN (v0.1, 2020.05.29):

    • Kirin970:

      model                 cpu time (single thread, ms)  gpu time (ms)  npu time (ms)
      Mobilenet_v1          88                            12             4.9
      Mobilenet_v1_int8     55                            -              -
      Mobilenet_v2          58                            11             8.0
      Mobilenet_v2_int8     41                            -              -
      squeezenet_v1.0       127                           20             5.1
      squeezenet_v1.0_int8  82                            -              -
    • Snapdragon 835:

      model                 cpu time (single thread, ms)  gpu time (ms)
      Mobilenet_v1          94                            16
      Mobilenet_v1_int8     62                            -
      Mobilenet_v2          61                            14
      Mobilenet_v2_int8     47                            -
      squeezenet_v1.0       122                           28
      squeezenet_v1.0_int8  93                            -
    • Snapdragon 845:

      model                 cpu time (single thread, ms)  gpu time (ms)
      Mobilenet_v1          60                            10
      Mobilenet_v1_int8     37                            -
      Mobilenet_v2          39                            8
      Mobilenet_v2_int8     28                            -
      squeezenet_v1.0       74                            14
      squeezenet_v1.0_int8  56                            -
  • TNN architecture diagram:

  • TNN supports TensorFlow, PyTorch, MXNet, Caffe, and other training frameworks through ONNX, benefiting from the continuous improvement of the ONNX open-source community. Currently, TNN supports 55 ONNX operators and will soon be extended to cover 80, which includes most of the mainstream CNN operators.

  • TNN runs on mainstream operating systems (Android, iOS, embedded Linux) and is compatible with ARM CPU and GPU hardware platforms (Da Vinci NPU support is coming soon).

  • TNN follows a modular design that abstracts and isolates components such as model parsing, graph construction, graph optimization, low-level hardware adaptation, and high-performance kernels. It uses the "Factory Mode" (factory pattern) to register and build devices, which minimizes the cost of supporting more hardware and acceleration solutions.

  • TNN's runtime does not depend on any third-party libraries. The CPU dynamic library is only around 400KB, and it provides basic image conversion operations that are lightweight and convenient. TNN uses unified models and interfaces across platforms, and switching between platforms requires configuring only a single parameter.
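The factory-based device registration described above can be sketched as follows. This is a generic illustration of the pattern under stated assumptions, not TNN's actual code: `Device`, `DeviceFactory`, `DeviceRegistrar`, the two backend classes, and the string keys are all hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical common interface that every backend implements.
class Device {
public:
    virtual ~Device() = default;
    virtual std::string Name() const = 0;
};

// Registry that maps a key to a function that builds the matching backend.
class DeviceFactory {
public:
    using Creator = std::function<std::unique_ptr<Device>()>;

    static DeviceFactory& Instance() {
        static DeviceFactory factory;  // constructed on first use
        return factory;
    }

    void Register(const std::string& key, Creator creator) {
        creators_[key] = std::move(creator);
    }

    // Returns nullptr when no backend was registered under the key.
    std::unique_ptr<Device> Create(const std::string& key) const {
        auto it = creators_.find(key);
        if (it == creators_.end()) {
            return nullptr;
        }
        return it->second();
    }

private:
    std::map<std::string, Creator> creators_;
};

// Helper whose constructor registers backend T under the given key.
template <typename T>
struct DeviceRegistrar {
    explicit DeviceRegistrar(const std::string& key) {
        DeviceFactory::Instance().Register(
            key, [] { return std::unique_ptr<Device>(new T()); });
    }
};

// Two hypothetical backends. Adding another one touches only its own file.
class CpuDevice : public Device {
public:
    std::string Name() const override { return "cpu"; }
};

class GpuDevice : public Device {
public:
    std::string Name() const override { return "gpu"; }
};

static DeviceRegistrar<CpuDevice> g_cpu_registrar("cpu");
static DeviceRegistrar<GpuDevice> g_gpu_registrar("gpu");
```

With this layout, calling `DeviceFactory::Instance().Create("gpu")` instead of `Create("cpu")` is the only change the caller makes to switch backends, which mirrors the single-parameter switch described above. The core never needs to know the list of backends at compile time.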

Learn About TNN Abilities

Manual

API Document

Contribute to TNN

Roadmap

Acknowledgement

TNN referenced the following projects:

License

FAQ

Join Us

  • Everyone is welcome to help build the best mobile inference framework in the industry.

  • Technical Discussion QQ Group: 913940506 Answer: TNN

  • Scan the QR code to join the TNN discussion group: