/awesome-approximate-dnn

Curated content for DNN approximation, acceleration ... with a focus on hardware accelerators and deployment

Approximate Computing in Deep Neural Networks

This list aims to give a clear view of the subject, with curated content organized from algorithm to hardware execution.


Lexicon

  • PTQ: Post Training Quantization
  • QAT: Quantization Aware Training
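To make the two acronyms above concrete, here is a minimal sketch of the quantize/dequantize step that PTQ applies to already-trained weights. It assumes symmetric, per-tensor, 8-bit quantization; the function names and the sample weights are hypothetical and not taken from any tool listed here. QAT typically runs this same quantize/dequantize round trip inside the forward pass during training, so the network learns to tolerate the rounding error.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one scale for the whole tensor."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, then clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer levels."""
    return [qi * scale for qi in q]


# Hypothetical trained weights, used only for illustration.
weights = [0.42, -1.27, 0.05, 0.88, -0.31]
q_weights, scale = quantize_symmetric(weights)
restored = dequantize(q_weights, scale)

# The worst-case rounding error is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights, scale, max_error)
```

Real toolchains usually add per-channel scales, zero points for asymmetric ranges, and calibration data for activations; this sketch leaves all of that out.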

Best Surveys

Tools

Approximation Frameworks

| Name | Description | Framework | Supported Approx |
|---|---|---|---|
| NEMO | Small library for minimization of DNNs intended for ultra-low-power devices like pulp-nn | PyTorch, ONNX | PTQ, QAT |
| Microsoft NNI | Lightweight toolkit for Feature Engineering, Neural Architecture Search, Hyperparameter Tuning and Model Compression | PyTorch, TensorFlow (+Keras), MXNet, Caffe2, CNTK, Theano | Pruning, PTQ |
| PocketFlow | Open-source framework for compressing and accelerating DNNs | TensorFlow | PTQ, QAT, Pruning |
| Tensorflow Model Optimization | Toolkit to optimize ML / DNN models | TensorFlow (Keras) | Clustering, Quantization (PTQ, QAT), Pruning |
| QKeras | Quantization extension to Keras that provides drop-in replacements for some of the Keras layers | TensorFlow (Keras) | Quantization (QAT) |
| Brevitas | PyTorch extension to quantize DNN models | PyTorch | PTQ, QAT |
| TFApprox | Adds ApproxConv layers to TF to emulate the use of approximate multipliers on GPU, typically from EvoApproxLib | TensorFlow | Approximate Multipliers |
| N2D2 | Toolset to import or train a model, apply quantization, and export in various formats (C/C++ ...) | ONNX | QAT (license required), PTQ |
| Distiller | Open-source Python package for neural network compression research (fine-tuning capable) | PyTorch | Pruning, Quantization (QAT), Knowledge Distillation, Conditional Computation, Regularization |
| AdaPT | Fast emulation framework that extends PyTorch to support approximate inference as well as approximation-aware retraining | PyTorch | Approximate Multipliers |
| Intel Neural Compressor | INC is an open-source Python lib for neural network compression | TensorFlow, PyTorch, ONNX Runtime, MXNet | Pruning (Magnitude, Grad), Quantization (PTQ, dynamic, QAT, Mix precision), Knowledge Distillation |
| Qualcomm AIMET | AIMET is an open-source lib for trained neural network quantization and compression + Model Zoo | TensorFlow, PyTorch | Pruning (Channel), Spatial SVD, per-layer compression ratio selection, Quantization (PTQ, QAT, Simulation, Rounding, Bias correction, Cross layer equalization, Mix precision) |
| OpenMMRazor | MMRazor is an open-source toolkit for model slimming and AutoML | OpenMM | Neural Architecture Search (NAS), Pruning, Knowledge Distillation (KD), Quantization (in the next release) |
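Several entries above emulate approximate multipliers in software. One common way to do that for 8-bit operands is to precompute every product once and replace each multiplication with a table lookup. The sketch below illustrates the idea only: the truncating multiplier is a hypothetical stand-in, not a circuit from EvoApproxLib, and no claim is made that any listed framework is implemented exactly this way.

```python
def truncated_multiply(a, b, dropped_bits=2):
    """Hypothetical approximate multiplier: zero the low bits of each operand."""
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) * (b & mask)


# Precompute all 256 x 256 unsigned 8-bit products once, so emulating the
# approximate multiplier costs one lookup per multiplication.
LUT = [[truncated_multiply(a, b) for b in range(256)] for a in range(256)]


def approx_dot(xs, ws):
    """Dot product whose multiplications go through the lookup table."""
    return sum(LUT[x][w] for x, w in zip(xs, ws))


# Illustrative unsigned 8-bit activations and weights.
activations = [12, 200, 7, 255]
weights = [3, 17, 129, 64]

exact = sum(x * w for x, w in zip(activations, weights))
approx = approx_dot(activations, weights)
print(exact, approx, exact - approx)
```

Swapping in a different multiplier model only changes how `LUT` is filled; the dot product code stays the same, which is what makes this style of emulation convenient for comparing many approximate circuits.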

Dedicated Library

  • PULP-NN code, paper - QNN inference library for ultra low power PULP RiscV core

Graph Compiler

  • DORY - automatic tool to deploy DNNs on low-cost MCUs with typically less than 1MB of on-chip SRAM memory
  • Glow - Glow is a machine learning compiler and execution engine for hardware accelerators (Pytorch, ONNX)
  • TensorflowLite - TensorFlow Lite is a set of tools to help developers run TensorFlow models on mobile, embedded, and IoT devices. It enables on-device machine learning inference with low latency and a small binary size (Linux, Android, MCU). See also: curated content for tflite
  • OpenVino - OpenCL based graph compiler for the Intel environment (Intel CPU, Intel GPU, dedicated accelerators)
  • N2D2 - Framework capable of training and exporting DNNs in different formats, particularly as a standalone, quantized C/C++ compilable project with very few dependencies; supports import from ONNX models
  • Vitis AI - Optimal Artificial Intelligence Inference from Edge to Cloud (compiler / optimizer / quantizer / profiler / IP set)
  • OnnxRuntime Graph optim - Optimizes the ONNX graph (simplification)

Commercial Dedicated HW accelerator (ASIC)

| Name | Description | Environment | Perf |
|---|---|---|---|
| Esperanto ET-soc-1 | Chip with 1000+ low-power RISC-V cores for energy-efficient processing of ML/DNN | Cloud | 800 TOPS @ 20 W |
| Google TPU | Processing unit for DNN workloads, efficient systolic array for computation | Cloud, Edge | V4 - 275 TFLOPS @ 200 W / V3 - 90 TOPS @ 250 W / Coral Edge 4 TOPS @ 2 W |
| GreenWaves GAP8 | Multi-GOPS fully programmable RISC-V IoT-edge computing engine, featuring an 8-core cluster with CNN accelerator, coupled with an ultra-low-power MCU with 30 μW state-retentive sleep power (75 mW) | Edge | 600 GMAC/s/W |
| Intel Movidius Myriad | Vector processing unit for accelerating DNN inference, interfaces with the OpenVino toolkit, 16 programmable cores | Edge | 1 TOPS @ 1.5 W - 2.67 TOPS/W |
| Synaptic NPU VIP9000 | Neural processing unit for accelerating DNN inference, 22 NN cores (Conv) and 8 Tensor Cores, supports Bfloat16 | Edge | 6.75 TOPS @ ? W |
| Sima ML accelerator MLSoC | SoC for accelerating DNN inference (PCIe/SPI/I2C...), supports int8 | Edge/Cloud | 50 TOPS @ 5 W |
| Moffett Antoum | SoC for accelerating sparse CV/LLM DNN inference | Cloud | 29.5 TOPS / 3.7 TFLOPS @ 70 W |
| IBM NorthPole | NPU for DNN inference, Vector Matrix Multiplication (VMM) + 2xNoC, int 4, 8, 16 | Cloud | - |
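The Perf column mixes raw throughput with power, so a quick way to compare entries is to derive TOPS/W. The sketch below does this for three rows whose figures are copied from the table above; the helper name `tops_per_watt` is hypothetical. Treat the result as a rough guide only, since vendors quote peak numbers at different precisions and under different conditions.

```python
# (name, peak throughput in TOPS, power in W), copied from the table above.
chips = [
    ("Esperanto ET-soc-1", 800.0, 20.0),
    ("Google Coral Edge TPU", 4.0, 2.0),
    ("Sima MLSoC", 50.0, 5.0),
]


def tops_per_watt(tops, watts):
    """Energy efficiency as peak throughput divided by power."""
    return tops / watts


efficiency = {name: tops_per_watt(tops, watts) for name, tops, watts in chips}

# Most efficient first.
ranking = sorted(efficiency, key=efficiency.get, reverse=True)
for name in ranking:
    print(f"{name}: {efficiency[name]:.1f} TOPS/W")
```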

FPGA based accelerator / HLS for CNNs

  • Maestro - open-source tool for modeling and evaluating the performance and energy-efficiency of different dataflows for DNNs
  • HLS4ML - package for creating HLS from various ML framework (good pytorch support), create streamline architecture
  • FINN - framework for creating HW accelerators (HLS code) from Brevitas quantized models, down to BNN, creates a PE architecture
  • N2D2 - framework for creating HLS from N2D2 trained model (support ONNX import), create streamline architecture
  • ScaleHLS - HLS framework on MLIR. Can compile HLS C/C++ or ONNX model to optimized HLS C/C++ in order to generate high-efficiency RTL design using downstream tools, such as Vivado HLS. Focus on scalability, automated DSE engine.

Evaluation Frameworks

  • DNN-Neurosim - Framework for evaluating the performance of inference or training of on-chip DNN

Simulation Frameworks

  • SCALE-Sim - ARM CNN accelerator simulator, that provides cycle-accurate timing, power/energy, memory bandwidth and trace results for a specified accelerator configuration and neural network architecture.
  • Eyeriss Energy Estimator - Energy Estimator for MIT's Eyeriss Hardware Accelerator
  • Torchbench - collection of deep learning benchmarks you can use to benchmark your models, optimized for the PyTorch framework.
  • Renode - Functional simulation platform for MCU dev & test (single and multi-node)

Approximation Methods

Multi-techniques

Pruning

Structured - Hardware Friendly Structure

Weight Saliency Determination

Data-free methods

Quantization

Approximate operators

Others

Contests

Model Zoo

  • TIMM - Excellent model zoo & training scripts for pytorch
  • ONNX Model Zoo - Collection of pre-trained onnx models
  • Tensorflow Hub - pre-trained model that can be imported as keras layers for deployment / fine-tuning
  • Keras Applications - pre-trained popular CNNs implemented in Keras - can be customized and fine tuned
  • Torchvision - The torch equivalent to keras applications
  • Openvino pre-trained models - Intel pre-trained model for use in OpenVino

Generic DSE Framework

  • Google OR-Tools - Constraint programming, routing and other optimization tools
  • Facebook Botorch - Bayesian optimization accelerated by torch backend, python API
  • Pymoo - collection of multi-objective optimization implementation in python, user friendly interface
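In a design space exploration for approximate DNNs, the objectives usually conflict (for example accuracy loss versus energy), so the result is a set of trade-off points rather than a single best design. The sketch below shows the Pareto filtering step in plain Python, independent of the tools above. The candidate names and numbers are made up for illustration and are not measurements.

```python
def pareto_front(points):
    """Keep the points not dominated by any other (both objectives minimized)."""
    front = []
    for p in points:
        dominated = any(
            q["error"] <= p["error"]
            and q["energy"] <= p["energy"]
            and (q["error"] < p["error"] or q["energy"] < p["energy"])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical design points: classification error and energy per inference.
candidates = [
    {"name": "fp32", "error": 0.070, "energy": 10.0},
    {"name": "int8", "error": 0.072, "energy": 3.0},
    {"name": "int4", "error": 0.095, "energy": 1.5},
    {"name": "int8-overpruned", "error": 0.110, "energy": 3.2},
]

front_names = [p["name"] for p in pareto_front(candidates)]
print(front_names)
```

The last candidate is dropped because the int8 point is better on both objectives; the three remaining points are the trade-offs a designer would actually choose between.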

DNN conversion framework

  • MMdnn - Microsoft tool for cross-framework conversion, retraining, visualization & deployment
  • ONNX - model format to exchange frozen models between ML frameworks

Visualization Framework

  • Tensorboard - Visualization tool for Tensorflow, Pytorch ..., can show graph, metric evolution over training ... very adaptable
  • Netron - Tool to show ONNX graph with all the attributes.
  • mlflow - very flexible simulation logging tool (client/server) allowing to log parameter & metrics + object storage, python and shell interfaces

HLS Framework

Efficient DNN Architecture

  • Blog post - related to recent mobile architectures

Similar repos