Hoping to give a clear view of the subject through curated content, organized from algorithm down to hardware execution
- PTQ: Post-Training Quantization (quantizes an already-trained model, typically with only a small calibration pass, no retraining)
- QAT: Quantization-Aware Training (simulates quantization during training or fine-tuning so the weights adapt to the reduced precision)
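To make the distinction concrete, here is a minimal sketch of both flows using the TensorFlow Model Optimization toolkit and TFLite converter listed further down; the tiny model and random data are placeholders.

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Tiny stand-in model and data, just to make the sketch runnable.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(4),
])
x = np.random.rand(32, 8).astype("float32")
y = np.random.randint(0, 4, 32)

# --- PTQ: quantize the trained model at conversion time, no retraining ---
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
ptq_bytes = converter.convert()

# --- QAT: wrap the model with fake-quant nodes, then fine-tune ---
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
qat_model.fit(x, y, epochs=1, verbose=0)  # short fine-tuning pass
```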
- 2022 Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey, Armeniakos et al.
- 2019 Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going, Wang et al.
- 2017 Efficient Processing of Deep Neural Networks: A Tutorial and Survey, Sze et al.
- 2019 Recent Advances in Convolutional Neural Network Acceleration, Zhang et al.
- 2020 Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey, Deng et al.
- 2020 Approximation Computing Techniques to Accelerate CNN Based Image Processing Applications – A Survey in Hardware/Software Perspective, Manikandan et al.
- 2021 Pruning and Quantization for Deep Neural Network Acceleration: A Survey, Liang et al.
Name | Description | Framework | Supported Approx |
---|---|---|---|
NEMO | small library for minimization of DNNs, aimed at ultra-low-power devices such as PULP-based MCUs | PyTorch, ONNX | PTQ, QAT |
Microsoft NNI | lightweight toolkit for feature engineering, neural architecture search, hyperparameter tuning and model compression | PyTorch, TensorFlow (+Keras), MXNet, Caffe2, CNTK, Theano | Pruning, PTQ |
PocketFlow | open-source framework for compressing and accelerating DNNs | TensorFlow | PTQ, QAT, Pruning |
Tensorflow Model Optimization | Toolkit to optimize ML/DNN models | TensorFlow (Keras) | Clustering, Quantization (PTQ, QAT), Pruning |
QKeras | quantization extension to Keras that provides drop-in replacements for some Keras layers (see the sketch after this table) | TensorFlow (Keras) | Quantization (QAT) |
Brevitas | PyTorch extension to quantize DNN models | PyTorch | PTQ, QAT |
TFApprox | Adds ApproxConv layers to TensorFlow to emulate approximate multipliers on GPU, typically from EvoApproxLib | TensorFlow | Approximate multipliers |
N2D2 | Toolset to import or train models, apply quantization, and export them in various formats (C/C++ ...) | ONNX | QAT (license required), PTQ |
Distiller | Distiller is an open-source Python package for neural network compression research (fine-tuning capable) | PyTorch | Pruning, Quantization (QAT), Knowledge distillation, Conditional computation, Regularization |
AdaPT | AdaPT is a fast emulation framework that extends PyTorch to support approximate inference as well as approximation-aware retraining | PyTorch | Approximate multipliers |
Intel Neural Compressor | INC is an open-source Python library for neural network compression | TensorFlow, PyTorch, ONNX Runtime, MXNet | Pruning (magnitude, gradient), Quantization (PTQ, dynamic, QAT, mixed precision), Knowledge distillation |
Qualcomm AIMET | AIMET is an open-source library for trained neural network quantization and compression, plus a Model Zoo | TensorFlow, PyTorch | Pruning (channel), Spatial SVD, per-layer compression-ratio selection, Quantization (PTQ, QAT, simulation, rounding, bias correction, cross-layer equalization, mixed precision) |
OpenMMRazor | MMRazor is an open-source toolkit for model slimming and AutoML | OpenMMLab | Neural architecture search (NAS), Pruning, Knowledge distillation (KD), Quantization (in the next release) |
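To illustrate the "drop-in replacement" style of the QKeras row above, here is a minimal sketch; the layer widths and bit-widths are arbitrary illustration values.

```python
# Swap Dense/ReLU for their quantized QKeras equivalents, then train as
# usual: the fake-quantized forward pass makes this a form of QAT.
import tensorflow as tf
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    # 4-bit weights/biases (0 integer bits, symmetric) instead of float32
    QDense(16, kernel_quantizer=quantized_bits(4, 0, 1),
           bias_quantizer=quantized_bits(4, 0, 1)),
    QActivation(quantized_relu(4)),  # 4-bit activations
    QDense(4, kernel_quantizer=quantized_bits(4, 0, 1),
           bias_quantizer=quantized_bits(4, 0, 1)),
])
model.compile(optimizer="adam", loss="mse")  # then model.fit(...) as usual
```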
- DORY - automatic tool to deploy DNNs on low-cost MCUs with typically less than 1 MB of on-chip SRAM
- Glow - Glow is a machine learning compiler and execution engine for hardware accelerators (PyTorch, ONNX)
- TensorflowLite - TensorFlow Lite is a set of tools to help developers run TensorFlow models on mobile, embedded, and IoT devices. It enables on-device machine learning inference with low latency and a small binary size (Linux, Android, MCU); see also curated content for tflite
- OpenVino - OpenCL-based graph compiler for the Intel environment (Intel CPU, Intel GPU, dedicated accelerators)
- N2D2 - Framework capable of training and exporting DNNs in various formats, notably standalone, quantized C/C++ compilable projects with very few dependencies; supports import of ONNX models
- Vitis AI - Optimal Artificial Intelligence Inference from Edge to Cloud (compiler / optimizer / quantizer / profiler / IP set)
- OnnxRuntime Graph optim - Optimizes the ONNX graph (simplification); a minimal sketch follows this list
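For the last item, here is a minimal sketch of offline graph optimization with the ONNX Runtime Python API; "model.onnx" is a placeholder path.

```python
import onnxruntime as ort

# Let ORT apply all graph-level simplifications (constant folding, node
# fusion, ...) at session creation, and dump the optimized graph to disk.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.optimized_model_filepath = "model_optimized.onnx"
sess = ort.InferenceSession("model.onnx", sess_options=so)
```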
Name | Description | Environment | Perf |
---|---|---|---|
Esperanto ET-SoC-1 | 1000+ low-power RISC-V cores on a chip for energy-efficient ML/DNN processing | Cloud | 800 TOPS @ 20 W |
Google TPU | Processing unit for DNN workloads, efficient systolic array for computation | Cloud, Edge | V4: 275 TFLOPS @ 200 W / V3: 90 TOPS @ 250 W / Coral Edge: 4 TOPS @ 2 W |
Greenwave GAP8 | multi-GOPS fully programmable RISC-V IoT-edge computing engine, featuring an 8-core cluster with CNN accelerator, coupled with an ultra-low-power MCU with 30 μW state-retentive sleep power (75 mW) | Edge | 600 GMAC/s/W |
Intel Movidius Myriad | Vector processing unit for accelerating DNN inference, interfaces with the OpenVINO toolkit, 16 programmable cores | Edge | 1 TOPS @ 1.5 W - 2.67 TOPS/W |
Synaptics NPU VIP9000 | Neural processing unit for accelerating DNN inference, 22 NN cores (Conv) and 8 Tensor cores, supports bfloat16 | Edge | 6.75 TOPS @ ? W |
SiMa.ai MLSoC | SoC for accelerating DNN inference (PCIe/SPI/I2C...), supports int8 | Edge/Cloud | 50 TOPS @ 5 W |
Moffett Antoum | SoC for accelerating sparse CV/LLM DNN inference | Cloud | 29.5 TOPS / 3.7 TFLOPS @ 70 W |
IBM NorthPole | NPU for DNN inference, vector-matrix multiplication (VMM) + 2x NoC, int4/8/16 | Cloud | - |
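Since the Perf column mixes absolute throughput (TOPS @ W) and efficiency (TOPS/W), a quick conversion makes the chips comparable; the figures below are taken straight from the table above.

```python
# Convert "TOPS @ W" entries from the table into TOPS/W efficiency.
def tops_per_watt(tops: float, watts: float) -> float:
    return tops / watts

print(tops_per_watt(800, 20))  # Esperanto ET-SoC-1 -> 40.0 TOPS/W
print(tops_per_watt(4, 2))     # Coral Edge TPU     -> 2.0 TOPS/W
print(tops_per_watt(50, 5))    # SiMa.ai MLSoC      -> 10.0 TOPS/W
```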
- Maestro - open-source tool for modeling and evaluating the performance and energy-efficiency of different dataflows for DNNs
- HLS4ML - package for generating HLS from various ML frameworks (good PyTorch support); creates a streaming architecture (see the sketch after this list)
- FINN - framework for creating HW accelerators (HLS code) from Brevitas-quantized models, down to BNNs; creates a PE-based architecture
- N2D2 - framework for creating HLS from an N2D2-trained model (supports ONNX import); creates a streaming architecture
- ScaleHLS - HLS framework on MLIR. Can compile HLS C/C++ or ONNX model to optimized HLS C/C++ in order to generate high-efficiency RTL design using downstream tools, such as Vivado HLS. Focus on scalability, automated DSE engine.
- DNN-Neurosim - Framework for evaluating the performance of inference or training of on-chip DNN
- SCALE-Sim - ARM CNN accelerator simulator, that provides cycle-accurate timing, power/energy, memory bandwidth and trace results for a specified accelerator configuration and neural network architecture.
- Eyeriss Energy Estimator - Energy Estimator for MIT's Eyeriss Hardware Accelerator
- Torchbench - collection of deep learning benchmarks you can use to benchmark your models, optimized for the PyTorch framework.
- Renode - Functional simulation platform for MCU dev & test (single and multi-node)
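To give a feel for the HLS-generation tools above, here is a rough hls4ml flow (Keras model to HLS project) following its documented two-step API; the model file and output directory are placeholders.

```python
import hls4ml
from tensorflow import keras

# Placeholder trained model; any supported Keras model works here.
model = keras.models.load_model("model.h5")

# Step 1: derive an HLS configuration (precision, reuse) from the model.
config = hls4ml.utils.config_from_keras_model(model, granularity="model")

# Step 2: convert to an HLS project in the given output directory.
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir="hls_prj")

hls_model.compile()   # C simulation of the generated HLS code
# hls_model.build()   # runs downstream HLS synthesis (needs vendor tools)
```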
- 2022 Cross-Layer Approximation for Printed Machine Learning Circuits (code) - Algorithmic- and logic-level approximation (coefficient replacement + netlist pruning) through a full DSE for printed ML applications.
- 2020 Deep Neural Network Compression by In-Parallel Pruning-Quantization - Uses Bayesian optimization to solve the pruning and quantization problems jointly, with fine-tuning (a toy sketch of these two primitives follows this list).
- 2020 OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization - Analytical single-shot compression (pruning + quantization) of a DNN using only the pretrained weight values, then fine-tuning to recover the accuracy loss.
- 2020 Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity - Large matrix multiplications are tiled; this method proposes maintaining a regular sparsity pattern at the tile level, improving efficiency.
- 2020 Utilizing Explainable AI for Quantization and Pruning of Deep Neural Networks - Uses DeepLIFT (explainable AI) as a hint to improve compression by determining the importance of neurons and features.
- 2021 Post-training deep neural network pruning via layer-wise calibration - Layer-wise sparse pruning calibration that uses fractal images in place of representative data, combined with post-training quantization, achieving 2x compression.
- 2018 Learning Compression from Limited Unlabeled Data - Uses unlabelled data to improve the accuracy of quantization in a very fast fine-tuning step.
- 2020 Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors - AutoQKeras: per-layer quantization optimization using a metaheuristic DSE based on Bayesian optimization; makes use of QKeras & hls4ml.
- 2020 Full Approximation of Deep Neural Networks through Efficient Optimization - Selects efficient approximate multipliers through retraining and minimization of accuracy loss (EvoApprox).
- 2019 ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining - Uses NSGA-II to jointly optimize the approximate multipliers implemented and the mapping of DNN layers onto them (EvoApprox).
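Several of the papers above combine the same two primitives: magnitude pruning and uniform quantization. The following is a toy numpy sketch of those primitives in isolation, not a reimplementation of any specific paper's method.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def uniform_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform fake-quantization to 2^(bits-1)-1 levels per sign."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

w = np.random.randn(64, 64).astype(np.float32)
w = uniform_quantize(magnitude_prune(w, sparsity=0.5), bits=4)
print((w == 0).mean())  # ~0.5: pruned weights stay representable as 0
```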
- MLPerf / MLCommons - Acceleration benchmark contest for ML
- Papers with Code - latest papers / code in ML, with SoTA leaderboards for several application domains (CV, NLP, medical, ...)
- TIMM - Excellent model zoo & training scripts for PyTorch
- ONNX Model Zoo - Collection of pre-trained ONNX models
- Tensorflow Hub - pre-trained models that can be imported as Keras layers for deployment / fine-tuning
- Keras Applications - pre-trained popular CNNs implemented in Keras; can be customized and fine-tuned
- Torchvision - The PyTorch equivalent of Keras Applications
- Openvino pre-trained models - Intel pre-trained models for use in OpenVINO
- Google OR-Tools - Constraint programming, routing and other optimization tools
- Facebook Botorch - Bayesian optimization accelerated by a PyTorch backend, Python API
- Pymoo - collection of multi-objective optimization implementations in Python with a user-friendly interface (see the NSGA-II sketch after this list)
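As an illustration of the kind of multi-objective DSE used by papers like ALWANN above, here is a minimal Pymoo NSGA-II run on a standard benchmark problem (ZDT1); in an approximation DSE the objectives would instead be, e.g., accuracy loss vs. energy.

```python
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.problems import get_problem
from pymoo.optimize import minimize

problem = get_problem("zdt1")          # standard 2-objective benchmark
algorithm = NSGA2(pop_size=50)         # NSGA-II with a population of 50
res = minimize(problem, algorithm, ("n_gen", 100), seed=1, verbose=False)
print(res.F[:5])                       # first few Pareto-front points
```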
- MMdnn - Microsoft tool for cross-framework conversion, retraining, visualization & deployment
- ONNX - model format to exchange frozen models between ML frameworks (see the export sketch after this list)
- Tensorboard - Visualization tool for TensorFlow, PyTorch, etc.; can show the graph, metric evolution over training, and more; very adaptable
- Netron - Tool to visualize an ONNX graph with all its attributes
- mlflow - very flexible experiment-logging tool (client/server) for logging parameters & metrics, plus object storage; Python and shell interfaces
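As a reminder of how a frozen model typically enters this tool ecosystem (viewable in Netron, optimizable in ONNX Runtime), here is a minimal PyTorch-to-ONNX export sketch; the model choice and opset are arbitrary.

```python
import torch
import torchvision

# Any torch.nn.Module works; MobileNetV2 is just a convenient example.
model = torchvision.models.mobilenet_v2(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)  # fixed input shape for the frozen graph
torch.onnx.export(model, dummy, "mobilenet_v2.onnx", opset_version=13)
```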
- Xilinx Vivado HLS - C/C++-based HLS for Xilinx FPGAs
- Intel Quartus HLS - C++ HLS for Altera/Intel FPGAs
- Mentor Catapult HLS - C++/SystemC HLS from Siemens EDA (formerly Mentor), targeting FPGAs and ASICs
- Blog post - related to recent mobile architectures
- https://github.com/juliagusak/model-compression-and-acceleration-progress
- https://github.com/ZhishengWang/Embedded-Neural-Network
- https://github.com/memoiry/Awesome-model-compression-and-acceleration
- https://github.com/sun254/awesome-model-compression-and-acceleration
- https://github.com/guan-yuan/awesome-AutoML-and-Lightweight-Models
- https://github.com/chester256/Model-Compression-Papers
- https://github.com/mapleam/model-compression-and-acceleration-4-DNN
- https://github.com/cedrickchee/awesome-ml-model-compression
- https://github.com/jnjaby/Model-Compression-Acceleration
- https://github.com/he-y/Awesome-Pruning