Hoping to give a clear view of the subject through curated content, organized from algorithm down to hardware execution
- PTQ: Post-Training Quantization (quantizes an already-trained model, typically with only a small calibration pass, no retraining)
- QAT: Quantization-Aware Training (simulates quantization during training or fine-tuning so the weights adapt to the reduced precision)
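To make the distinction concrete, here is a minimal sketch of both flows using the TensorFlow Model Optimization toolkit and TFLite converter listed further down; the tiny model and random data are placeholders.

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Tiny stand-in model and data, just to make the sketch runnable.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(4),
])
x = np.random.rand(32, 8).astype("float32")
y = np.random.randint(0, 4, 32)

# --- PTQ: quantize the trained model at conversion time, no retraining ---
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
ptq_bytes = converter.convert()

# --- QAT: wrap the model with fake-quant nodes, then fine-tune ---
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
qat_model.fit(x, y, epochs=1, verbose=0)  # short fine-tuning pass
```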
- 2022 Hardware Approximate Techniques for Deep Neural Network Accelerators: A Survey, Armeniakos et al.
- 2019 Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going, Wang et al.
- 2017 Efficient Processing of Deep Neural Networks: A Tutorial and Survey, Sze et al.
- 2019 Recent Advances in Convolutional Neural Network Acceleration, Zhang et al.
- 2020 Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey, Deng et al.
- 2020 Approximation Computing Techniques to Accelerate CNN Based Image Processing Applications – A Survey in Hardware/Software Perspective, Manikandan et al.
- 2021 Pruning and Quantization for Deep Neural Network Acceleration: A Survey, Liang et al.
Name | Description | Framework | Supported Approx |
---|---|---|---|
NEMO | small library for minimization of DNNs, aimed at ultra-low-power devices such as PULP-based MCUs | PyTorch, ONNX | PTQ, QAT |
Microsoft NNI | lightweight toolkit for feature engineering, neural architecture search, hyperparameter tuning and model compression | PyTorch, TensorFlow (+Keras), MXNet, Caffe2, CNTK, Theano | Pruning, PTQ |
PocketFlow | open-source framework for compressing and accelerating DNNs | TensorFlow | PTQ, QAT, Pruning |
Tensorflow Model Optimization | Toolkit to optimize ML/DNN models | TensorFlow (Keras) | Clustering, Quantization (PTQ, QAT), Pruning |
QKeras | quantization extension to Keras that provides drop-in replacements for some Keras layers (see the sketch after this table) | TensorFlow (Keras) | Quantization (QAT) |
Brevitas | PyTorch extension to quantize DNN models | PyTorch | PTQ, QAT |
TFApprox | Adds ApproxConv layers to TensorFlow to emulate approximate multipliers on GPU, typically from EvoApproxLib | TensorFlow | Approximate multipliers |
N2D2 | Toolset to import or train models, apply quantization, and export them in various formats (C/C++ ...) | ONNX | QAT (license required), PTQ |
Distiller | Distiller is an open-source Python package for neural network compression research (fine-tuning capable) | PyTorch | Pruning, Quantization (QAT), Knowledge distillation, Conditional computation, Regularization |
AdaPT | AdaPT is a fast emulation framework that extends PyTorch to support approximate inference as well as approximation-aware retraining | PyTorch | Approximate multipliers |
Intel Neural Compressor | INC is an open-source Python library for neural network compression | TensorFlow, PyTorch, ONNX Runtime, MXNet | Pruning (magnitude, gradient), Quantization (PTQ, dynamic, QAT, mixed precision), Knowledge distillation |
Qualcomm AIMET | AIMET is an open-source library for trained neural network quantization and compression, plus a Model Zoo | TensorFlow, PyTorch | Pruning (channel), Spatial SVD, per-layer compression-ratio selection, Quantization (PTQ, QAT, simulation, rounding, bias correction, cross-layer equalization, mixed precision) |
OpenMMRazor | MMRazor is an open-source toolkit for model slimming and AutoML | OpenMMLab | Neural architecture search (NAS), Pruning, Knowledge distillation (KD), Quantization (in the next release) |
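To illustrate the "drop-in replacement" style of the QKeras row above, here is a minimal sketch; the layer widths and bit-widths are arbitrary illustration values.

```python
# Swap Dense/ReLU for their quantized QKeras equivalents, then train as
# usual: the fake-quantized forward pass makes this a form of QAT.
import tensorflow as tf
from qkeras import QDense, QActivation, quantized_bits, quantized_relu

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    # 4-bit weights/biases (0 integer bits, symmetric) instead of float32
    QDense(16, kernel_quantizer=quantized_bits(4, 0, 1),
           bias_quantizer=quantized_bits(4, 0, 1)),
    QActivation(quantized_relu(4)),  # 4-bit activations
    QDense(4, kernel_quantizer=quantized_bits(4, 0, 1),
           bias_quantizer=quantized_bits(4, 0, 1)),
])
model.compile(optimizer="adam", loss="mse")  # then model.fit(...) as usual
```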
- DORY - automatic tool to deploy DNNs on low-cost MCUs with typically less than 1 MB of on-chip SRAM
- Glow - Glow is a machine learning compiler and execution engine for hardware accelerators (PyTorch, ONNX)
- TensorflowLite - TensorFlow Lite is a set of tools to help developers run TensorFlow models on mobile, embedded, and IoT devices. It enables on-device machine learning inference with low latency and a small binary size (Linux, Android, MCU); see also curated content for tflite
- OpenVino - OpenCL-based graph compiler for the Intel environment (Intel CPU, Intel GPU, dedicated accelerators)
- N2D2 - Framework capable of training and exporting DNNs in various formats, notably standalone, quantized C/C++ compilable projects with very few dependencies; supports import of ONNX models
- Vitis AI - Optimal Artificial Intelligence Inference from Edge to Cloud (compiler / optimizer / quantizer / profiler / IP set)
- OnnxRuntime Graph optim - Optimizes the ONNX graph (simplification); a minimal sketch follows this list
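For the last item, here is a minimal sketch of offline graph optimization with the ONNX Runtime Python API; "model.onnx" is a placeholder path.

```python
import onnxruntime as ort

# Let ORT apply all graph-level simplifications (constant folding, node
# fusion, ...) at session creation, and dump the optimized graph to disk.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.optimized_model_filepath = "model_optimized.onnx"
sess = ort.InferenceSession("model.onnx", sess_options=so)
```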
Name | Description | Environment | Perf |
---|---|---|---|
Esperanto ET-SoC-1 | 1000+ low-power RISC-V cores on a chip for energy-efficient ML/DNN processing | Cloud | 800 TOPS @ 20 W |
Google TPU | Processing unit for DNN workloads, efficient systolic array for computation | Cloud, Edge | V4: 275 TFLOPS @ 200 W / V3: 90 TOPS @ 250 W / Coral Edge: 4 TOPS @ 2 W |
Greenwave GAP8 | multi-GOPS fully programmable RISC-V IoT-edge computing engine, featuring an 8-core cluster with CNN accelerator, coupled with an ultra-low-power MCU with 30 μW state-retentive sleep power (75 mW) | Edge | 600 GMAC/s/W |
Intel Movidius Myriad | Vector processing unit for accelerating DNN inference, interfaces with the OpenVINO toolkit, 16 programmable cores | Edge | 1 TOPS @ 1.5 W - 2.67 TOPS/W |
Synaptics NPU VIP9000 | Neural processing unit for accelerating DNN inference, 22 NN cores (Conv) and 8 Tensor cores, supports bfloat16 | Edge | 6.75 TOPS @ ? W |
SiMa.ai MLSoC | SoC for accelerating DNN inference (PCIe/SPI/I2C...), supports int8 | Edge/Cloud | 50 TOPS @ 5 W |
Moffett Antoum | SoC for accelerating sparse CV/LLM DNN inference | Cloud | 29.5 TOPS / 3.7 TFLOPS @ 70 W |
IBM NorthPole | NPU for DNN inference, vector-matrix multiplication (VMM) + 2x NoC, int4/8/16 | Cloud | - |
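Since the Perf column mixes absolute throughput (TOPS @ W) and efficiency (TOPS/W), a quick conversion makes the chips comparable; the figures below are taken straight from the table above.

```python
# Convert "TOPS @ W" entries from the table into TOPS/W efficiency.
def tops_per_watt(tops: float, watts: float) -> float:
    return tops / watts

print(tops_per_watt(800, 20))  # Esperanto ET-SoC-1 -> 40.0 TOPS/W
print(tops_per_watt(4, 2))     # Coral Edge TPU     -> 2.0 TOPS/W
print(tops_per_watt(50, 5))    # SiMa.ai MLSoC      -> 10.0 TOPS/W
```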
- Maestro - open-source tool for modeling and evaluating the performance and energy-efficiency of different dataflows for DNNs
- HLS4ML - package for generating HLS from various ML frameworks (good PyTorch support); creates a streaming architecture (see the sketch after this list)
- FINN - framework for creating HW accelerators (HLS code) from Brevitas-quantized models, down to BNNs; creates a PE-based architecture
- N2D2 - framework for creating HLS from an N2D2-trained model (supports ONNX import); creates a streaming architecture
- ScaleHLS - HLS framework on MLIR. Can compile HLS C/C++ or ONNX model to optimized HLS C/C++ in order to generate high-efficiency RTL design using downstream tools, such as Vivado HLS. Focus on scalability, automated DSE engine.
- DNN-Neurosim - Framework for evaluating the performance of inference or training of on-chip DNN
- SCALE-Sim - ARM CNN accelerator simulator, that provides cycle-accurate timing, power/energy, memory bandwidth and trace results for a specified accelerator configuration and neural network architecture.
- Eyeriss Energy Estimator - Energy Estimator for MIT's Eyeriss Hardware Accelerator
- Torchbench - collection of deep learning benchmarks you can use to benchmark your models, optimized for the PyTorch framework.
- Renode - Functional simulation platform for MCU dev & test (single and multi-node)
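To give a feel for the HLS-generation tools above, here is a rough hls4ml flow (Keras model to HLS project) following its documented two-step API; the model file and output directory are placeholders.

```python
import hls4ml
from tensorflow import keras

# Placeholder trained model; any supported Keras model works here.
model = keras.models.load_model("model.h5")

# Step 1: derive an HLS configuration (precision, reuse) from the model.
config = hls4ml.utils.config_from_keras_model(model, granularity="model")

# Step 2: convert to an HLS project in the given output directory.
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, output_dir="hls_prj")

hls_model.compile()   # C simulation of the generated HLS code
# hls_model.build()   # runs downstream HLS synthesis (needs vendor tools)
```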
- 2022 Cross-Layer Approximation for Printed Machine Learning Circuits (code) - Algorithmic- and logic-level approximation (coefficient replacement + netlist pruning) through a full DSE for printed ML applications.
- 2020 Deep Neural Network Compression by In-Parallel Pruning-Quantization - Uses Bayesian optimization to solve the pruning and quantization problems jointly, with fine-tuning (a toy sketch of these two primitives follows this list).
- 2020 OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization - Analytical single-shot compression (pruning + quantization) of a DNN using only the pretrained weight values, then fine-tuning to recover the accuracy loss.
- 2020 Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity - Large matrix multiplications are tiled; this method proposes maintaining a regular sparsity pattern at the tile level, improving efficiency.
- 2020 Utilizing Explainable AI for Quantization and Pruning of Deep Neural Networks - Uses DeepLIFT (explainable AI) as a hint to improve compression by determining the importance of neurons and features.
- 2021 Post-training deep neural network pruning via layer-wise calibration - Layer-wise sparse pruning calibration that uses fractal images in place of representative data, combined with post-training quantization, achieving 2x compression.
- 2018 Learning Compression from Limited Unlabeled Data - Uses unlabelled data to improve the accuracy of quantization in a very fast fine-tuning step.
- 2020 Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors - AutoQKeras: per-layer quantization optimization using a metaheuristic DSE based on Bayesian optimization; makes use of QKeras & hls4ml.
- 2020 Full Approximation of Deep Neural Networks through Efficient Optimization - Selects efficient approximate multipliers through retraining and minimization of accuracy loss (EvoApprox).
- 2019 ALWANN: Automatic Layer-Wise Approximation of Deep Neural Network Accelerators without Retraining - Uses NSGA-II to jointly optimize the approximate multipliers implemented and the mapping of DNN layers onto them (EvoApprox).
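Several of the papers above combine the same two primitives: magnitude pruning and uniform quantization. The following is a toy numpy sketch of those primitives in isolation, not a reimplementation of any specific paper's method.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def uniform_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform fake-quantization to 2^(bits-1)-1 levels per sign."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

w = np.random.randn(64, 64).astype(np.float32)
w = uniform_quantize(magnitude_prune(w, sparsity=0.5), bits=4)
print((w == 0).mean())  # ~0.5: pruned weights stay representable as 0
```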
- MLPerf / MLCommons - Acceleration benchmark contest for ML
- Papers with Code - latest papers / code in ML, with SoTA leaderboards for several application domains (CV, NLP, medical, ...)
- TIMM - Excellent model zoo & training scripts for PyTorch
- ONNX Model Zoo - Collection of pre-trained ONNX models
- Tensorflow Hub - pre-trained models that can be imported as Keras layers for deployment / fine-tuning
- Keras Applications - pre-trained popular CNNs implemented in Keras; can be customized and fine-tuned
- Torchvision - The PyTorch equivalent of Keras Applications
- Openvino pre-trained models - Intel pre-trained models for use in OpenVINO
- Google OR-Tools - Constraint programming, routing and other optimization tools
- Facebook Botorch - Bayesian optimization accelerated by a PyTorch backend, Python API
- Pymoo - collection of multi-objective optimization implementations in Python with a user-friendly interface (see the NSGA-II sketch after this list)
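As an illustration of the kind of multi-objective DSE used by papers like ALWANN above, here is a minimal Pymoo NSGA-II run on a standard benchmark problem (ZDT1); in an approximation DSE the objectives would instead be, e.g., accuracy loss vs. energy.

```python
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.problems import get_problem
from pymoo.optimize import minimize

problem = get_problem("zdt1")          # standard 2-objective benchmark
algorithm = NSGA2(pop_size=50)         # NSGA-II with a population of 50
res = minimize(problem, algorithm, ("n_gen", 100), seed=1, verbose=False)
print(res.F[:5])                       # first few Pareto-front points
```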
- MMdnn - Microsoft tool for cross-framework conversion, retraining, visualization & deployment
- ONNX - model format to exchange frozen models between ML frameworks (see the export sketch after this list)
- Tensorboard - Visualization tool for TensorFlow, PyTorch, etc.; can show the graph, metric evolution over training, and more; very adaptable
- Netron - Tool to visualize an ONNX graph with all its attributes
- mlflow - very flexible experiment-logging tool (client/server) for logging parameters & metrics, plus object storage; Python and shell interfaces
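As a reminder of how a frozen model typically enters this tool ecosystem (viewable in Netron, optimizable in ONNX Runtime), here is a minimal PyTorch-to-ONNX export sketch; the model choice and opset are arbitrary.

```python
import torch
import torchvision

# Any torch.nn.Module works; MobileNetV2 is just a convenient example.
model = torchvision.models.mobilenet_v2(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)  # fixed input shape for the frozen graph
torch.onnx.export(model, dummy, "mobilenet_v2.onnx", opset_version=13)
```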
- Xilinx Vivado HLS - C/C++-based HLS for Xilinx FPGAs
- Intel Quartus HLS - C++ HLS for Altera/Intel FPGAs
- Mentor Catapult HLS - C++/SystemC HLS from Siemens EDA (formerly Mentor), targeting FPGAs and ASICs
- Blog post - related to recent mobile architectures
- https://github.com/juliagusak/model-compression-and-acceleration-progress
- https://github.com/ZhishengWang/Embedded-Neural-Network
- https://github.com/memoiry/Awesome-model-compression-and-acceleration
- https://github.com/sun254/awesome-model-compression-and-acceleration
- https://github.com/guan-yuan/awesome-AutoML-and-Lightweight-Models
- https://github.com/chester256/Model-Compression-Papers
- https://github.com/mapleam/model-compression-and-acceleration-4-DNN
- https://github.com/cedrickchee/awesome-ml-model-compression
- https://github.com/jnjaby/Model-Compression-Acceleration
- https://github.com/he-y/Awesome-Pruning