Awesome Deep Computation

A curated list of awesome deep learning hardware, compute cycle and memory optimisation, and implementation techniques. Inspired by awesome-deep-learning. Covers literature from 2014 onwards.

Transistor/Gate Level Hardware

  1. 2016/05 A 2.2 GHz SRAM with High Temperature Variation Immunity for Deep Learning Application under 28nm
  2. 2016/06 Switched by Input: Power Efficient Structure for RRAM-based Convolutional Neural Network
  3. 2016/06 Low-power approximate convolution computing unit with domain-wall motion based "spin-memristor" for image processing applications

Low Level Hardware Architecture

  1. 2014/06 A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks
  2. 2015/02 Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
  3. 2015/06 Neuromorphic Architectures for Spiking Deep Neural Networks
  4. 2015/06 Memory and information processing in neuromorphic systems
  5. 2015/08 INsight: A Neuromorphic Computing System for Evaluation of Large Neural Networks
  6. 2016/02 Deep Learning on FPGAs: Past, Present, and Future
  7. 2016/02 A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems
  8. 2016/02 vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design
  9. 2016/04 Demonstrating Hybrid Learning in a Flexible Neuromorphic Hardware System
  10. 2016/04 Hardware-oriented Approximation of Convolutional Neural Networks
  11. 2016/04 Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices
  12. 2016/05 ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars
  13. 2016/07 Maximizing CNN Accelerator Efficiency Through Resource Partitioning
  14. 2016/07 Overcoming Resource Underutilization in Spatial CNN Accelerators
  15. 2016/07 Dynamic energy-accuracy trade-off using stochastic computing in deep neural networks

Model Implementation Techniques

  1. 2014/12 Training Deep Neural Networks with Low Precision Multiplications
  2. 2014/12 Implementation of Deep Convolutional Neural Net on a Digital Signal Processor
  3. 2015/02 Deep Learning with Limited Numerical Precision
  4. 2015/02 Faster learning of deep stacked autoencoders on multi-core systems using synchronized layer-wise pre-training
  5. 2015/02 8-Bit Approximations for Parallelism in Deep Learning
  6. 2016/01 DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
  7. 2016/02 Neural Networks with Few Multiplications
  8. 2016/02 Deep Compression: Compressing Deep Neural Networks with Pruning, Quantization and Huffman Coding

Tutorials and Talks

  1. 2015/09 Heterogeneous Computing in HPC and Deep Learning
  2. 2016/02 Going Deeper with Embedded FPGA Platform for Convolutional Neural Network
  3. 2016/02 Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks
  4. 2016/05 Deep Compression: Compressing Deep Neural Networks with Pruning, Quantization and Huffman Coding
  5. 2016/05 DNNWEAVER: From High-Level Deep Network Models to FPGA Acceleration

Theses

  1. 2015/08 FPGA based Multi-core architectures for Deep Learning
  2. 2016/05 Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks

Whitepapers

  1. 2015/02 Accelerating Deep Convolutional Neural Networks Using Specialized Hardware
  2. 2015/07 Efficient Implementation of Neural Network Systems Built on FPGAs, Programmed with OpenCL

Blogs and Articles

  1. 2015/05 Numerical Optimization for Deep Learning
  2. 2015/10 Single Node Caffe Scoring and Training on Intel® Xeon E5-Series Processors
  3. 2016/03 FPGAs Challenge GPUs as a Platform for Deep Learning
  4. 2016/03 FPGA with OpenCL Solution Released to Deep Learning
  5. 2016/04 Boosting Deep Learning with the Intel Scalable System Framework
  6. 2016/04 Movidius puts deep learning chip in a USB drive
  7. 2016/05 The PCM-Neuron and Neural Computing
  8. 2016/05 FPGA-accelerated deep convolutional neural networks for high throughput and energy efficiency

Hardware Platforms & Accelerators

  1. Nvidia Devbox
  2. Google Tensor Processing Unit
  3. Facebook Open Rack V2 compatible 8-GPU server
  4. CEVA DNN Digital Signal Processor
  5. Movidius Fathom USB Stick
  6. IBM TrueNorth
  7. AMAX SenseBox

Licenses

CC0