Awesome-model-compression-and-acceleration

A collection of papers on model compression and acceleration that I consider worth reading and plan to read myself. Please open a PR or an issue if you have any suggestions for the list. Thank you.

Survey

  1. A Survey of Model Compression and Acceleration for Deep Neural Networks [arXiv '17]
  2. Recent Advances in Efficient Computation of Deep Convolutional Neural Networks [arXiv '18]
  3. Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better [arXiv '21]

Model and structure

  1. MobileNetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation [arXiv '18, Google]
  2. NasNet: Learning Transferable Architectures for Scalable Image Recognition [arXiv '17, Google]
  3. DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices [AAAI'18, Samsung]
  4. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices [arXiv '17, Megvii]
  5. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [arXiv '17, Google] (its depthwise separable convolution is sketched after this list)
  6. CondenseNet: An Efficient DenseNet using Learned Group Convolutions [arXiv '17]
  7. Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video [arXiv'17]
  8. Shift-based Primitives for Efficient Convolutional Neural Networks [WACV'18]
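
The core building block behind MobileNets (item 5 above) is the depthwise separable convolution. Below is a minimal PyTorch sketch; the class name, channel counts, and the single BatchNorm/ReLU placement are illustrative simplifications rather than the exact MobileNet block.

```python
# Depthwise separable convolution: a per-channel 3x3 (depthwise)
# convolution followed by a 1x1 (pointwise) convolution that mixes
# channels, replacing one dense KxK convolution at a fraction of the
# multiply-adds.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # groups=in_channels makes the 3x3 convolution depthwise:
        # each input channel is filtered independently.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=stride, padding=1,
                                   groups=in_channels, bias=False)
        # The 1x1 pointwise convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))

x = torch.randn(1, 32, 56, 56)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])
```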

Quantization

  1. The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning [ICML'17]
  2. Compressing Deep Convolutional Networks using Vector Quantization [arXiv'14]
  3. Quantized Convolutional Neural Networks for Mobile Devices [CVPR '16]
  4. Fixed-Point Performance Analysis of Recurrent Neural Networks [ICASSP'16]
  5. Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations [arXiv'16]
  6. Loss-aware Binarization of Deep Networks [ICLR'17]
  7. Towards the Limit of Network Quantization [ICLR'17]
  8. Deep Learning with Low Precision by Half-wave Gaussian Quantization [CVPR'17]
  9. ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks [arXiv'17]
  10. Training and Inference with Integers in Deep Neural Networks [ICLR'18]
  11. Deep Learning with Limited Numerical Precision [ICML'15]
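
Most of the papers above refine the same basic operation: mapping float weights onto a small integer grid and back. Here is a minimal NumPy sketch of symmetric 8-bit post-training quantization with a per-tensor scale; the function names and the scaling scheme are illustrative assumptions, not any specific paper's method.

```python
# Symmetric int8 quantization: scale floats so the largest magnitude
# maps to 127, round to integers, and dequantize by multiplying back.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                    # per-tensor scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 128).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```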

Pruning

  1. Learning both Weights and Connections for Efficient Neural Networks [NIPS'15] (the magnitude pruning it popularized is sketched after this list)
  2. Pruning Filters for Efficient ConvNets [ICLR'17]
  3. Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR'17]
  4. Soft Weight-Sharing for Neural Network Compression [ICLR'17]
  5. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR'16]
  6. Dynamic Network Surgery for Efficient DNNs [NIPS'16]
  7. Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR'17]
  8. ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV'17]
  9. To prune, or not to prune: exploring the efficacy of pruning for model compression [ICLR'18]
  10. Data-Driven Sparse Structure Selection for Deep Neural Networks [ECCV'18]
  11. Learning Structured Sparsity in Deep Neural Networks [NIPS'16]
  12. Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism [ISCA'17]
  13. Learning to Prune: Exploring the Frontier of Fast and Accurate Parsing
  14. Channel Pruning for Accelerating Very Deep Neural Networks [ICCV'17]
  15. AMC: AutoML for Model Compression and Acceleration on Mobile Devices [ECCV'18]
  16. RePr: Improved Training of Convolutional Filters [arXiv'18]
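
The simplest variant these papers build on is magnitude pruning: drop the weights with the smallest absolute value and keep a mask so they stay zero during fine-tuning. A minimal NumPy sketch follows; the function name and sparsity level are illustrative.

```python
# Magnitude pruning: zero out the fraction `sparsity` of weights with
# the smallest |w|; the returned mask is reapplied after each
# fine-tuning step so pruned connections stay removed.
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    threshold = np.quantile(np.abs(w), sparsity)
    mask = (np.abs(w) > threshold).astype(w.dtype)
    return w * mask, mask

w = np.random.randn(512, 512).astype(np.float32)
pruned, mask = magnitude_prune(w, sparsity=0.9)
print("fraction of weights kept:", mask.mean())  # ~0.10
```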

Binarized neural network

  1. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
  2. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks [ECCV'16] (its scaled sign binarization is sketched after this list)
  3. Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration
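
A minimal NumPy sketch of the scaled sign binarization used by XNOR-Net (item 2 above): weights become sign(W) times alpha = mean(|W|), the scale that minimizes the L2 error of the binary approximation. Computing alpha per tensor is a simplification here; the paper computes it per filter.

```python
import numpy as np

def binarize(w):
    alpha = np.abs(w).mean()          # least-squares optimal scale
    return alpha * np.sign(w), alpha  # np.sign maps exact zeros to 0

w = np.random.randn(64, 3, 3, 3).astype(np.float32)
wb, alpha = binarize(w)
print("distinct values:", np.unique(wb))  # {-alpha, +alpha}
```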

Low-rank Approximation

  1. Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR'15]
  2. Accelerating Very Deep Convolutional Networks for Classification and Detection (extended version of the paper above) [TPAMI'16]
  3. Convolutional neural networks with low-rank regularization [arXiv'15]
  4. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS'14]
  5. Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR'16]
  6. High performance ultra-low-precision convolutions on mobile devices [NIPS'17]
  7. Speeding up Convolutional Neural Networks with Low Rank Expansions [BMVC'14]
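
The idea shared by these papers, shown here for a fully connected layer: factor the weight matrix with a truncated SVD so one large matrix multiply becomes two thin ones. A minimal NumPy sketch; the rank k is an illustrative choice, and real methods pick it per layer and fine-tune afterwards.

```python
import numpy as np

def low_rank_factor(w, k):
    # W (m x n) ~= A @ B with A (m x k) and B (k x n):
    # cost drops from m*n to (m + n)*k multiply-adds per input.
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :k] * s[:k]
    b = vt[:k, :]
    return a, b

w = np.random.randn(1024, 1024).astype(np.float32)
a, b = low_rank_factor(w, k=64)
print("relative error:", np.linalg.norm(w - a @ b) / np.linalg.norm(w))
print("parameter ratio:", (a.size + b.size) / w.size)  # 0.125
```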

Knowledge distillation

  1. Dark knowledge
  2. FitNets: Hints for Thin Deep Nets
  3. Net2net: Accelerating learning via knowledge transfer
  4. Distilling the Knowledge in a Neural Network (its temperature-softened loss is sketched after this list)
  5. MobileID: Face Model Compression by Distilling Knowledge from Neurons
  6. DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
  7. Deep Model Compression: Distilling Knowledge from Noisy Teachers
  8. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
  9. Sequence-Level Knowledge Distillation
  10. Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
  11. Learning Efficient Object Detection Models with Knowledge Distillation
  12. Data-Free Knowledge Distillation For Deep Neural Networks
  13. Learning Loss for Knowledge Distillation with Conditional Adversarial Networks
  14. Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks
  15. Moonshine: Distilling with Cheap Convolutions
  16. Model Distillation with Knowledge Transfer from Face Classification to Alignment and Verification
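
A minimal PyTorch sketch of the temperature-softened loss from "Distilling the Knowledge in a Neural Network" (item 4 above): the student matches the teacher's softened logits in addition to the hard labels. The temperature T and weighting alpha are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # KL divergence between temperature-softened distributions; the T*T
    # factor keeps soft-target gradients comparable across temperatures.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels))
```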

System

  1. DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications [MobiSys '17]
  2. DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware [MobiSys '17]
  3. MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU [EMDL '17]
  4. DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices [WearSys '16]
  5. DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices [IPSN '16]
  6. EIE: Efficient Inference Engine on Compressed Deep Neural Network [ISCA '16]
  7. MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints [MobiSys '16]
  8. DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit [MobiCASE '16]
  9. Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables [SenSys '16]
  10. An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices [IoT-App '15]
  11. CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android [MM '16]
  12. fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs [NIPS '17]

Some optimization techniques

  1. Eliminate redundant computation
  2. Unroll loops
  3. Use SIMD instructions
  4. Parallelize with OpenMP
  5. Convert to fixed-point arithmetic
  6. Avoid non-contiguous memory reads and writes
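
A minimal NumPy sketch touching items 3 and 6 above (and, indirectly, item 2, since the vectorized call runs as unrolled, SIMD-enabled native code); sizes and function names are illustrative:

```python
import numpy as np

a = np.random.randn(512, 512).astype(np.float32)

# Naive: an interpreted double loop with strided, column-wise access.
def column_sums_loop(m):
    out = np.zeros(m.shape[1], dtype=np.float64)
    for j in range(m.shape[1]):
        for i in range(m.shape[0]):
            out[j] += m[i, j]
    return out

# Optimized: one vectorized reduction over a contiguous buffer.
def column_sums_vec(m):
    return np.ascontiguousarray(m).sum(axis=0)

assert np.allclose(column_sums_loop(a), column_sums_vec(a), atol=1e-2)
```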
