- Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization
- Data-Free Quantization through Weight Equalization and Bias Correction
- Fighting Quantization Bias With Bias
- Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
- Cell Division: Weight Bit-Width Reduction Technique for Convolutional Neural Network Hardware Accelerators
-
PACT: Parameterized Clipping Activation for Quantized Neural Networks
- Clipped ReLU with a learnable per-layer upper bound α, trained jointly with the weights and regularized with an L2 penalty on α (see the sketch below)
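A minimal PyTorch sketch of the idea, assuming a straight-through estimator for the rounding step (class/function names and the initial α value are mine, not from the paper):

```python
import torch
import torch.nn as nn


class PACT(nn.Module):
    """PACT-style clipped activation: y = clip(x, 0, alpha), then uniform
    k-bit quantization. alpha is a learnable per-layer parameter."""

    def __init__(self, bits: int = 4, alpha_init: float = 10.0):
        super().__init__()
        self.bits = bits
        self.alpha = nn.Parameter(torch.tensor(alpha_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clip to [0, alpha]; written so that alpha receives a gradient of 1
        # wherever x >= alpha, as in the paper's STE derivation.
        y = torch.clamp(x, min=0.0) - torch.clamp(x - self.alpha, min=0.0)
        # Uniform quantization of the clipped range with a straight-through
        # estimator, so gradients still flow to x and alpha.
        levels = 2 ** self.bits - 1
        scale = self.alpha / levels
        y_q = torch.round(y / scale) * scale
        return y + (y_q - y).detach()


def alpha_l2_penalty(model: nn.Module, weight: float = 1e-4):
    """L2 regularization on every clip parameter; add this to the task loss.
    The coefficient is a hyperparameter (the value here is a placeholder)."""
    return weight * sum(m.alpha ** 2 for m in model.modules() if isinstance(m, PACT))
```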
-
Accurate and Efficient 2-bit Quantized Neural Networks
- PACT quantization-aware training for activations
- Statistics-Aware Weight Binning (SAWB) for weights (see the sketch below)
- Shortcut connections in ResNet are kept in full precision
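A rough PyTorch sketch of the SAWB part as I read it: the clipping scale is a linear combination of sqrt(E[w²]) and E[|w|] with bit-width-dependent coefficients fit offline in the paper. The coefficients are not reproduced here (they must be taken from the paper), and the function names are mine:

```python
import torch


def sawb_scale(w: torch.Tensor, c1: float, c2: float) -> torch.Tensor:
    """Statistics-aware clipping scale: a linear combination of the weight
    tensor's second and first absolute moments. c1/c2 are the bit-width-
    dependent coefficients reported in the paper; none are hard-coded here."""
    return c1 * w.pow(2).mean().sqrt() + c2 * w.abs().mean()


def quantize_weights_2bit(w: torch.Tensor, c1: float, c2: float) -> torch.Tensor:
    """Symmetric 2-bit quantize-dequantize to the four levels
    {-a, -a/3, +a/3, +a} using the SAWB scale. ResNet shortcut connections
    would simply skip this call and keep their FP32 weights."""
    alpha = float(sawb_scale(w, c1, c2))
    w_clipped = w.clamp(-alpha, alpha)
    idx = torch.round((w_clipped / alpha + 1.0) * 1.5)   # bin index in {0,1,2,3}
    return -alpha + idx * (2.0 * alpha / 3.0)
```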
-
Fully Quantized Network for Object Detection
- 4-bit quantization
- Freeze BatchNorm statistics
- Clamp activations to ranges taken from calibration statistics
- Channel-wise quantization scales (see the sketch below)
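The notes above describe a recipe more than an algorithm; the following PyTorch sketch shows one plausible reading of the three quantization-related pieces (helper names and the 99.9th-percentile default are assumptions, not values from the paper):

```python
import numpy as np
import torch
import torch.nn as nn


def freeze_batchnorm_stats(model: nn.Module) -> None:
    """Put BN layers in eval mode so their running mean/var stop updating
    while the quantized network is fine-tuned."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.eval()


def per_channel_weight_scales(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """One symmetric scale per output channel of a conv weight (O, I, kH, kW)."""
    qmax = 2 ** (bits - 1) - 1
    return w.detach().abs().flatten(1).max(dim=1).values / qmax


def activation_clip_range(calib_activations, percentile: float = 99.9):
    """Clamp range taken from calibration statistics: a high percentile of the
    values observed on a calibration set, more robust to outliers than the
    raw min/max."""
    samples = torch.cat([a.detach().flatten() for a in calib_activations]).cpu().numpy()
    lo = float(np.percentile(samples, 100.0 - percentile))
    hi = float(np.percentile(samples, percentile))
    return lo, hi
```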
- Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference
- Learned Step Size Quantization
- (https://arxiv.org/pdf/1810.05723.pdf)
- Accurate and Efficient 2-bit Quantized Neural Networks
- Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network using Truncated Gaussian Approximation
- Quantization Networks
- Learning low-precision neural networks without Straight-Through Estimator (STE)
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
- SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization
- Value-aware Quantization for Training and Inference of Neural Networks
- Efficient and Effective Quantization for Sparse DNNs
- BiScaled-DNN: Quantizing Long-tailed Datastructures with Two Scale Factors for Deep Neural Networks
TensorRT
- Per-channel weight scales
- Calibration: minimize KL divergence (see the sketch below)
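This is not TensorRT's code, but a simplified NumPy sketch of what KL-divergence (entropy) calibration does: histogram the activations, then search for the clipping threshold whose low-bit approximation stays closest to the original distribution. The 2048-bin / 128-level constants are the ones commonly quoted for int8 and should be treated as assumptions:

```python
import numpy as np


def entropy_calibrate(activations: np.ndarray, num_bins: int = 2048,
                      num_quant_levels: int = 128) -> float:
    """Choose a clipping threshold that minimizes KL(P || Q) between the
    observed activation histogram P and its quantized approximation Q."""
    hist, edges = np.histogram(np.abs(activations), bins=num_bins)
    best_kl, best_threshold = np.inf, float(edges[-1])

    for i in range(num_quant_levels, num_bins + 1):
        # Reference distribution: first i bins, outliers folded into the last one.
        p = hist[:i].astype(np.float64)
        p[-1] += hist[i:].sum()

        # Candidate distribution: merge the i bins into num_quant_levels groups,
        # then spread each group's mass uniformly over its nonzero source bins.
        q = np.zeros(i, dtype=np.float64)
        group_edges = np.linspace(0, i, num_quant_levels + 1).astype(int)
        for j in range(num_quant_levels):
            lo, hi = group_edges[j], group_edges[j + 1]
            nonzero = p[lo:hi] > 0
            if nonzero.any():
                q[lo:hi][nonzero] = p[lo:hi].sum() / nonzero.sum()

        # KL divergence over bins where the reference has mass.
        p_norm, q_norm = p / p.sum(), q / max(q.sum(), 1e-12)
        mask = p_norm > 0
        kl = float(np.sum(p_norm[mask] * np.log(p_norm[mask] / np.maximum(q_norm[mask], 1e-12))))
        if kl < best_kl:
            best_kl, best_threshold = kl, float(edges[i])

    # The int8 scale for this tensor would then be best_threshold / 127.
    return best_threshold
```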
TensorFlow Lite
- Per-channel weight scales
- Calibration: min/max of the observed activations (to be confirmed); see the sketch below
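Assuming the rule really is plain min/max, as the note guesses, range setting reduces to an affine mapping of the observed range onto the integer grid. A NumPy sketch (names are mine, not TFLite's API):

```python
import numpy as np


def minmax_affine_params(calib_activations, num_bits: int = 8):
    """Asymmetric (affine) quantization parameters from the raw min/max of the
    values collected on a representative dataset."""
    samples = np.concatenate([np.ravel(a) for a in calib_activations])
    lo, hi = float(samples.min()), float(samples.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)      # the representable range must include 0
    qmin, qmax = 0, 2 ** num_bits - 1        # e.g. uint8 -> [0, 255]
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point
```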
TVM
- Per-channel weight scales
- Calibration: minimize MSE (see the sketch below)
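And an MSE-based range search, again as a generic sketch rather than TVM's actual implementation: try a set of candidate clipping thresholds, quantize-dequantize, and keep the scale with the smallest reconstruction error.

```python
import numpy as np


def mse_calibrate(activations: np.ndarray, num_bits: int = 8,
                  num_candidates: int = 100) -> float:
    """Symmetric scale chosen by grid search: quantize-dequantize with each
    candidate clipping threshold and keep the one with the lowest MSE."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.abs(activations).max()) or 1.0
    best_mse, best_scale = np.inf, max_abs / qmax

    for frac in np.linspace(0.1, 1.0, num_candidates):
        scale = frac * max_abs / qmax
        recon = np.clip(np.round(activations / scale), -qmax - 1, qmax) * scale
        mse = float(np.mean((activations - recon) ** 2))
        if mse < best_mse:
            best_mse, best_scale = mse, scale
    return best_scale
```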