Course: TinyML and Efficient Deep Learning Computing
Instructor: Song Han (Associate Professor, MIT EECS)
[schedule(2023 Fall)] | [schedule(2022 Fall)] | [youtube]
-
Studying efficient inference methods
Study algorithms that improve the efficiency of deep learning computation.
-
Building deep learning models under constrained resources
Construct efficient deep learning models tailored to the constraints of the target device.
-
Efficiency metrics: latency, storage, energy
Memory-related(#parameters, model size, #activations), Computation(MACs, FLOPs)
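A quick sketch (PyTorch; the layer sizes are made up) of how #parameters and MACs are counted for a single conv layer:

```python
import torch.nn as nn

# MACs for a convolution = k_h * k_w * C_in * C_out * H_out * W_out (bias ignored).
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)

num_params = sum(p.numel() for p in conv.parameters())  # 3*3*64*128 + 128
h_out = w_out = 56                                      # assumed output resolution
macs = 3 * 3 * 64 * 128 * h_out * w_out

print(f"#parameters: {num_params:,}")                   # 73,856
print(f"MACs: {macs:,}")                                # ~231M
```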
-
Pruning Granularity, Pruning Criterion
unstructured/structured pruning
magnitude-based pruning(L1-norm), second-order-based pruning, percentage-of-zero-based pruning, regression-based pruning
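A minimal sketch of unstructured magnitude-based pruning: keep the largest-|w| weights, zero the rest (sizes are arbitrary):

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a binary mask that zeros the smallest-|w| fraction of weights."""
    num_prune = int(weight.numel() * sparsity)
    if num_prune == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(num_prune).values
    return (weight.abs() > threshold).float()

w = torch.randn(256, 512)
mask = magnitude_prune(w, sparsity=0.7)
w_pruned = w * mask
print(f"achieved sparsity: {(w_pruned == 0).float().mean():.2f}")
```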
-
Automatic Pruning, Lottery Ticket Hypothesis
Pruning Ratio, Sensitivity Analysis, Automatic Pruning(AMC, NetAdapt)
Lottery Ticket Hypothesis(Winning Ticket, Iterative Magnitude Pruning, Scaling Limitation), Pruning with Regularization
Pruning at Initialization(Connection Sensitivity)
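A hedged sketch of iterative magnitude pruning in the lottery-ticket style; `train_fn` is a hypothetical stand-in for a training loop that applies the masks:

```python
import copy
import torch

def iterative_magnitude_pruning(model, train_fn, rounds=5, rate=0.2):
    init_state = copy.deepcopy(model.state_dict())      # theta_0 for rewinding
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
             if p.dim() > 1}                            # prune weight matrices only
    for _ in range(rounds):
        train_fn(model, masks)                          # hypothetical: train with masks
        for name, p in model.named_parameters():
            if name not in masks:
                continue
            alive = (p * masks[name]).abs()[masks[name] > 0]
            k = int(alive.numel() * rate)               # prune 20% of survivors/round
            if k > 0:
                thr = alive.kthvalue(k).values
                masks[name] = ((p.abs() > thr) & (masks[name] > 0)).float()
        model.load_state_dict(init_state)               # rewind survivors to init
    return masks
```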
-
System & Hardware Support for Sparsity
EIE(CSC format: relative index, column pointer)
M:N Sparsity
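A toy sketch of EIE-style CSC encoding with relative indices, assuming a 4-bit index so zero-runs longer than 15 need a padding zero; this follows my reading of the paper, not reference code:

```python
import numpy as np

def encode_csc_relative(W, max_jump=15):        # 4-bit relative index => max 15
    values, rel_idx, col_ptr = [], [], [0]
    for j in range(W.shape[1]):
        gap = 0                                 # zeros since the last nonzero
        for i in range(W.shape[0]):
            if W[i, j] == 0:
                gap += 1
                continue
            while gap > max_jump:               # index overflow:
                values.append(0.0)              # insert a padding zero entry
                rel_idx.append(max_jump)
                gap -= max_jump + 1
            values.append(W[i, j])
            rel_idx.append(gap)
            gap = 0
        col_ptr.append(len(values))             # where each column starts
    return values, rel_idx, col_ptr

W = np.array([[0, 3], [0, 0], [1, 0], [0, 2]], dtype=float)
print(encode_csc_relative(W))   # ([1.0, 3.0, 2.0], [2, 0, 2], [0, 1, 3])
```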
-
Basic Concepts of Quantization
Numeric Data Types: Integer, Fixed-Point, Floating-Point(IEEE FP32/FP16, BF16, NVIDIA FP8), INT4 and FP4
Uniform vs Non-uniform quantization, Symmetric vs Asymmetric quantization
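A small sketch contrasting symmetric and asymmetric linear quantization at 8 bits:

```python
import torch

def quantize_symmetric(x, bits=8):
    qmax = 2 ** (bits - 1) - 1                      # 127 for INT8
    scale = x.abs().max() / qmax
    q = torch.round(x / scale).clamp(-qmax - 1, qmax)
    return q, scale                                 # reconstruct: q * scale

def quantize_asymmetric(x, bits=8):
    qmin, qmax = 0, 2 ** bits - 1                   # [0, 255] for UINT8
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale).clamp(qmin, qmax)
    q = (torch.round(x / scale) + zero_point).clamp(qmin, qmax)
    return q, scale, zero_point                     # reconstruct: (q - z) * scale

x = torch.randn(1000)
q, s = quantize_symmetric(x)
q2, s2, z2 = quantize_asymmetric(x)
print((x - q * s).abs().max(), (x - (q2 - z2) * s2).abs().max())
```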
-
Vector Quantization, Linear Quantization
Vector Quantization(VQ): Deep Compression(iterative pruning, retrain codebook, Huffman encoding), Product Quantization(PQ): AND THE BIT GOES DOWN
Linear Quantization: Zero point, Scaling Factor, Quantization Error(clip error, round error), Linear Quantized Matrix Multiplication(FC layer, Conv layer)
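A minimal sketch of linear-quantized matrix multiplication for an FC layer: integer accumulation, then a single floating-point rescale (int64 stands in for the int32 accumulator; weights symmetric, activations asymmetric):

```python
import torch

torch.manual_seed(0)
x, w = torch.randn(4, 64), torch.randn(32, 64)

s_w = w.abs().max() / 127                           # symmetric weights, z_w = 0
q_w = torch.round(w / s_w).clamp(-128, 127).long()

s_x = (x.max() - x.min()) / 255                     # asymmetric activations
z_x = int(torch.round(-x.min() / s_x).clamp(0, 255))
q_x = (torch.round(x / s_x) + z_x).clamp(0, 255).long()

# integer accumulation, then one rescale: Y ~= s_x * s_w * (Q_x - Z_x) @ Q_w^T
y_hat = s_x * s_w * ((q_x - z_x) @ q_w.t()).float()
print("max |error| vs FP32:", (y_hat - x @ w.t()).abs().max().item())
```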
-
Weight Quantization: Per-Tensor Quantization, Per-Channel Quantization, Group Quantization(Per-Vector, MX), Weight Equalization, Adaptive Rounding(AdaRound)
Activation Quantization: During training(EMA), Calibration(Min-Max, KL-divergence, Mean Squared Error)
Bias Correction, Zero-Shot Quantization(ZeroQ)
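A short sketch of per-tensor vs. per-channel weight scales; the synthetic channel ranges are chosen to make the per-channel advantage visible:

```python
import torch

# Uneven per-channel ranges: early rows ~100x smaller than late rows.
w = torch.randn(64, 128) * torch.logspace(-2, 0, 64).unsqueeze(1)

def quant_dequant(w, scale):
    return torch.round(w / scale).clamp(-128, 127) * scale

s_tensor = w.abs().max() / 127                        # one scale for the tensor
s_channel = w.abs().amax(dim=1, keepdim=True) / 127   # one scale per out-channel

err_tensor = (w - quant_dequant(w, s_tensor)).pow(2).mean()
err_channel = (w - quant_dequant(w, s_channel)).pow(2).mean()
print(f"per-tensor MSE {err_tensor:.2e}  per-channel MSE {err_channel:.2e}")
```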
-
Quantization-Aware Training, Low bit-width quantization
Fake quantization, Straight-Through Estimator
Binary Quantization(Deterministic, Stochastic, XNOR-Net), Ternary Quantization
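A sketch of fake quantization with the straight-through estimator: the forward pass does quantize-dequantize, the backward pass treats round() as the identity:

```python
import torch

class FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale):
        # simulate INT8: quantize, clamp, dequantize
        return torch.round(x / scale).clamp(-128, 127) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None            # STE: d(round)/dx ~= 1, no grad for scale

x = torch.randn(10, requires_grad=True)
y = FakeQuant.apply(x, x.abs().max().detach() / 127)
y.sum().backward()
print(x.grad)                            # all ones: gradient passed straight through
```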
-
Neural Architecture Search: basic concepts & manually-designed neural networks
input stem, stage, head
AlexNet, VGGNet, SqueezeNet(global average pooling, fire module, pointwise convolution), ResNet50(bottleneck block, residual learning), ResNeXt(grouped convolution)
MobileNet(depthwise-separable convolution, width/resolution multiplier), MobileNetV2(inverted bottleneck block), ShuffleNet(channel shuffle), SENet(squeeze-and-excitation block), MobileNetV3(redesigning expensive layers, h-swish)
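A sketch of MobileNet's depthwise-separable convolution (depthwise 3x3 with groups = channels, then pointwise 1x1), with a parameter-count comparison against a dense 3x3:

```python
import torch.nn as nn

def dw_separable(c_in, c_out, stride=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, stride, padding=1, groups=c_in, bias=False),
        nn.BatchNorm2d(c_in), nn.ReLU6(inplace=True),   # depthwise 3x3
        nn.Conv2d(c_in, c_out, 1, bias=False),
        nn.BatchNorm2d(c_out), nn.ReLU6(inplace=True),  # pointwise 1x1
    )

# MAC ratio vs. dense 3x3 is roughly 1/c_out + 1/9, i.e. ~8-9x cheaper
block = dw_separable(64, 128)
dense = nn.Conv2d(64, 128, 3, padding=1, bias=False)
print(sum(p.numel() for p in block.parameters()),
      sum(p.numel() for p in dense.parameters()))
```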
-
Neural Architecture Search: RNN controller & search strategy
cell-level search space, network-level search space
design the search space: Cumulative Error Distribution, FLOPs distribution
Search Strategy: grid search, random search, reinforcement learning, bayesian optimization, gradient-based search, evolutionary search
EfficientNet(compound scaling), DARTS
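A toy sketch of evolutionary search over a tiny search space; `accuracy_proxy` is a hypothetical stand-in for real performance estimation:

```python
import random

DEPTHS, WIDTHS, KERNELS = [2, 3, 4], [16, 32, 64], [3, 5, 7]
SPACE = {"depth": DEPTHS, "width": WIDTHS, "kernel": KERNELS}

def sample():
    return {k: random.choice(v) for k, v in SPACE.items()}

def mutate(arch):
    child = dict(arch)
    key = random.choice(list(child))
    child[key] = random.choice(SPACE[key])          # re-sample one dimension
    return child

def accuracy_proxy(arch):                           # hypothetical fitness function
    return arch["depth"] * 0.1 + arch["width"] * 0.01 - arch["kernel"] * 0.02

population = [sample() for _ in range(20)]
for _ in range(10):                                 # keep the fittest, mutate them
    population.sort(key=accuracy_proxy, reverse=True)
    parents = population[:5]
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]
print(max(population, key=accuracy_proxy))
```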
-
Neural Architecture Search: Performance Estimation & Hardware-Aware NAS
Weight Inheritance, HyperNetwork, Weight Sharing(super-network, sub-network)
Performance Estimation Heuristics: Zen-NAS, GradSign
Hardware-Aware NAS(ProxylessNAS, HAT), One-Shot NAS(Once-for-All)
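A hedged sketch of weight sharing: sub-networks of different widths reuse slices of one super-network layer (Once-for-All style, greatly simplified):

```python
import torch
import torch.nn.functional as F

class SlimmableLinear(torch.nn.Module):
    """Super-network layer whose sub-networks slice the shared weight."""
    def __init__(self, max_in, max_out):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(max_out, max_in) * 0.01)

    def forward(self, x, out_features):
        w = self.weight[:out_features, : x.shape[-1]]   # shared-weight slice
        return F.linear(x, w)

layer = SlimmableLinear(max_in=64, max_out=128)
x = torch.randn(4, 64)
for width in (32, 64, 128):                             # sample sub-networks
    print(width, layer(x, width).shape)                 # no retraining needed
```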
-
Knowledge Distillation(distillation loss, temperature)
KD: matching intermediate weights/features/attention maps/sparsity pattern/relational information(layers, samples)
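A sketch of the standard distillation loss with temperature T: softened KL against the teacher, blended with the hard-label loss:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                          # T^2 keeps gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(8, 10), torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(kd_loss(s, t, y))
```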
-
Self Distillation, Online Distillation, Applications
Self Distillation, Online Distillation, Combining Online and Self-Distillation, Network Augmentation
Applications: Object Detection, Semantic Segmentation, GAN, NLP
-
microcontroller, flash/SRAM usage, peak SRAM usage, MCUNet: TinyNAS, TinyEngine
TinyNAS: automated search space optimization(weight/resolution multiplier), resource-constrained model specialization(Once-for-All)
MCUNetV2: patch-based inference, network redistribution, joint automated search for optimization, MCUNetV2 architecture(VWW dataset inference)
RNNPool, MicroNets(MOPs & latency/energy consumption relationship)
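A back-of-the-envelope sketch of the peak-SRAM analysis: weights sit in flash, activations in SRAM, and the layer whose input + output activations are largest sets the peak (the per-layer sizes below are made up):

```python
# hypothetical per-layer activation sizes in KB: (input, output)
layers = [(96, 48), (48, 96), (96, 24), (24, 24)]

# both the input and output buffers of a layer must fit in SRAM at once
peak_kb = max(inp + out for inp, out in layers)
print(f"peak SRAM usage: {peak_kb} KB")   # the single widest layer dominates;
                                          # patch-based inference shrinks this peak
```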
-
memory hierarchy of MCU, data layout(NCHW, NHWC, CHWN)
TinyEngine: Loop Unrolling, Loop Reordering, Loop Tiling, SIMD programming, Im2col, In-place depthwise convolution, appropriate data layout(pointwise, depthwise convolution), Winograd convolution
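A sketch of Im2col in NumPy: unroll convolution windows into columns so the convolution becomes a single GEMM (stride 1, no padding, for simplicity):

```python
import numpy as np

def im2col(x, k):                          # x: (C, H, W)
    C, H, W = x.shape
    H_out, W_out = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, H_out * W_out), dtype=x.dtype)
    for i in range(H_out):
        for j in range(W_out):
            cols[:, i * W_out + j] = x[:, i:i + k, j:j + k].ravel()
    return cols

x = np.random.randn(3, 8, 8)
w = np.random.randn(4, 3, 3, 3)            # (C_out, C_in, k, k)
y = w.reshape(4, -1) @ im2col(x, 3)        # convolution as one matrix multiply
print(y.reshape(4, 6, 6).shape)
```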
-
Lecture 12: Paper Reading Presentation
Lecture 24: Final Project Presentation
Lecture 25: Final Project Presentation