Paper reading Jul 2019 [5]

In July I read a few papers on network compression and architecture search. I also ran some INQ experiments on CIFAR-100 and achieved near-lossless results. Code

Pruning

  • Prune weights whose magnitude falls below a threshold (Han-style magnitude pruning); see the sketch below
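
A minimal numpy sketch of the magnitude criterion (the tensor and threshold are placeholders; in practice the threshold is usually derived from a target sparsity per layer):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, threshold: float) -> np.ndarray:
    """Return a copy of `weights` with small-magnitude entries zeroed out."""
    mask = np.abs(weights) >= threshold
    return weights * mask

w = np.random.randn(4, 4)              # dummy weight matrix
pruned = magnitude_prune(w, threshold=0.5)
print("sparsity:", np.mean(pruned == 0))
```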

Lottery ticket hypothesis

  • A sparse subnetwork, trained from its original initialization, can reach the same validation loss as the full network
  • The initialization, not just the structure, matters: resetting the subnetwork to new random weights makes it fail to train as well
  • Method: iterative magnitude pruning: train -> prune -> reset surviving weights to their original initialization, then repeat (see the sketch below)
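
A rough PyTorch sketch of the loop, under placeholder assumptions (dummy data, a single linear layer, a fixed 20% prune rate per round); it only illustrates train -> prune -> reset, not the paper's exact schedules:

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(100, 10)
init_state = copy.deepcopy(model.state_dict())          # remember the original init
mask = {n: torch.ones_like(p) for n, p in model.named_parameters()}

def train_one_round(model, mask, steps=100):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        x, y = torch.randn(32, 100), torch.randint(0, 10, (32,))   # dummy data
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():                           # keep pruned weights at zero
            for n, p in model.named_parameters():
                p.mul_(mask[n])

for _ in range(3):                                      # iterative pruning rounds
    train_one_round(model, mask)
    for n, p in model.named_parameters():
        if "weight" in n:
            alive = p[mask[n].bool()].abs()
            thresh = alive.quantile(0.2)                # prune 20% of surviving weights
            mask[n] = (p.abs() >= thresh).float() * mask[n]
    model.load_state_dict(init_state)                   # reset to the original init
    with torch.no_grad():
        for n, p in model.named_parameters():
            p.mul_(mask[n])                             # re-apply the winning-ticket mask
```

The key step is `model.load_state_dict(init_state)`: surviving weights go back to their original initialization rather than being re-randomized.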

Uber's follow up

  • Zero matters: weights that end up pruned were already moving towards zero during training, so pruning itself acts as a form of training
  • Sign matters: re-initializing with random magnitudes but the same signs as the original initialization achieves similar results
  • Better mask: keep "large final, same sign" weights, i.e. weights with large final magnitude whose sign matches their initialization (see the sketch after this list)
    • Figure: the lottery ticket mask vs. the "large final, same sign" mask
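
A small numpy sketch contrasting the two masks; the tensors are dummies and the "large final, same sign" criterion is simplified to "top-k by final magnitude, restricted to weights whose sign did not flip":

```python
import numpy as np

def large_final_mask(w_init, w_final, keep_frac=0.2):
    """Original lottery-ticket style mask: keep the largest final-magnitude weights."""
    k = max(1, int(keep_frac * w_final.size))
    thresh = np.sort(np.abs(w_final).ravel())[-k]
    return (np.abs(w_final) >= thresh).astype(np.float32)

def large_final_same_sign_mask(w_init, w_final, keep_frac=0.2):
    """Additionally require the final sign to match the initial sign."""
    same_sign = (np.sign(w_init) == np.sign(w_final)).astype(np.float32)
    return large_final_mask(w_init, w_final, keep_frac) * same_sign

w0 = np.random.randn(8, 8)                 # weights at initialization (dummy)
wf = w0 + 0.5 * np.random.randn(8, 8)      # weights "after training" (dummy)
print(large_final_mask(w0, wf).mean(), large_final_same_sign_mask(w0, wf).mean())
```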

Incremental Network Quantization

  • Quantize weights to powers of two, so they can be represented with fewer bits and multiplication reduces to bit shifts
  • In each iteration, quantize the largest remaining weights (partitioned by magnitude, as in Han's pruning), then retrain the unquantized weights for x epochs; x is usually small (<= 8); see the sketch below
  • Demonstrates lossless compression, sometimes with even better test results
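
A toy numpy sketch of one INQ-style step (the 50/50 partition, the exponent range, and nearest-candidate rounding are simplifications of the paper's scheme): the larger half of the weights is snapped to signed powers of two and frozen, the rest stays full precision for retraining:

```python
import numpy as np

def quantize_pow2(w, n_exponents=7):
    """Snap each weight to the nearest value in {0, +/-2^e} over a small exponent range."""
    max_exp = int(np.floor(np.log2(np.max(np.abs(w)) + 1e-12)))
    exps = np.arange(max_exp - n_exponents + 1, max_exp + 1)
    candidates = np.concatenate([[0.0], 2.0 ** exps])
    idx = np.argmin(np.abs(np.abs(w)[..., None] - candidates), axis=-1)
    return np.sign(w) * candidates[idx]

w = 0.1 * np.random.randn(6, 6)                       # dummy weight matrix
thresh = np.quantile(np.abs(w), 0.5)                  # partition by magnitude
frozen = np.where(np.abs(w) >= thresh, quantize_pow2(w), 0.0)   # quantized, not retrained
trainable = np.where(np.abs(w) < thresh, w, 0.0)                # retrained in the next epochs
w_mixed = frozen + trainable
print(np.unique(np.abs(frozen[frozen != 0])))         # only powers of two remain
```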

AutoAugment

  • Use RL to find the best augmentation policy
  • Search space = augmentation operations, e.g. horizontal_flip, rotate, etc. (see the sketch below)
  • Reward = validation accuracy of a child model trained with the sampled policy
  • RL = policy gradient, specifically PPO, since the reward is not differentiable
  • Achieves SOTA results, but the search is extremely expensive
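
A toy sketch of the search space only (the operation set, magnitude range, and uniform sampling are placeholders; AutoAugment uses a larger set of operations with discretized probabilities and magnitudes):

```python
import random
import numpy as np

# Hypothetical, simplified operation set.
OPS = {
    "horizontal_flip": lambda img, mag: np.flip(img, axis=1),
    "rotate90":        lambda img, mag: np.rot90(img, k=int(mag)),
    "brightness":      lambda img, mag: np.clip(img * (1 + 0.1 * mag), 0.0, 1.0),
}

def sample_subpolicy():
    """Stand-in for the controller: sample two (operation, probability, magnitude) triples."""
    return [(random.choice(list(OPS)), random.random(), random.randint(1, 3))
            for _ in range(2)]

def apply_subpolicy(img, subpolicy):
    """Apply each operation with its probability, like a data-augmentation transform."""
    for op, prob, mag in subpolicy:
        if random.random() < prob:
            img = OPS[op](img, mag)
    return img

img = np.random.rand(32, 32, 3)   # dummy CIFAR-sized image
aug = apply_subpolicy(img, sample_subpolicy())
```

In the actual search, the controller proposes such policies, a child model is trained with each policy, and its validation performance is fed back as the reward for PPO.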