Paper reading Jul 2019 [5]
In July I read a few papers on network compression and architecture search. I also ran some INQ experiments on CIFAR-100 and was able to achieve near-lossless results. Code
Pruning
- Prune weights whose magnitude falls below a threshold (a minimal sketch follows)
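A minimal sketch of magnitude pruning in NumPy; the percentile-based threshold and the sparsity level are illustrative assumptions, not taken from the papers:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights so that `sparsity` fraction are pruned."""
    threshold = np.percentile(np.abs(weights), sparsity * 100)
    mask = np.abs(weights) >= threshold   # keep only weights at or above the threshold
    return weights * mask, mask

w = np.random.randn(1000)
pruned_w, mask = magnitude_prune(w, sparsity=0.9)
print(f"fraction pruned: {(mask == 0).mean():.2f}")
```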
Lottery ticket hypothesis
- A sparse subnetwork, trained from its original initialization, can reach the same validation loss as the full network
- The initialization matters, not just the structure: resetting the winning subnetwork to new random weights fails to train as well
- Method: iterative magnitude pruning: train -> prune -> reset to the original initialization -> repeat (see the sketch after this list)
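A rough sketch of the iterative pruning loop; `train_fn` is a hypothetical stand-in for a full training run, and the round count and pruning fraction are illustrative:

```python
import numpy as np

def lottery_ticket_search(init_weights, train_fn, rounds=5, prune_frac=0.2):
    """Iterative magnitude pruning: train, prune the smallest surviving weights,
    then reset the survivors to their original initialization."""
    mask = np.ones_like(init_weights)
    weights = init_weights.copy()
    for _ in range(rounds):
        trained = train_fn(weights * mask)                  # train with the current mask
        surviving = np.abs(trained[mask == 1])
        threshold = np.percentile(surviving, prune_frac * 100)
        mask *= (np.abs(trained) >= threshold)              # prune the smallest survivors
        weights = init_weights * mask                       # reset survivors to their init
    return weights, mask

# toy "training" that just nudges the weights (placeholder for a real training loop)
w0 = np.random.randn(100)
ticket, mask = lottery_ticket_search(w0, train_fn=lambda w: w + 0.1 * np.random.randn(100))
print(f"remaining weights: {int(mask.sum())} / {mask.size}")
```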
Uber's follow up
- Zero matters: weights that end up pruned were already moving toward zero during training, so pruning itself acts like a form of training
- Sign matters: re-initializing the surviving weights with random values that keep the original signs achieves similar results
- Better mask: keep "large final, same sign" weights, i.e. those with large final magnitude whose sign matches the initialization (sketched below)
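A sketch of the "large final, same sign" mask criterion, assuming we have both the initial and final weight tensors; the function name and keep fraction are made up for illustration:

```python
import numpy as np

def large_final_same_sign_mask(w_init, w_final, keep_frac=0.2):
    """Keep weights that end training with large magnitude AND the same sign they started with."""
    same_sign = np.sign(w_final) == np.sign(w_init)
    # rank by final magnitude, but only weights that kept their sign are eligible
    score = np.where(same_sign, np.abs(w_final), -np.inf)
    threshold = np.quantile(score[same_sign], 1 - keep_frac)
    return (score >= threshold).astype(np.float32)

w_init = np.random.randn(1000)
w_final = w_init + 0.5 * np.random.randn(1000)
mask = large_final_same_sign_mask(w_init, w_final)
print(f"kept {int(mask.sum())} of {mask.size} weights")
```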
Incremental Network Quantization
- Quantize weights to powers of 2 (or zero), so they can be represented with fewer bits and manipulation is easier (multiplication becomes a bit shift); a quantization-step sketch follows this list
- In each iteration, quantize the largest remaining weights (partitioned by magnitude, as in Han's pruning), then retrain the still-floating weights for x epochs; x is usually small (<= 8)
- Demonstrates lossless compression, sometimes with even better test results
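A simplified sketch of one INQ quantization step, not a full implementation: freeze and quantize the largest fraction of weights to signed powers of 2 (nearest in log space) and leave the rest in floating point for retraining. The exponent range and fraction are illustrative assumptions:

```python
import numpy as np

def inq_quantize_step(weights, frac=0.5, min_exp=-8, max_exp=0):
    """Quantize the largest `frac` of weights to signed powers of 2 and return
    them plus a mask of which weights are now frozen."""
    threshold = np.quantile(np.abs(weights), 1 - frac)
    frozen = np.abs(weights) >= threshold          # large weights get quantized first
    exponents = np.clip(np.round(np.log2(np.abs(weights) + 1e-12)), min_exp, max_exp)
    pow2 = np.sign(weights) * 2.0 ** exponents
    return np.where(frozen, pow2, weights), frozen

w = np.random.randn(1000) * 0.1
w_q, frozen = inq_quantize_step(w)
# the full INQ loop would now retrain only the unfrozen weights, then repeat
print(np.unique(np.abs(w_q[frozen]))[:5])          # a few of the power-of-2 magnitudes
```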
AutoAugment
- Use RL to find the best augmentation policy
- Search space = augmentation operations, e.g. horizontal_flip, rotate, etc., together with their probabilities and magnitudes (illustrated below)
- Reward = validation accuracy of a child model trained with the candidate policy
- RL = policy gradient, specifically PPO, since the reward is not differentiable
- SOTA but super expensive
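For context, a sub-policy in the searched space is just a few (operation, probability, magnitude) triples. A toy sketch of applying one with PIL; the operations and values below are illustrative, not a learned policy:

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# a sub-policy: each entry is (operation, probability, magnitude);
# the RL controller's job is to pick these choices
SUBPOLICY = [
    ("rotate", 0.7, 15),             # rotate by 15 degrees with probability 0.7
    ("horizontal_flip", 0.5, None),
    ("contrast", 0.6, 1.5),          # scale contrast by a factor of 1.5
]

def apply_subpolicy(img, subpolicy):
    for op, prob, magnitude in subpolicy:
        if random.random() > prob:
            continue
        if op == "rotate":
            img = img.rotate(magnitude)
        elif op == "horizontal_flip":
            img = ImageOps.mirror(img)
        elif op == "contrast":
            img = ImageEnhance.Contrast(img).enhance(magnitude)
    return img

img = Image.new("RGB", (32, 32), (128, 128, 128))   # stand-in for a CIFAR image
augmented = apply_subpolicy(img, SUBPOLICY)
```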