Paper reading Jul 2019 [5]

In July I read a few papers on network compression and architecture search. I also ran some INQ experiments on CIFAR-100 and achieved near-lossless results. Code

Pruning

  • Prune weights whose magnitude falls below a threshold (Han-style magnitude pruning); see the sketch below
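
A minimal numpy sketch of the magnitude criterion (the tensor and threshold are placeholders; in practice the threshold is usually derived from a target sparsity per layer):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, threshold: float) -> np.ndarray:
    """Return a copy of `weights` with small-magnitude entries zeroed out."""
    mask = np.abs(weights) >= threshold
    return weights * mask

w = np.random.randn(4, 4)              # dummy weight matrix
pruned = magnitude_prune(w, threshold=0.5)
print("sparsity:", np.mean(pruned == 0))
```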

Lottery ticket hypothesis

  • A sparse subnetwork, trained from its original initialization, can reach the same validation loss as the full network
  • The initialization, not just the structure, matters: resetting the subnetwork to new random weights makes it fail to train as well
  • Method: iterative magnitude pruning: train -> prune -> reset surviving weights to their original initialization, then repeat (see the sketch below)
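
A rough PyTorch sketch of the loop, under placeholder assumptions (dummy data, a single linear layer, a fixed 20% prune rate per round); it only illustrates train -> prune -> reset, not the paper's exact schedules:

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(100, 10)
init_state = copy.deepcopy(model.state_dict())          # remember the original init
mask = {n: torch.ones_like(p) for n, p in model.named_parameters()}

def train_one_round(model, mask, steps=100):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        x, y = torch.randn(32, 100), torch.randint(0, 10, (32,))   # dummy data
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():                           # keep pruned weights at zero
            for n, p in model.named_parameters():
                p.mul_(mask[n])

for _ in range(3):                                      # iterative pruning rounds
    train_one_round(model, mask)
    for n, p in model.named_parameters():
        if "weight" in n:
            alive = p[mask[n].bool()].abs()
            thresh = alive.quantile(0.2)                # prune 20% of surviving weights
            mask[n] = (p.abs() >= thresh).float() * mask[n]
    model.load_state_dict(init_state)                   # reset to the original init
    with torch.no_grad():
        for n, p in model.named_parameters():
            p.mul_(mask[n])                             # re-apply the winning-ticket mask
```

The key step is `model.load_state_dict(init_state)`: surviving weights go back to their original initialization rather than being re-randomized.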

Uber's follow up

  • Zero matters: weights that end up pruned were already moving towards zero during training, so pruning itself acts as a form of training
  • Sign matters: re-initializing with random magnitudes but the same signs as the original initialization achieves similar results
  • Better mask: keep "large final, same sign" weights, i.e. weights with large final magnitude whose sign matches their initialization (see the sketch after this list)
    • Figure: the lottery ticket mask vs. the "large final, same sign" mask
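
A small numpy sketch contrasting the two masks; the tensors are dummies and the "large final, same sign" criterion is simplified to "top-k by final magnitude, restricted to weights whose sign did not flip":

```python
import numpy as np

def large_final_mask(w_init, w_final, keep_frac=0.2):
    """Original lottery-ticket style mask: keep the largest final-magnitude weights."""
    k = max(1, int(keep_frac * w_final.size))
    thresh = np.sort(np.abs(w_final).ravel())[-k]
    return (np.abs(w_final) >= thresh).astype(np.float32)

def large_final_same_sign_mask(w_init, w_final, keep_frac=0.2):
    """Additionally require the final sign to match the initial sign."""
    same_sign = (np.sign(w_init) == np.sign(w_final)).astype(np.float32)
    return large_final_mask(w_init, w_final, keep_frac) * same_sign

w0 = np.random.randn(8, 8)                 # weights at initialization (dummy)
wf = w0 + 0.5 * np.random.randn(8, 8)      # weights "after training" (dummy)
print(large_final_mask(w0, wf).mean(), large_final_same_sign_mask(w0, wf).mean())
```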

Incremental Network Quantization

  • Quantize weights to powers of two, so they can be represented with fewer bits and multiplication reduces to bit shifts
  • In each iteration, quantize the largest remaining weights (partitioned by magnitude, as in Han's pruning), then retrain the unquantized weights for x epochs; x is usually small (<= 8); see the sketch below
  • Demonstrates lossless compression, sometimes with even better test results
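
A toy numpy sketch of one INQ-style step (the 50/50 partition, the exponent range, and nearest-candidate rounding are simplifications of the paper's scheme): the larger half of the weights is snapped to signed powers of two and frozen, the rest stays full precision for retraining:

```python
import numpy as np

def quantize_pow2(w, n_exponents=7):
    """Snap each weight to the nearest value in {0, +/-2^e} over a small exponent range."""
    max_exp = int(np.floor(np.log2(np.max(np.abs(w)) + 1e-12)))
    exps = np.arange(max_exp - n_exponents + 1, max_exp + 1)
    candidates = np.concatenate([[0.0], 2.0 ** exps])
    idx = np.argmin(np.abs(np.abs(w)[..., None] - candidates), axis=-1)
    return np.sign(w) * candidates[idx]

w = 0.1 * np.random.randn(6, 6)                       # dummy weight matrix
thresh = np.quantile(np.abs(w), 0.5)                  # partition by magnitude
frozen = np.where(np.abs(w) >= thresh, quantize_pow2(w), 0.0)   # quantized, not retrained
trainable = np.where(np.abs(w) < thresh, w, 0.0)                # retrained in the next epochs
w_mixed = frozen + trainable
print(np.unique(np.abs(frozen[frozen != 0])))         # only powers of two remain
```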

AutoAugment

  • Use RL to find the best augmentation policy
  • Search space = augmentation operations, e.g. horizontal_flip, rotate, etc. (see the sketch below)
  • Reward = validation accuracy of a child model trained with the sampled policy
  • RL = policy gradient, specifically PPO, since the reward is not differentiable
  • Achieves SOTA results, but the search is extremely expensive
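
A toy sketch of the search space only (the operation set, magnitude range, and uniform sampling are placeholders; AutoAugment uses a larger set of operations with discretized probabilities and magnitudes):

```python
import random
import numpy as np

# Hypothetical, simplified operation set.
OPS = {
    "horizontal_flip": lambda img, mag: np.flip(img, axis=1),
    "rotate90":        lambda img, mag: np.rot90(img, k=int(mag)),
    "brightness":      lambda img, mag: np.clip(img * (1 + 0.1 * mag), 0.0, 1.0),
}

def sample_subpolicy():
    """Stand-in for the controller: sample two (operation, probability, magnitude) triples."""
    return [(random.choice(list(OPS)), random.random(), random.randint(1, 3))
            for _ in range(2)]

def apply_subpolicy(img, subpolicy):
    """Apply each operation with its probability, like a data-augmentation transform."""
    for op, prob, mag in subpolicy:
        if random.random() < prob:
            img = OPS[op](img, mag)
    return img

img = np.random.rand(32, 32, 3)   # dummy CIFAR-sized image
aug = apply_subpolicy(img, sample_subpolicy())
```

In the actual search, the controller proposes such policies, a child model is trained with each policy, and its validation performance is fed back as the reward for PPO.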