NetworkCompress

Inspired by net2net and network distillation.

Contributors: @luzai, @miibotree

[TOC]

Environment

  • Keras 2.0
  • Backend: TensorFlow 1.1
  • image_data_format: 'channels_last'
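
A quick way to verify this setup from Python (a minimal sanity check, not part of the repo):

```python
import keras
from keras import backend as K

print(keras.__version__)        # expect 2.0.x
print(K.backend())              # expect 'tensorflow'
print(K.image_data_format())    # expect 'channels_last'
```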

TODO list:

@luzai

  • A single model may be trained multiple times, so we need a clean event logger (CSV or tfevents); see the CSVLogger sketch after this list
  • Logger for model mutations and training events
  • Dataset switcher (MNIST, CIFAR-100, or others)
  • Write documentation
  • Naming
  • Visualization
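
For the event-logger items above, Keras already ships suitable callbacks; a minimal sketch (the toy model and data are placeholders, not the repo's) that logs to CSV and tfevents across repeated fit() calls:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import CSVLogger, TensorBoard

model = Sequential([Dense(10, activation='softmax', input_shape=(32,))])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# CSVLogger writes one row per epoch; append=True keeps the history when
# the same model is trained multiple times. TensorBoard writes tfevents.
callbacks = [CSVLogger('train_log.csv', append=True),
             TensorBoard(log_dir='./tfevents')]

x = np.random.rand(128, 32)                         # placeholder data
y = np.eye(10)[np.random.randint(0, 10, size=128)]
model.fit(x, y, epochs=3, callbacks=callbacks)
```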

@miibotree

  • The widening ratio should be proportional to depth

  • Finish the add-group function, and randomly select the group number from [2, 3, 4, 5] (see the grouped-conv sketch after this list)

  • Test different ways to initialize the group layer's weights (for example, identity initialization)

  • Skip layers use an add operation, with a 1×1 conv on the shortcut to keep the channel numbers equal (see the 1×1 shortcut sketch after this list)

  • The probability of adding a MaxPooling layer should be inversely proportional to depth (to constrain the number of MaxPooling layers)

  • Adding a conv layer together with max-pooling drops accuracy by a large margin; how can this be fixed? (Max-pooling should be added early.)

  • Probabilities of the 5 mutation operations

  • To improve validation accuracy and avoid overfitting:

    • try regularizers (kernel regularizer, activity/output regularizer) (yes, effective)
    • dropout (yes, effective)
    • BN (batch normalization)
  • Wider operation for group layers

  • Distributed / parallel training

  • Mayavi

  • Summary at run time

  • Use KD (knowledge distillation) loss (see the distillation sketch after this list)

    • Train (65,770 samples): hard labels + transfer labels; test (10,000 samples): CIFAR-10 hard labels
    • Use the functional API rather than Sequential
    • Hard labels + soft targets (tune the temperature hyper-parameter T)
  • Experiments comparing the two model types: final accuracies are similar.

    • Deeper (in different orders) -> Wider: accuracy grows stably; trains fast
    • Wider -> Deeper
  • Write a net2branch function, imitating the Inception module

  • net2deeper for pooling and dropout layers

  • net2wider for conv layers along the kernel-size dimension, i.e., 3×3 to 5×5 (see the net2net weight sketch after this list)
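
For the add-group item: a grouped convolution can be built from standard Keras 2.0 layers by splitting the channels, convolving each slice, and concatenating. A minimal sketch (the helper name and all sizes are illustrative):

```python
from keras import backend as K
from keras.layers import Input, Conv2D, Lambda, concatenate
from keras.models import Model

def grouped_conv(x, filters, groups, kernel_size=(3, 3)):
    """Split channels into `groups` slices, convolve each, re-concatenate."""
    step = K.int_shape(x)[-1] // groups
    slices = [Lambda(lambda t, i=i: t[..., i * step:(i + 1) * step])(x)
              for i in range(groups)]
    convs = [Conv2D(filters // groups, kernel_size, padding='same',
                    activation='relu')(s) for s in slices]
    return concatenate(convs)

inp = Input(shape=(32, 32, 16))
out = grouped_conv(inp, filters=32, groups=4)  # group number from [2,3,4,5]
model = Model(inputs=inp, outputs=out)
```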
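
For the skip-layer item: a functional-API sketch (sizes illustrative) of the 1×1 shortcut, where the 1×1 conv projects the input to the main branch's channel count so the element-wise add is well defined:

```python
from keras.layers import Input, Conv2D, add
from keras.models import Model

inp = Input(shape=(32, 32, 16))
main = Conv2D(32, (3, 3), padding='same', activation='relu')(inp)
shortcut = Conv2D(32, (1, 1), padding='same')(inp)  # match channel number
out = add([main, shortcut])    # element-wise add; shapes must agree
model = Model(inputs=inp, outputs=out)
```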
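
For the kd-loss item: a distillation sketch in the usual two-head form, one softmax head trained on hard labels and one temperature-softened head trained on the teacher's transfer labels. T and the loss weights are hyper-parameters to tune; the T * T weight is the standard gradient-scale correction from the distillation paper:

```python
from keras import backend as K
from keras.layers import Input, Dense, Activation, Lambda
from keras.models import Model

T = 4.0  # distillation temperature (tune this)

inp = Input(shape=(64,))            # illustrative input size
logits = Dense(10)(inp)             # pre-softmax logits
hard = Activation('softmax', name='hard')(logits)
soft = Lambda(lambda z: K.softmax(z / T), name='soft')(logits)

model = Model(inputs=inp, outputs=[hard, soft])
model.compile(optimizer='adam',
              loss={'hard': 'categorical_crossentropy',
                    'soft': 'categorical_crossentropy'},
              loss_weights={'hard': 1.0, 'soft': T * T})
# Train with: model.fit(x, {'hard': y_true, 'soft': teacher_soft_targets})
```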
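
For the net2deeper/net2wider items: two function-preserving weight constructions as minimal NumPy sketches (channels-last layout assumed; function names are ours, not the repo's). A freshly inserted conv layer can start as the identity, and a 3×3 kernel grows to 5×5 by zero-padding, which under 'same' padding leaves the layer's output unchanged:

```python
import numpy as np

def net2deeper_identity(kernel_size, channels):
    """Weights for a new conv layer that initially computes the identity."""
    w = np.zeros((kernel_size, kernel_size, channels, channels))
    c = kernel_size // 2
    for i in range(channels):
        w[c, c, i, i] = 1.0        # pass each channel straight through
    return w, np.zeros(channels)

def net2wider_kernel(w, new_size):
    """Zero-pad a (k, k, in, out) kernel to (new_size, new_size, in, out)."""
    pad = (new_size - w.shape[0]) // 2
    return np.pad(w, ((pad, pad), (pad, pad), (0, 0), (0, 0)),
                  mode='constant')

w3 = np.random.randn(3, 3, 16, 32)
w5 = net2wider_kernel(w3, 5)       # 3x3 -> 5x5, same function preserved
```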

Finished list:

  • Grow architecture to VGG-like
    • Exp: what accuracy can VGG-19 achieve
    • Fixed: slight accuracy drop of net2wider at conv8
    • Compare accuracy and training time
    Model                       Accuracy
    Vgg16                       10.00%
    Vgg8                        83.56%
    Vgg8 + Dropout              90.05%
    Vgg8-net2net (no dropout)   87.45%

Figure 1: Vgg8-net2net (no dropout, epochs 0-250)

Figure 2: Vgg8-net2net (no dropout, epochs 20-250)

Figure 3: Vgg-net2net (cmd1, at different stages)

  • KD loss (soft-target)
  • transfer data
  • Experiments on randomly generated models
    • Generate random feasible commands
    • Check completeness; run the code in parallel
    • Found a rule: gradient explosion happens when the FC stack is too deep
  • Data augmentation is better than dropout (see the sketch below)
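
The augmentation result above is typically obtained with light geometric transforms; a minimal Keras sketch (the parameters are illustrative, not necessarily the settings used in these experiments):

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)
# With x_train of shape (n, 32, 32, 3), channels-last:
# model.fit_generator(datagen.flow(x_train, y_train, batch_size=64),
#                     steps_per_epoch=len(x_train) // 64, epochs=100)
```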