AI-ON.org open research project.
Date: October 2016 - Category: Fundamental Research
Due to their structure, it may be that residual modules [1] can be trained incrementally, starting from a shallower network previously learned with full supervision. At each step, the network would learn one additional residual module, i.e. an additional non-linear feature representation of the input, which is fed into the classifier learned at the previous step. A very useful read for building intuition about this effect is [2], which gives an ensemble-like interpretation of residual networks. Re-using the previously trained layers should save computation time. Moreover, it is possible to show that at each step we are learning in a strictly larger model space, one that contains the network learned at the previous step as the special case obtained by zeroing out the weights of the newly added residual unit.
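As a rough illustration only, here is a minimal PyTorch sketch of one possible instantiation of this idea. All names (ResidualBlock, IncrementalResNet, grow) and the zero-initialised-BatchNorm trick are assumptions of this sketch, not part of the proposal: at each stage the previously trained layers are frozen, one new residual block is appended, and only the new block plus the classifier head are trained; because the new residual branch starts at zero, the grown network initially computes exactly the function of the previous, shallower one, matching the "strictly larger model space" argument above.

```python
# Minimal sketch of incremental (stage-wise) training of a residual network.
# Hypothetical code: class/function names are illustrative, not from the project.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: y = relu(x + F(x))."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Zero-init the last BatchNorm scale so F(x) = 0 at the start of a stage:
        # the grown network then reproduces the previous, shallower network exactly.
        nn.init.zeros_(self.body[-1].weight)

    def forward(self, x):
        return torch.relu(x + self.body(x))

class IncrementalResNet(nn.Module):
    def __init__(self, channels=64, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1, bias=False)
        self.blocks = nn.ModuleList()           # grows by one block per stage
        self.head = nn.Linear(channels, num_classes)
        self.channels = channels

    def forward(self, x):
        x = torch.relu(self.stem(x))
        for block in self.blocks:
            x = block(x)
        x = x.mean(dim=(2, 3))                  # global average pooling
        return self.head(x)

    def grow(self):
        """Freeze everything trained so far, append a fresh residual block,
        and return the parameters to train at this stage."""
        first_stage = len(self.blocks) == 0
        for p in self.parameters():
            p.requires_grad_(False)
        new_block = ResidualBlock(self.channels)
        self.blocks.append(new_block)
        trainable = list(new_block.parameters())
        for p in self.head.parameters():        # classifier is re-used and fine-tuned
            p.requires_grad_(True)
            trainable.append(p)
        if first_stage:                         # the stem is only learned once
            for p in self.stem.parameters():
                p.requires_grad_(True)
                trainable.append(p)
        return trainable

# Usage: fully supervised training, one stage per added residual module.
model = IncrementalResNet()
for stage in range(4):
    params = model.grow()
    optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9)
    # ... run a standard supervised training loop over `params` here ...
```

In a full implementation the frozen blocks would also be switched to eval() mode so that their BatchNorm statistics stay fixed; this sketch omits that and the actual training loop for brevity.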
Some approaches to incremental learning have been investigated recently [3, 4] and share some intuition with this one, although they address the more general problem of transfer learning and are not tailored to residual networks specifically.
Efficient layer-wise training of deep networks could significantly speed up the training of large models. It is one of the long-standing "dreams" of deep learning, but has proven elusive so far. If such a method were devised and performed competitively with end-to-end trained models while providing computational benefits, it would quickly be adopted across the entire field.
CIFAR10 - small-scale classification (a minimal data-loading sketch follows this list).
MS COCO - smaller-scale classification, detection, segmentation.
ImageNet - large-scale classification.
OpenImages - large-scale classification.
Others?
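If CIFAR-10 is used as the small-scale starting point, data loading is straightforward with torchvision. The snippet below is only a hypothetical illustration: it reuses the `model` and `optimizer` names from the sketch above, and the normalization constants are generic rather than tuned.

```python
# Hypothetical CIFAR-10 setup; `model` and `optimizer` are assumed to come
# from the incremental-training sketch above.
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),   # rough normalization
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=2)

criterion = torch.nn.CrossEntropyLoss()
model.train()
for images, labels in train_loader:          # one supervised epoch of the current stage
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```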
[1] Deep Residual Learning for Image Recognition, He et al., 2015.
[2] Residual Networks are Exponential Ensembles of Relatively Shallow Networks, Veit et al., 2016.
[3] Net2Net: Accelerating Learning via Knowledge Transfer, Chen et al., 2015.
[4] Progressive Neural Networks, Rusu et al., 2016.