AI-ON.org open research project.
Date: October 2016 - Category: Fundamental Research
Due to their structure, it may be that residual modules [1] can be trained incrementally, starting from a shallower network previously learned with full supervision. At each step, the network would learn one additional residual module, i.e. an additional non-linear feature representation of the input, which is fed into the classifier learned at the previous step. A very useful read for building intuition about this effect is [2], which gives an ensemble-like interpretation of residual networks. Re-using the previously trained layers should save computation time. Moreover, it is possible to show that at each step we are learning in a strictly larger model space, one that contains the network learned at the previous step as the special case obtained by zeroing out the weights of the newly added residual unit.
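As a rough illustration only, here is a minimal PyTorch sketch of one possible instantiation of this idea. All names (ResidualBlock, IncrementalResNet, grow) and the zero-initialised-BatchNorm trick are assumptions of this sketch, not part of the proposal: at each stage the previously trained layers are frozen, one new residual block is appended, and only the new block plus the classifier head are trained; because the new residual branch starts at zero, the grown network initially computes exactly the function of the previous, shallower one, matching the "strictly larger model space" argument above.

```python
# Minimal sketch of incremental (stage-wise) training of a residual network.
# Hypothetical code: class/function names are illustrative, not from the project.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: y = relu(x + F(x))."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Zero-init the last BatchNorm scale so F(x) = 0 at the start of a stage:
        # the grown network then reproduces the previous, shallower network exactly.
        nn.init.zeros_(self.body[-1].weight)

    def forward(self, x):
        return torch.relu(x + self.body(x))

class IncrementalResNet(nn.Module):
    def __init__(self, channels=64, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1, bias=False)
        self.blocks = nn.ModuleList()           # grows by one block per stage
        self.head = nn.Linear(channels, num_classes)
        self.channels = channels

    def forward(self, x):
        x = torch.relu(self.stem(x))
        for block in self.blocks:
            x = block(x)
        x = x.mean(dim=(2, 3))                  # global average pooling
        return self.head(x)

    def grow(self):
        """Freeze everything trained so far, append a fresh residual block,
        and return the parameters to train at this stage."""
        first_stage = len(self.blocks) == 0
        for p in self.parameters():
            p.requires_grad_(False)
        new_block = ResidualBlock(self.channels)
        self.blocks.append(new_block)
        trainable = list(new_block.parameters())
        for p in self.head.parameters():        # classifier is re-used and fine-tuned
            p.requires_grad_(True)
            trainable.append(p)
        if first_stage:                         # the stem is only learned once
            for p in self.stem.parameters():
                p.requires_grad_(True)
                trainable.append(p)
        return trainable

# Usage: fully supervised training, one stage per added residual module.
model = IncrementalResNet()
for stage in range(4):
    params = model.grow()
    optimizer = torch.optim.SGD(params, lr=0.1, momentum=0.9)
    # ... run a standard supervised training loop over `params` here ...
```

In a full implementation the frozen blocks would also be switched to eval() mode so that their BatchNorm statistics stay fixed; this sketch omits that and the actual training loop for brevity.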
Some approaches to incremental learning have been investigated recently [3, 4] and share some intuition with this one, although they address the more general problem of transfer learning and are not tailored to residual networks specifically.
Efficient layer-wise training of deep networks could significantly speed up the training of large models. It is one of the long-standing "dreams" of deep learning, but has proven elusive so far. If such a method were devised and performed competitively with end-to-end trained models while providing computational benefits, it would quickly be adopted across the entire field.
CIFAR10 - small-scale classification (a minimal data-loading sketch follows this list).
MS COCO - smaller-scale classification, detection, segmentation.
ImageNet - large-scale classification.
OpenImages - large-scale classification.
Others?
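If CIFAR-10 is used as the small-scale starting point, data loading is straightforward with torchvision. The snippet below is only a hypothetical illustration: it reuses the `model` and `optimizer` names from the sketch above, and the normalization constants are generic rather than tuned.

```python
# Hypothetical CIFAR-10 setup; `model` and `optimizer` are assumed to come
# from the incremental-training sketch above.
import torch
import torchvision
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),   # rough normalization
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128,
                                           shuffle=True, num_workers=2)

criterion = torch.nn.CrossEntropyLoss()
model.train()
for images, labels in train_loader:          # one supervised epoch of the current stage
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```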
[1] Deep Residual Learning for Image Recognition, He et al., 2015.
[2] Residual Networks are Exponential Ensembles of Relatively Shallow Networks, Veit et al., 2016.
[3] Net2Net: Accelerating Learning via Knowledge Transfer, Chen et al., 2015.
[4] Progressive Neural Networks, Rusu et al., 2016.