How to initialize the weights when training KD stage 2?

Question

How should the weights be initialized when training KD stage 2?

Closed this issue 6 years ago · 6 comments

How should the arguments be filled in when calling the command?
./caffe-MLIC/build/tools/caffe train --solver=... --weights= --gpu 0

I tried initializing the weights with the following argument. Is that right?
--weights=kd_stage1/modesl_iter_400000.caffemodel, pretrain/VGG_ILSVRC_16_layers.caffemodel

Answer 1 · 2018-12-10T04:17:01.000Z

--weights=WSDDN/****.caffemodel (layer names are modified), kd_stage1/modesl_iter_400000.caffemodel

See the previous issue, and make sure the layer names of the T-WDet model have been modified.

Answer 2 · 2018-12-11T06:28:16.000Z

The size of kd_stage1/modesl_iter_400000.caffemodel is almost twice that of WSDDN/****.caffemodel because it contains both the detection network and the classification network. The detection network is exactly the same as in WSDDN/****.caffemodel, including the layer names.
If WSDDN/****.caffemodel is loaded first, the detection-network weights in kd_stage1/modesl_iter_400000.caffemodel will overwrite the ones loaded from WSDDN/****.caffemodel.

So the arguments should be as follows, which loads kd_stage1 first:
--weights=kd_stage1/modesl_iter_400000.caffemodel, WSDDN/****.caffemodel (layer names are modified)
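To make the overwrite argument concrete, here is a small Python simulation. It is only a sketch: it assumes the files given to --weights are applied in the order listed and that weights are copied by matching layer names. It does not call any Caffe API, and the layer names and values (det_conv, cls_conv, "wsddn", "stage1") are hypothetical placeholders.

```python
def load_weights(net_layers, weight_files):
    """Simulate loading several weight files into one net, in order.

    net_layers: names of the layers in the target net.
    weight_files: list of dicts (layer name -> weights), in the order
    they would be listed in --weights.
    """
    loaded = {}
    for weights in weight_files:
        for name, value in weights.items():
            if name in net_layers:    # layers the net does not have are skipped
                loaded[name] = value  # a later file overwrites an earlier one
    return loaded


# Hypothetical layer names and values, for illustration only.
net = ["det_conv", "cls_conv"]
wsddn = {"det_conv": "wsddn"}
stage1 = {"det_conv": "stage1", "cls_conv": "stage1"}

wsddn_first = load_weights(net, [wsddn, stage1])
stage1_first = load_weights(net, [stage1, wsddn])

print(wsddn_first)   # det_conv ends up with the stage1 copy
print(stage1_first)  # det_conv ends up with the wsddn copy
```

Under these assumptions, whichever file comes last decides the weights of any layer whose name appears in both files.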

Answer 3 · 2018-12-11T08:46:23.000Z

The initial loss:

The loss after 38,000 iterations with batch size = 3:

Answer 4 · 2018-12-12T07:46:14.000Z

Please check that the weights of T-WDet are frozen in stage 1. If they are, the loaded result should not depend on the order of the arguments in --weights.
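The order-independence claim can be checked with a toy example. This is a sketch under the assumption that files are merged by layer name with later files winning; the layer names (det_fc, cls_fc) and the numbers are made up, and nothing here touches Caffe itself.

```python
from itertools import permutations


def merge_in_order(weight_files):
    """Merge name -> weights dicts; later files win on name clashes."""
    merged = {}
    for weights in weight_files:
        merged.update(weights)
    return merged


def order_matters(weight_files):
    """Return True if some ordering of the files gives a different result."""
    results = [merge_in_order(order) for order in permutations(weight_files)]
    return any(result != results[0] for result in results)


# Hypothetical weights: the shared layer det_fc is identical in both files,
# as it would be if T-WDet stayed frozen during stage 1.
wsddn = {"det_fc": [0.1, 0.2]}
stage1_frozen = {"det_fc": [0.1, 0.2], "cls_fc": [0.5]}

# Counter-case: det_fc changed during stage 1 (not frozen).
stage1_changed = {"det_fc": [0.3, 0.4], "cls_fc": [0.5]}

frozen_case = order_matters([wsddn, stage1_frozen])
changed_case = order_matters([wsddn, stage1_changed])
print(frozen_case, changed_case)
```

So the order only matters when a layer with the same name holds different values in the two files, which is why it is worth verifying that the T-WDet layers really were frozen.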

Answer 5 · 2018-12-12T13:46:40.000Z

So the only pretrained model that stage 2 training needs is the stage 1 model.
Is that right?

Answer 6 · 2018-12-13T15:05:32.000Z

No. Please note that T-WDet has two classifiers (fc_wsd_8_c and fc_wsd_8_d) in the stage 2 prototxt.