multilabel configuration
sundevil0405 opened this issue · 17 comments
Hi,
We are trying to learn to use cxxnet for a multi-label problem.
We made the following settings:
label_width = 5
label_vec[0,5) = class
target = class
metric = error
but get the error:
Metric: unknown target = label
Could anyone kindly explain this to us or provide an example of a multi-label layer configuration?
Thanks a lot,
YS
Modify
label_vec[0,5) = class
target = class
to
label_vec[0,5) = label
target = label
See #139.
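For reference, a minimal sketch of the multi-label config lines after this fix (the values mirror the original post; adjust label_width to your task):
# five labels per image, as in this thread
label_width = 5
# pack the five label columns into one vector named "label"
label_vec[0,5) = label
# the target name must match the vector name above, otherwise the
# "Metric: unknown target" error is raised
target = label
metric = error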
Thank you sxjzwq!
I followed your comments and it works. However, I met another error:
Segmentation fault (core dumped)
Is there any way to fix this?
Thanks a lot!
I guess it is caused by the input data. What's the size of your input? For example, if it's 224x224x3 and some images in your data are smaller than 224, you will hit this problem.
You should resize your images when running im2rec. Check the im2rec help and you will find the relevant parameters.
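A sketch of what that might look like, assuming the standard im2rec invocation (the list file, image root, and output paths are placeholders; resize is the parameter mentioned later in this thread):
./bin/im2rec train.lst ./images/ train.rec resize=224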
Hi sxjzwq,
Our input is 512x512x3 and we actually resized the images before running the code. Could you tell me how to check whether some image has the wrong shape? Or is there any other possible reason? Thank you!
I am not sure. Maybe you should check the format of your image list and regenerate the .rec file using the parameter resize=512. I only hit this error when I included a subset of my data; after checking that subset I found that some images were smaller than my network input shape. Once I resized them, the error went away. But there might be other causes in your case. Please check the input carefully. Good luck!
We will carefully check the input. Thanks a million!
You're welcome! Please let me know your multi-label classification performance if it works. I am also working on training a multi-label classification network, but it seems that my network parameters do not converge.
Sure! We are trying some simple settings to see what happens. We will let you know the performance if the setup works! Thank you!
Hi Qi,
We tried multiple parameter settings. It seems the code does not work on our data either. The training error does not even change after multiple rounds; we basically observe output like:
round 0:[ 1098] 1082 sec elapsed[1] train-error:0.305704
round 1:[ 1098] 2170 sec elapsed[2] train-error:0.305203
round 2:[ 1098] 3259 sec elapsed[3] train-error:0.305203
round 3:[ 1098] 4347 sec elapsed[4] train-error:0.305203
round 4:[ 1098] 5436 sec elapsed[5] train-error:0.305203
round 5:[ 1098] 6524 sec elapsed[6] train-error:0.305203
round 6:[ 1098] 7612 sec elapsed[7] train-error:0.305203
round 7:[ 1098] 8700 sec elapsed[8] train-error:0.305203
round 8:[ 1098] 9789 sec elapsed[9] train-error:0.305203
round 9:[ 1098] 10878 sec elapsed[10] train-error:0.305203
round 10:[ 1098] 11966 sec elapsed[11] train-error:0.305203
round 11:[ 1098] 13054 sec elapsed[12] train-error:0.305203
round 12:[ 1098] 14142 sec elapsed[13] train-error:0.305203
round 13:[ 1098] 15231 sec elapsed[14] train-error:0.305203
round 14:[ 1098] 16319 sec elapsed[15] train-error:0.305203
round 15:[ 1098] 17408 sec elapsed[16] train-error:0.305203
round 16:[ 1098] 18496 sec elapsed[17] train-error:0.305203
I think it would be good to have an example in cxxnet.
Hi,
Maybe you can set metric = logloss and try again. Which loss function are you using? Try multi_logistic. I have gotten some positive results on an easy dataset now; my network is fine-tuned from vggnet16. But I am still trying it on my real data.
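A sketch of those two changes in config form (the slot numbers on the loss layer are illustrative, not from this thread; only the metric line and the multi_logistic layer type come from the discussion):
# evaluate with logloss instead of classification error
metric = logloss
# inside netconfig, replace the softmax loss layer, e.g.
#   layer[18->18] = softmax
# with the multi-label logistic loss
layer[18->18] = multi_logistic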
Hi sxjzwq, thank you so much for your suggestion. We tried both l2 and softmax as the loss function. We will definitely try your suggestion and let you know if there is an improvement. Thanks again!
start from vgg16.model
layer:fc7
wmat:eta = 0.0005
bias:eta = 0.0010
layer:fc8
wmat:eta = 0.0010
bias:eta = 0.0020
round 0:[ 2466] 11686 sec elapsed[1] train-logloss:0.092616 train-rmse:6.30993
round 1:[ 2466] 23366 sec elapsed[2] train-logloss:-nan train-rmse:5.79034
round 2:[ 2466] 35045 sec elapsed[3] train-logloss:-nan train-rmse:5.65325
round 3:[ 2466] 46721 sec elapsed[4] train-logloss:-nan train-rmse:5.56152
round 4:[ 2466] 58397 sec elapsed[5] train-logloss:-nan train-rmse:5.48876
round 5:[ 2466] 70074 sec elapsed[6] train-logloss:-nan train-rmse:5.42933
start from 0006.model
layer:fc7
wmat:eta = 0.0005
bias:eta = 0.0010
layer:fc8
wmat:eta = 0.0005
bias:eta = 0.0010
round 6:[ 2466] 11681 sec elapsed[7] train-logloss:-nan train-rmse:5.33734
round 7:[ 2466] 23361 sec elapsed[8] train-logloss:-nan train-rmse:5.27811
round 8:[ 2466] 35040 sec elapsed[9] train-logloss:-nan train-rmse:5.2354
round 9:[ 2466] 46719 sec elapsed[10] train-logloss:-nan train-rmse:5.19465
round 10:[ 2466] 58396 sec elapsed[11] train-logloss:-nan train-rmse:5.15824
round 11:[ 2466] 70071 sec elapsed[12] train-logloss:-nan train-rmse:5.12289
start from 0012.model
layer:fc7
wmat:eta = 0.00025
bias:eta = 0.00050
layer:fc8
wmat:eta = 0.00025
bias:eta = 0.00050
round 12:[ 2466] 11686 sec elapsed[13] train-logloss:-nan train-rmse:4.60376
round 13:[ 2466] 23383 sec elapsed[14] train-logloss:-nan train-rmse:4.48242
round 14:[ 2466] 35060 sec elapsed[15] train-logloss:-nan train-rmse:4.4032
round 15:[ 2466] 46732 sec elapsed[16] train-logloss:-nan train-rmse:4.33162
round 16:[ 2466] 58405 sec elapsed[17] train-logloss:-nan train-rmse:4.28349
round 17:[ 2466] 70076 sec elapsed[18] train-logloss:-nan train-rmse:4.2459
start from 0018.model
layer:fc7
wmat:eta = 0.00010
bias:eta = 0.00020
layer:fc8
wmat:eta = 0.00010
bias:eta = 0.00020
round 18:[ 2466] 11674 sec elapsed[19] train-logloss:-nan train-rmse:3.93583
Using the RMSE metric will be helpful.
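In config terms that is a one-line change (and judging from the log above, where both train-logloss and train-rmse are reported, more than one metric line can be active at once):
metric = rmse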
Hi Qi,
Thank you very much for your advice. We will try this. By the way, we tried your last suggestion but also hit the NaN problem. Hopefully it will work this time. Thanks again!!
Hi Yashu
Yes, I don't know how to avoid the NaN problem when using the logloss evaluation metric, but the RMSE metric seems to work fine. I finally got a train-rmse of 1.32312 on my data, and my multi-label classification mAP is above 0.7, much better than using fc7 features + a multi-label SVM.
I hope this information is helpful.
Best
Hi Qi,
That's really good news! We actually followed your suggestion and changed to the RMSE metric. However, the speed seems extremely slow: we've been pre-training the network for ~3 days using two GTX Titan Black cards and it has only finished ~300 rounds. How many rounds did your run take? Is that pre-training or fine-tuning?
Thank you very much,
Yashu
Hi Yashu
I am using the pre-trained VGGNet16 (trained on ImageNet, of course) as the initial model, and then fine-tune the last FC layer (fc7) and the classification layer (fc8, changed from 1000 outputs to 256, which is my label width). Also, I changed the loss layer from softmax to multi_logistic. For all the other layers I keep the learning rate at 0, so their parameters stay fixed at the VGGNet values.
I started my training with learning rate = 0.001 and decreased it whenever the train-RMSE stopped going down. I trained only 36 rounds; once my learning rate had reached 0.000001, I stopped the training. The following is my training log:
start from vgg16.model
layer:fc7
wmat:eta = 0.0005
bias:eta = 0.0010
layer:fc8
wmat:eta = 0.0010
bias:eta = 0.0020
round 0:[ 2466] 11686 sec elapsed[1] train-logloss:0.092616 train-rmse:6.30993
round 1:[ 2466] 23366 sec elapsed[2] train-logloss:-nan train-rmse:5.79034
round 2:[ 2466] 35045 sec elapsed[3] train-logloss:-nan train-rmse:5.65325
round 3:[ 2466] 46721 sec elapsed[4] train-logloss:-nan train-rmse:5.56152
round 4:[ 2466] 58397 sec elapsed[5] train-logloss:-nan train-rmse:5.48876
round 5:[ 2466] 70074 sec elapsed[6] train-logloss:-nan train-rmse:5.42933
start from 0006.model
layer:fc7
wmat:eta = 0.0005
bias:eta = 0.0010
layer:fc8
wmat:eta = 0.0005
bias:eta = 0.0010
round 6:[ 2466] 11681 sec elapsed[7] train-logloss:-nan train-rmse:5.33734
round 7:[ 2466] 23361 sec elapsed[8] train-logloss:-nan train-rmse:5.27811
round 8:[ 2466] 35040 sec elapsed[9] train-logloss:-nan train-rmse:5.2354
round 9:[ 2466] 46719 sec elapsed[10] train-logloss:-nan train-rmse:5.19465
round 10:[ 2466] 58396 sec elapsed[11] train-logloss:-nan train-rmse:5.15824
round 11:[ 2466] 70071 sec elapsed[12] train-logloss:-nan train-rmse:5.12289
start from 0012.model
layer:fc7
wmat:eta = 0.00025
bias:eta = 0.00050
layer:fc8
wmat:eta = 0.00025
bias:eta = 0.00050
round 12:[ 2466] 11686 sec elapsed[13] train-logloss:-nan train-rmse:4.60376
round 13:[ 2466] 23383 sec elapsed[14] train-logloss:-nan train-rmse:4.48242
round 14:[ 2466] 35060 sec elapsed[15] train-logloss:-nan train-rmse:4.4032
round 15:[ 2466] 46732 sec elapsed[16] train-logloss:-nan train-rmse:4.33162
round 16:[ 2466] 58405 sec elapsed[17] train-logloss:-nan train-rmse:4.28349
round 17:[ 2466] 70076 sec elapsed[18] train-logloss:-nan train-rmse:4.2459
start from 0018.model
layer:fc7
wmat:eta = 0.00010
bias:eta = 0.00020
layer:fc8
wmat:eta = 0.00010
bias:eta = 0.00020
round 18:[ 2466] 11674 sec elapsed[19] train-logloss:-nan train-rmse:3.93583
round 19:[ 2466] 23353 sec elapsed[20] train-logloss:-nan train-rmse:3.68861
round 20:[ 2466] 35027 sec elapsed[21] train-logloss:-nan train-rmse:3.48819
round 21:[ 2466] 46701 sec elapsed[22] train-logloss:-nan train-rmse:3.29444
round 22:[ 2466] 58375 sec elapsed[23] train-logloss:-nan train-rmse:3.13445
round 23:[ 2466] 70048 sec elapsed[24] train-logloss:-nan train-rmse:2.98958
start from 0024.model
layer:fc7
wmat:eta = 0.00001
bias:eta = 0.00002
layer:fc8
wmat:eta = 0.00001
bias:eta = 0.00002
round 24:[ 2466] 11671 sec elapsed[25] train-logloss:-nan train-rmse:3.27728
round 25:[ 2466] 23347 sec elapsed[26] train-logloss:-nan train-rmse:2.95055
round 26:[ 2466] 35017 sec elapsed[27] train-logloss:-nan train-rmse:2.65933
round 27:[ 2466] 46689 sec elapsed[28] train-logloss:-nan train-rmse:2.35525
round 28:[ 2466] 58361 sec elapsed[29] train-logloss:-nan train-rmse:2.04922
round 29:[ 2466] 70034 sec elapsed[30] train-logloss:-nan train-rmse:1.72671
start from 0030.model
layer:fc7
wmat:eta = 0.000001
bias:eta = 0.000002
layer:fc8
wmat:eta = 0.000001
bias:eta = 0.000002
round 30:[ 2466] 11675 sec elapsed[31] train-logloss:-nan train-rmse:2.81689
round 31:[ 2466] 23350 sec elapsed[32] train-logloss:-nan train-rmse:2.46264
round 32:[ 2466] 35021 sec elapsed[33] train-logloss:-nan train-rmse:2.16123
round 33:[ 2466] 46691 sec elapsed[34] train-logloss:-nan train-rmse:1.86558
round 34:[ 2466] 58362 sec elapsed[35] train-logloss:-nan train-rmse:1.58915
round 35:[ 2466] 70034 sec elapsed[36] train-logloss:-nan train-rmse:1.32312
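To make the setup above concrete, here is a sketch of the fine-tuning config it describes (layer slot numbers and file paths are illustrative placeholders; the eta values, nhidden = 256, multi_logistic loss, and zero learning rate for frozen layers come from this thread):
# continue from the pre-trained VGGNet16 weights (path is a placeholder)
model_in = ./models/vgg16.model
# 256 labels per image
label_width = 256
label_vec[0,256) = label
target = label
metric = rmse
# global learning rate 0: every layer without its own eta stays frozen
eta = 0.0
# per-layer overrides for the two fine-tuned layers, matching the log
layer[31->32] = fullc:fc7
  wmat:eta = 0.0005
  bias:eta = 0.0010
# classification layer resized from 1000 (ImageNet) to 256 outputs
layer[32->33] = fullc:fc8
  nhidden = 256
  wmat:eta = 0.0010
  bias:eta = 0.0020
# multi-label logistic loss instead of softmax
layer[33->33] = multi_logistic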
Hi Qi,
Thank you so much for your advice. Our problem is not suitable for fine-tuning, so we decided to train the net from scratch. However, the toolbox still does not work for us, so we have decided to give up on cxxnet and turn to Caffe. Thank you again for your help, and I hope we can discuss and collaborate someday :)
Best Regards,
Yashu