dmlc/cxxnet

Multi-label Training got train-logloss:-nan

sxjzwq opened this issue · 3 comments

Hi,

I noticed that cxxnet provides an easy way to do multi-label classification, but I ran into some issues during training.

Here are the relevant parts of my config:

layer[23->24] = fullc:fc8_fine_tune
nhidden = 256

layer[24->24] = multi_logistic
target = label

As you can see, I have 256 labels for each image, so I set the output size of fc8_fine_tune to 256. The final layer is the loss layer, and I am using the elementwise logistic loss function.
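For reference, here is a minimal Python sketch of one common way an elementwise logistic loss can evaluate to NaN. It is not taken from the cxxnet source, and it is only one possible explanation. The helper names (`log_or_neg_inf`, `naive_logloss`, `clamped_logloss`) and the clamp value `EPS` are hypothetical. The idea: once a sigmoid output saturates to exactly 0 or 1, the unguarded loss contains a `0 * log(0)` term, which is `0 * -inf = nan` in IEEE floating point, and a single such term makes the averaged metric NaN.

```python
import math

EPS = 1e-15  # hypothetical clamp value, chosen only for illustration


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def log_or_neg_inf(p):
    """Mimic C-style log, where log(0) is -inf instead of an exception."""
    return math.log(p) if p > 0.0 else float("-inf")


def naive_logloss(y, p):
    """Logistic loss for one label with no guard on p."""
    return -(y * log_or_neg_inf(p) + (1.0 - y) * log_or_neg_inf(1.0 - p))


def clamped_logloss(y, p, eps=EPS):
    """Same loss, but p is first kept inside [eps, 1 - eps]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def mean_logloss(labels, probs, loss_fn):
    """Average a per-label loss over all labels."""
    return sum(loss_fn(y, p) for y, p in zip(labels, probs)) / len(labels)


labels = [1.0, 0.0]
probs = [sigmoid(40.0), 0.3]  # the first output saturates to exactly 1.0

naive = mean_logloss(labels, probs, naive_logloss)      # NaN: 0 * -inf
clamped = mean_logloss(labels, probs, clamped_logloss)  # finite
```

Note that in this sketch the NaN comes from a confident and correct prediction, so a NaN in the metric alone does not prove that the weights have diverged.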

# evaluation metric

metric = logloss
metric = error

I have two evaluation metrics, 'logloss' and 'error'. I got the following output during training:

initializing end, start working
continuing from round 0[1]
round 0:[ 1655] 7819 sec elapsed[1] train-logloss:-nan train-error:0.963257
round 1:[ 180]......

The training logloss is '-nan'. I am not sure whether it will become a real number after more training rounds. Could I get some suggestions on why this happens and how to avoid it? My learning parameters are set as follows:

updater = sgd
momentum = 0.9
wmat:eta = 0.001
wmat:wd = 0.0005
bias:eta = 0.002
bias:wd = 0.0005

Please try setting clip_gradient in the conf file, for example clip_gradient = 10.
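To illustrate what a setting like clip_gradient = 10 is generally meant to do, here is a rough Python sketch of an SGD-with-momentum step that clips the gradient first. It assumes elementwise clipping into [-bound, bound], with NaN entries replaced by zero and weight decay added after clipping. These are assumptions for illustration and not a confirmed description of cxxnet's internals. The function names are hypothetical.

```python
import math


def clip_gradient(grad, bound):
    """Clip every gradient entry into [-bound, bound].

    NaN entries are replaced by 0.0 (an assumption made for this sketch).
    """
    clipped = []
    for g in grad:
        if math.isnan(g):
            clipped.append(0.0)
        else:
            clipped.append(max(-bound, min(bound, g)))
    return clipped


def sgd_momentum_step(weights, velocity, grad, eta, momentum, wd, bound):
    """One SGD step with momentum and weight decay on a clipped gradient."""
    grad = clip_gradient(grad, bound)
    new_velocity = [
        momentum * v - eta * (g + wd * w)
        for v, g, w in zip(velocity, grad, weights)
    ]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Uses the wmat values from the config above: eta = 0.001, wd = 0.0005.
weights, velocity = sgd_momentum_step(
    weights=[1.0], velocity=[0.0], grad=[1e6],
    eta=0.001, momentum=0.9, wd=0.0005, bound=10.0,
)
```

Without clipping, the gradient of 1e6 in this example would move the weight by about 1000 in one step. With the bound of 10, the step is about 0.01.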

Thanks. Trying now.

round 0:[ 1656] 7834 sec elapsed[1] train-logloss:0.0965824 train-error:0.964808
round 1:[ 1656] 15668 sec elapsed[2] train-logloss:0.0858738 train-error:0.964808
round 2:[ 1656] 23502 sec elapsed[3] train-logloss:-nan train-error:0.964808
round 3:[ 1656] 31334 sec elapsed[4] train-logloss:0.0815175 train-error:0.964808
round 4:[ 1656] 39169 sec elapsed[5] train-logloss:0.0802771 train-error:0.964808
round 5:[ 1656] 47002 sec elapsed[6] train-logloss:-nan train-error:0.964808
round 6:[ 1656] 54834 sec elapsed[7] train-logloss:-nan train-error:0.964808
round 7:[ 1656] 62667 sec elapsed[8] train-logloss:-nan train-error:0.964808
round 8:[ 1656] 70499 sec elapsed[9] train-logloss:-nan train-error:0.964808
round 9:[ 1656] 78331 sec elapsed[10] train-logloss:-nan train-error:0.964808
round 10:[ 1656] 86163 sec elapsed[11] train-logloss:-nan train-error:0.964808
round 11:[ 1656] 93995 sec elapsed[12] train-logloss:-nan train-error:0.964808
round 12:[ 1656] 101828 sec elapsed[13] train-logloss:-nan train-error:0.964808
round 13:[ 1656] 109661 sec elapsed[14] train-logloss:-nan train-error:0.964808
round 14:[ 1656] 117494 sec elapsed[15] train-logloss:-nan train-error:0.964808
round 15:[ 1656] 125333 sec elapsed[16] train-logloss:-nan train-error:0.964808
round 16:[ 1656] 133172 sec elapsed[17] train-logloss:-nan train-error:0.964808
round 17:[ 1656] 141017 sec elapsed[18] train-logloss:-nan train-error:0.964808
round 18:[ 1656] 148853 sec elapsed[19] train-logloss:-nan train-error:0.964808
round 19:[ 1656] 156685 sec elapsed[20] train-logloss:-nan train-error:0.964808
round 20:[ 1656] 164516 sec elapsed[21] train-logloss:-nan train-error:0.964808
round 21:[ 1656] 172346 sec elapsed[22] train-logloss:-nan train-error:0.964808
round 22:[ 1656] 180175 sec elapsed[23] train-logloss:-nan train-error:0.964808
round 23:[ 1656] 188009 sec elapsed[24] train-logloss:-nan train-error:0.964808
round 24:[ 1656] 195843 sec elapsed[25] train-logloss:-nan train-error:0.964808
round 25:[ 1656] 203676 sec elapsed[26] train-logloss:-nan train-error:0.964808
round 26:[ 1656] 211510 sec elapsed[27] train-logloss:-nan train-error:0.964808
round 27:[ 1656] 219345 sec elapsed[28] train-logloss:-nan train-error:0.964808
round 28:[ 1656] 227179 sec elapsed[29] train-logloss:-nan train-error:0.964808
round 29:[ 1656] 235014 sec elapsed[30] train-logloss:-nan train-error:0.964808
round 30:[ 1656] 242849 sec elapsed[31] train-logloss:-nan train-error:0.964808

I am still getting the -nan train-logloss....