ColumbiaDVMM/CDC

Loss is -nan

minhtriet opened this issue · 2 comments

I am trying to fine-tune a network. My dataset consists of folders, each corresponding to one example of a single class:

|--example1_class1/frames_1.jpg...frames_n.jpg
|--example2_class1/frames_1.jpg...frames_m.jpg
|--example1_class2/frames_1.jpg...frames_i.jpg
|--example2_class2/frames_1.jpg...frames_j.jpg
.
.
.

Each folder has more frames than the window size, which is 32.
I then ran gen_test_bin_and_list.py with the default configuration, except that I modified the code that computes v_label.
Next, I ran finetuning.sh. I have only 6 classes, so I changed the prototxt file accordingly. Please see log.train-val.
The strange thing is that the loss is -nan, and I have no clue how to debug this. I have also attached the training file, fb_train.prototxt.

What can I do to debug this error?
Thank you!

I would like to add some more details. The -nan happens randomly; in this case it does not appear after 760 iterations.
log.train-val.txt
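
Since the -nan shows up at a different point from run to run, it may help to locate the first iteration where it is reported in a log like the one above. The following is only a minimal sketch: it assumes the log contains the usual Caffe solver lines of the form `Iteration N, loss = X`, and the function name `first_nan_iteration` is hypothetical, not part of CDC or Caffe.

```python
import re

# Assumed Caffe solver log line, e.g.
# "... solver.cpp:228] Iteration 760, loss = -nan"
LOSS_RE = re.compile(r"Iteration (\d+), loss = (\S+)")


def first_nan_iteration(log_text):
    """Return the first iteration whose reported loss is nan, or None."""
    for line in log_text.splitlines():
        match = LOSS_RE.search(line)
        # Matches both "nan" and "-nan", whichever the log prints.
        if match and "nan" in match.group(2).lower():
            return int(match.group(1))
    return None
```

Reading the log file into a string and passing it to `first_nan_iteration` gives the iteration to inspect; the lines just before it often show whether the loss was already growing.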

One lesson I learned from using Caffe that might be helpful: try decreasing the learning rate.
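
As a rough illustration of that suggestion, the learning rate in a Caffe solver file is set by the `base_lr` field, so lowering it is a one-line change. The sketch below rewrites that field in the solver text. It assumes the solver used by finetuning.sh has a plain `base_lr: <number>` line; the helper name `scale_base_lr` and the factor of 0.1 are illustrative choices, not values taken from this repository.

```python
import re


def scale_base_lr(solver_text, factor=0.1):
    """Multiply the base_lr value in solver prototxt text by `factor`."""
    def replace(match):
        new_lr = float(match.group(2)) * factor
        # Keep the original "base_lr: " prefix, swap in the scaled value.
        return "{}{:g}".format(match.group(1), new_lr)

    pattern = r"^(\s*base_lr\s*:\s*)([0-9.eE+-]+)"
    return re.sub(pattern, replace, solver_text, flags=re.MULTILINE)
```

If the -nan disappears with a smaller `base_lr`, the original rate was likely too large for the fine-tuning setup.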