The reason for the constant loss
weiweisunWHU opened this issue · 30 comments
Hello Wei,
I have prepared the data and trained the model without changing anything. However, I found the loss converging to 3.69. Then I changed the initial learning rate but obtained the same converged loss (3.69). Do you know if there is a problem somewhere? By the way, could you please provide the trained weights? Thanks a lot.
Could you post a rendered view image?
A possible problem is that you have a "grey object on white background". We need a "white object on black background".
Would you try inverting the views so that they have a black background?
A white background would make the activations unstable.
You can also try my rendered views:
https://drive.google.com/open?id=0B4v2jR3WsindMUE3N2xiLVpyLW8
- Yes, I successfully trained my model to about 90% accuracy with ReLU after FC8.
Thank you very much!
I successfully trained the model too by using mean-subtraction. Still, I discarded the last ReLU layer.
Anyway, thanks a lot for your kind help.
@WeiTang114 I used the rendered views (from your link) as input and still get a constant loss of approximately 3.69, even after 25 epochs! Could you tell me what might be the issue? Also, is there a way to visualize loss/accuracy vs. epoch?
Thanks!
- What's your learning rate? 0.001 should be good.
- You can use TensorBoard for visualization. Start it with
$ tensorboard --logdir tmp/ --port 5000
and view the graphs in your browser at http://localhost:5000.
@WeiTang114
Thanks for your reply! I used a learning rate of 0.001 for my training as suggested.
@Priyam1994 Then I'm not sure what the problem could be.
I would check whether the gradients exploded (in the "Distributions" tab in TensorBoard), or whether the weights are poorly initialized, etc.
In my experience, the learning rate for the deep (fc) layers should be multiplied by 10. It works for me; I recommend it for you, @Priyam1994.
@weiweisunWHU Could you tell me how you changed the learning rate only for the fc layers?
Additionally, did you train the network from scratch or use the pre-trained model for fine-tuning?
Thank you.
@Priyam1994
For example:
opt1 = tf.train.AdamOptimizer(lr * 10).minimize(loss, var_list=listvar_update)
Here var_list should be the list of variables belonging to the fc layers.
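To make that concrete, here is a minimal sketch of one way to split the trainable variables (filtering on 'fc' in the variable name is my assumption; the actual scope names depend on how the model is defined):

import tensorflow as tf

lr = 0.0001  # base learning rate; `loss` is assumed to be the model's loss tensor

# Assumes fc-layer variables carry "fc" in their names (e.g. "fc6/weights").
all_vars = tf.trainable_variables()
fc_vars = [v for v in all_vars if 'fc' in v.name]
other_vars = [v for v in all_vars if 'fc' not in v.name]

# 10x learning rate for the fc layers, base rate for everything else.
opt_fc = tf.train.AdamOptimizer(lr * 10).minimize(loss, var_list=fc_vars)
opt_rest = tf.train.AdamOptimizer(lr).minimize(loss, var_list=other_vars)
train_op = tf.group(opt_fc, opt_rest)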
@weiweisunWHU
I am having the same problem.
I could not solve it with this information alone. Sorry, could you explain in a bit more detail?
For example, I would like to know the exact source code change.
@weiweisunWHU @Priyam1994 @Xmen0123
Sorry! I found there was a typo in the README: "--learning-rate=0.001" should be "--learning_rate=0.001", so that argument didn't take effect...
Also, I found 0.0001 is more reliable for training.
I've updated the code (commit b476e17f11bd540f4f962ae157f20c17067996b2).
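With the fix, the argument now takes effect when passed on the command line, e.g. (train.py as the entry-point name is my assumption; check the README for the actual script):

$ python train.py --learning_rate=0.0001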
@youkaichao
It should work with a black background. Mine is in the comment above. Otherwise, you may just invert the images offline or online (invert each image in input.py, after the images are read at line 27).
If simply inverting your images works, I'll consider adding an option such as "--white_background=True" 😅
@WeiTang114
It works! After adding the line below in input.py, after line 27, the loss is no longer stuck at 3.69 and the accuracy is decent now.
im = cv2.bitwise_not(im)
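In context, the change looks roughly like this (assuming the view is read with cv2.imread around line 27; the surrounding code in the repo may differ slightly):

import cv2

im = cv2.imread(path)       # white-background rendered view
im = cv2.bitwise_not(im)    # invert so the background becomes black (255 -> 0)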
But I'm puzzled: what's the difference between white and black? If I feed MVCNN white-background views, it should learn to identify white-background views, shouldn't it?
@youkaichao
Black is 0 and white is 255. My theory is that a zero-valued background passed through the convolution layers (practically matrix multiplications) yields zero outputs, while the greyscale object produces informative outputs after the convolution. Thus a black background makes the layer activations more stable than a white one.
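A tiny NumPy illustration of that intuition (my own sanity check, not code from the repo):

import numpy as np

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))   # a random conv filter

black_patch = np.zeros((3, 3))         # black background pixels (0)
white_patch = np.full((3, 3), 255.0)   # white background pixels (255)

print(np.sum(black_patch * kernel))    # always 0.0, whatever the weights
print(np.sum(white_patch * kernel))    # 255 * kernel.sum(): large and weight-dependent

A black background contributes nothing to the response, while a white one adds a large, weight-dependent offset that can swamp the object signal.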
Now the problem is solved; I got an accuracy of 85%. Not state-of-the-art, but reasonably good. Thank you! I'll go fine-tune now ^_^
@WeiTang114 Your suggestion worked and I achieved a test accuracy of 88%. But it always results in the constant 3.69 loss if I use the Caffe AlexNet model. Any thoughts? @youkaichao Did you train from scratch or did you use the AlexNet model? If you did use the pretrained model, did you make any changes?
Thanks!
@Priyam1994 I'm using the pretrained alexnet model, and it works. I have made no changes to the pretrained alexnet model.
@youkaichao Thank you for the clarification. Did you change any parameters other than what was originally suggested?
@Priyam1994 Nope. I ran MVCNN with the default settings, and I don't know how to replace the AlexNet model...
Is it possible that you forgot to run ./prepare_pretrained_alexnet.sh?
@WeiTang114 Could you tell me if there is a specific reason the rendered images (from the link you gave) are 600×600? Can I also feed in images of other dimensions, say 300×300 or 400×400?
Thank you
@Priyam1994 That size is arbitrary. After being fed into the network, the images are resized to 256 and cropped to 227, so I just wanted a size large enough that the images aren't distorted by the resize. Of course any other size is fine!
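For reference, a rough sketch of that resize-and-crop step with OpenCV (a center crop is shown; the repo's actual preprocessing, e.g. random cropping during training, may differ):

import cv2

im = cv2.imread('view.png')
im = cv2.resize(im, (256, 256))          # resize to 256x256
top = (256 - 227) // 2
left = (256 - 227) // 2
im = im[top:top + 227, left:left + 227]  # center-crop to 227x227 (AlexNet input size)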
@WeiTang114 Thank you for your informative reply!
@weiweisunWHU
Hi, can you tell me what changes you made? I set subtract_mean to true and removed the ReLU layer after fc8, but the predictions are always the same (not only 0, but other numbers too).
@youkaichao
Hi, I want to know: did you just add "cv2.bitwise_not(im)" after line 27 in input.py with no other changes, and then it worked? Why doesn't it work for me?
@youkaichao @WeiTang114 Hello, after adding your line "im = cv2.bitwise_not(im)" after line 27, my accuracy is still only about 2%: sometimes "acc=1.953125", sometimes "acc=2.734375", and so on. I haven't changed anything else in the code, and the dataset I train on is ModelNet40 with a white background like your airplane image. Do you know what I should do? Thank you so much.
@weiweisunWHU Your method really works! I tried it and reached more than 85% accuracy.
@491506870
Hi, would you please specify the method? For example, which code did you change? Thanks a lot!