The reason for the constant loss
weiweisunWHU opened this issue · 30 comments
Hello Wei,
I have prepared the data and trained the model without changing anything. However, I found the loss converging to 3.69. Then I changed the initial learning rate but obtained the same converged loss (3.69). Do you know if there is a problem somewhere? By the way, could you please provide the trained weights? Thanks a lot.
Could you post a rendered view image?
A possible problem is that you have a "grey object on white background". We need a "white object on black background".
Would you try inverting the views so that they have a black background?
A white background would make the activations unstable.
You can also try my rendered views:
https://drive.google.com/open?id=0B4v2jR3WsindMUE3N2xiLVpyLW8
- Yes, I successfully trained my model to about 90% accuracy with ReLU after FC8.
Thank you very much!
I successfully trained the model too by using mean-subtraction. Still, I discarded the last ReLU layer.
Anyway, thanks a lot for your kind help.
@WeiTang114 I used the rendered views (from your link) as input and still get a constant loss of approximately 3.69, even after 25 epochs! Could you tell me what might be the issue? Also, is there a way to visualize loss/accuracy vs. epoch?
Thanks!
- What's your learning rate? 0.001 should be good.
- You can use TensorBoard for visualization. Start it with
$ tensorboard --logdir tmp/ --port 5000
and view the graphs in your browser at http://localhost:5000.
@WeiTang114
Thanks for your reply! I used a learning rate of 0.001 for my training as suggested.
@Priyam1994 Then I'm not sure what the problem could be.
I would check whether the gradients exploded (in the "Distributions" tab in TensorBoard), or whether the weights are poorly initialized, etc.
In my experience, the learning rate for the deep (fc) layers should be multiplied by 10. It works for me; I recommend it for you, @Priyam1994.
@weiweisunWHU Could you tell me how you changed the learning rate only for the fc layers?
Additionally, did you train the network from scratch or use the pre-trained model for fine-tuning?
Thank you.
@Priyam1994
For example:
opt1 = tf.train.AdamOptimizer(lr * 10).minimize(loss, var_list=listvar_update)
Here var_list should be the list of variables belonging to the fc layers.
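To make that concrete, here is a minimal sketch of one way to split the trainable variables (filtering on 'fc' in the variable name is my assumption; the actual scope names depend on how the model is defined):

import tensorflow as tf

lr = 0.0001  # base learning rate; `loss` is assumed to be the model's loss tensor

# Assumes fc-layer variables carry "fc" in their names (e.g. "fc6/weights").
all_vars = tf.trainable_variables()
fc_vars = [v for v in all_vars if 'fc' in v.name]
other_vars = [v for v in all_vars if 'fc' not in v.name]

# 10x learning rate for the fc layers, base rate for everything else.
opt_fc = tf.train.AdamOptimizer(lr * 10).minimize(loss, var_list=fc_vars)
opt_rest = tf.train.AdamOptimizer(lr).minimize(loss, var_list=other_vars)
train_op = tf.group(opt_fc, opt_rest)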
@weiweisunWHU
I am having the same problem.
I could not solve it with this information alone. Sorry, could you explain in a bit more detail?
For example, I would like to know the exact source code change.
@weiweisunWHU @Priyam1994 @Xmen0123
Sorry! I found there was a typo in the README: "--learning-rate=0.001" should be "--learning_rate=0.001", so that argument didn't take effect...
Also, I found 0.0001 is more reliable for training.
I've updated the code (commit b476e17f11bd540f4f962ae157f20c17067996b2).
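With the fix, the argument now takes effect when passed on the command line, e.g. (train.py as the entry-point name is my assumption; check the README for the actual script):

$ python train.py --learning_rate=0.0001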
@youkaichao
It should work with a black background. Mine is in the comment above. Otherwise, you may just invert the images offline or online (invert each image in input.py, after the images are read at line 27).
If simply inverting your images works, I'll consider adding an option such as "--white_background=True" 😅
@WeiTang114
It works! After adding the line below in input.py, after line 27, the loss is no longer stuck at 3.69 and the accuracy is decent now.
im = cv2.bitwise_not(im)
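In context, the change looks roughly like this (assuming the view is read with cv2.imread around line 27; the surrounding code in the repo may differ slightly):

import cv2

im = cv2.imread(path)       # white-background rendered view
im = cv2.bitwise_not(im)    # invert so the background becomes black (255 -> 0)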
But I'm puzzled: what's the difference between white and black? If I feed MVCNN white-background views, it should learn to identify white-background views, shouldn't it?
@youkaichao
Black is 0 and white is 255. My theory is that a zero-valued background passed through the convolution layers (practically matrix multiplications) yields zero outputs, while the greyscale object produces informative outputs after the convolution. Thus a black background makes the layer activations more stable than a white one.
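A tiny NumPy illustration of that intuition (my own sanity check, not code from the repo):

import numpy as np

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3))   # a random conv filter

black_patch = np.zeros((3, 3))         # black background pixels (0)
white_patch = np.full((3, 3), 255.0)   # white background pixels (255)

print(np.sum(black_patch * kernel))    # always 0.0, whatever the weights
print(np.sum(white_patch * kernel))    # 255 * kernel.sum(): large and weight-dependent

A black background contributes nothing to the response, while a white one adds a large, weight-dependent offset that can swamp the object signal.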
Now the problem is solved; I got an accuracy of 85%. Not state-of-the-art, but reasonably good. Thank you! I'll go fine-tune now ^_^
@WeiTang114 Your suggestion worked and I achieved a test accuracy of 88%. But it always results in the constant 3.69 loss if I use the Caffe AlexNet model. Any thoughts? @youkaichao Did you train from scratch or did you use the AlexNet model? If you did use the pretrained model, did you make any changes?
Thanks!
@Priyam1994 I'm using the pretrained alexnet model, and it works. I have made no changes to the pretrained alexnet model.
@youkaichao Thank you for the clarification. Did you change any parameters other than what was originally suggested?
@Priyam1994 Nope. I ran MVCNN with the default settings, and I don't know how to replace the AlexNet model...
Is it possible that you forgot to run ./prepare_pretrained_alexnet.sh?
@WeiTang114 Could you tell me if there is a specific reason the rendered images (from the link you gave) are 600×600? Can I also feed in images of other dimensions, say 300×300 or 400×400?
Thank you
@Priyam1994 That size is arbitrary. After being fed into the network, the images are resized to 256 and cropped to 227, so I just wanted a size large enough that the images aren't distorted by the resize. Of course any other size is fine!
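For reference, a rough sketch of that resize-and-crop step with OpenCV (a center crop is shown; the repo's actual preprocessing, e.g. random cropping during training, may differ):

import cv2

im = cv2.imread('view.png')
im = cv2.resize(im, (256, 256))          # resize to 256x256
top = (256 - 227) // 2
left = (256 - 227) // 2
im = im[top:top + 227, left:left + 227]  # center-crop to 227x227 (AlexNet input size)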
@WeiTang114 Thank you for your informative reply!
@weiweisunWHU
Hi, can you tell me what changes you made? I set subtract_mean to true and removed the ReLU layer after fc8, but the predictions are always the same (not only 0, but other numbers too).
@youkaichao
Hi, I want to know: did you just add "cv2.bitwise_not(im)" after line 27 in input.py with no other changes, and then it worked? Why doesn't it work for me?
@youkaichao @WeiTang114 Hello, after adding your line "im = cv2.bitwise_not(im)" after line 27, my accuracy is still only about 2%: sometimes "acc=1.953125", sometimes "acc=2.734375", and so on. I haven't changed anything else in the code, and the dataset I train on is ModelNet40 with a white background like your airplane image. Do you know what I should do? Thank you so much.
@weiweisunWHU Your method really works! I tried it and reached more than 85% accuracy.
@491506870
Hi, would you please specify the method? For example, which code did you change? Thanks a lot!