janivanecky/Depth-Estimation

Learning to train the first component of the net

Closed this issue · 12 comments

Hi @janivanecky
I have read your thesis, but I still don't fully understand how to train the net.
To train the first component, is the command like this: ./build/tools/caffe train --solver=solver.prototxt ?
If so, where should I add the lines below?
solver = caffe.get_solver('solver.prototxt')
solver.net.copy_from('bvlc_alexnet.caffemodel')

Thanks for the help!
Yingjun

I committed a source folder from my master's thesis. It contains all the prototxt files and template Python scripts to train and test each network component with each of the tested loss functions. There's a README.txt in the folder with some more detailed info on the training process. I hope it helps, and if you have any further questions, just ask :)
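For reference, a minimal pycaffe training script in the spirit of those templates might look roughly like the sketch below; the paths, the iteration count and the loss blob name are placeholders, not the exact values from the repo.

# rough sketch of training via pycaffe instead of the command-line tool
import caffe

caffe.set_mode_gpu()                             # or caffe.set_mode_cpu()

solver = caffe.get_solver('solver.prototxt')     # the solver points at net_train.prototxt
solver.net.copy_from('bvlc_alexnet.caffemodel')  # initialize matching layers from AlexNet

for it in range(10000):                          # placeholder iteration count
    solver.step(1)                               # one forward/backward pass plus update
    if it % 1000 == 0:
        # assumes the loss blob is named 'loss' in the prototxt
        print('iteration {}: train loss {}'.format(it, solver.net.blobs['loss'].data))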

Hi @janivanecky
With your help, I can now train the first component, the global_context_network. :-) I copied net_train.prototxt from the abs, log_abs, norm and sc-inv folders into the global_context_network folder separately. After that, I ran train.py and training started. However, I still have some questions.

  1. I just have 10000 images and their depth maps. At iteration 8000, the training loss stopped decreasing; it was about 9 (abs, log_abs and sc-inv) or 90 (norm), but the test loss was about 25~30 (abs, log_abs and sc-inv) or 220 (norm). Is this overfitting? What can I do to reduce it?
  2. Should I just pick net_train.prototxt from the norm folder and use it? Your thesis says that the norm loss performs better.
  3. README.txt hints that we should modify train.py to fit the desired training process. train.py is pretty simple; what modifications can we make?

Thanks!
Yingjun

Hi, I'm glad you're progressing with the training :) As for your questions:

  1. Yes, overfitting was inevitable in my thesis as well. Two things helped: a) a larger dataset - eventually I used the NYUDepth v2 raw dataset with additional augmentation, which resulted in ~450k images (there's a rough augmentation sketch below); b) dropout - I think dropout is defined in the prototxt files, so you can try increasing the dropout rate. Even though these two things helped with overfitting, they didn't eliminate it; the training loss was still much lower than the testing loss.
  2. Norm loss performs best when you don't care about the absolute depth values. I think that's a reasonable assumption to make when you're using the output as the input to another network.
  3. I think train.py is ready for the training process you want; it's what I used.

I hope I answered your questions satisfyingly :)
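As for the augmentation mentioned in point 1, a rough illustration of the idea is below: horizontal flips and random crops applied identically to the RGB image and its depth map. This is just a sketch, not the exact pipeline from the thesis.

# illustrative augmentation of (image, depth) pairs, not the exact thesis pipeline
import numpy as np

def augment_pair(image, depth, crop_h, crop_w):
    """Randomly flip and crop an RGB image (H x W x 3) and its depth map (H x W)."""
    # horizontal flip with 50% probability, applied to both image and depth
    if np.random.rand() < 0.5:
        image = image[:, ::-1]
        depth = depth[:, ::-1]
    # random crop of size crop_h x crop_w, same offset for both
    h, w = depth.shape
    y = np.random.randint(0, h - crop_h + 1)
    x = np.random.randint(0, w - crop_w + 1)
    return image[y:y + crop_h, x:x + crop_w], depth[y:y + crop_h, x:x + crop_w]

Each original pair can then yield many distinct training pairs, which is how the effective dataset size grows.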

Hi @janivanecky,
Yup, your answers are very informative. :-) Thanks!
Soon I will start training the second component, the gradient_network. Should I use the parameters from the trained model of the first component?
Should I look into the gradient_network folder or the joint folder?

Thanks!
Yingjun

You'll find everything necessary in the gradient_network folder. You don't have to use the parameters from the global context network; the network is trained from scratch, with the exception of the first layer, which is initialized from AlexNet. The convolutional layers that compute the gradients also need to be initialized, but that's already set up in train.py.
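In case it helps to see the idea, a rough sketch of hand-initializing fixed gradient-computing conv layers is below; the layer names 'grad_x' and 'grad_y' and the kernel shapes are made up for illustration, the real setup is already in the repo's train.py.

# sketch of hand-initializing gradient-computing conv layers in pycaffe;
# layer names and kernel shapes are illustrative only
import numpy as np
import caffe

solver = caffe.get_solver('solver.prototxt')
solver.net.copy_from('bvlc_alexnet.caffemodel')   # only layers matching by name get copied

# simple horizontal/vertical difference kernels for a single-channel depth map
kx = np.array([[[[-1.0, 1.0]]]])     # shape (1, 1, 1, 2)
ky = np.array([[[[-1.0], [1.0]]]])   # shape (1, 1, 2, 1)
solver.net.params['grad_x'][0].data[...] = kx     # weight blob of the hypothetical 'grad_x' layer
solver.net.params['grad_y'][0].data[...] = ky     # weight blob of the hypothetical 'grad_y' layer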
Cheers!

It seems both global_context_network and gradient_network use AlexNet? I found solver.net.copy_from('bvlc_alexnet.caffemodel') in both of their train.py scripts.

Yes, that's right.

Hi @janivanecky
I am starting to train the third component, the refining network. It looks like I have to re-create the depth datasets with a different size (74x54)?
I am training the three components separately. How do I train a final model that includes the parameters from all three components?
By the way, I should use the prototxt file from the norm_abs folder to train the refining network to get the best performance, right?

Thanks,
Yingjun

Yes, you have to create a dataset with depth maps sized 74x54. As you say, you should use the prototxt from the norm_abs folder to get the best performance, and you can include the parameters from the other components by using the train.py file in the refining_network folder. It contains undefined variables for the paths to the pretrained models of the other components; just fill in these paths (roughly as in the sketch below) and it should work.
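A minimal sketch of what that looks like; the variable names and paths are placeholders standing in for the undefined ones in the repo's train.py.

# sketch of loading the pretrained components into the refining network's solver;
# paths and variable names are placeholders
import caffe

GLOBAL_CONTEXT_MODEL = 'path/to/global_context.caffemodel'   # placeholder path
GRADIENT_MODEL = 'path/to/gradient_network.caffemodel'       # placeholder path

solver = caffe.get_solver('solver.prototxt')
# copy_from matches layers by name, so each call only fills in the layers
# that belong to that component
solver.net.copy_from(GLOBAL_CONTEXT_MODEL)
solver.net.copy_from(GRADIENT_MODEL)
solver.solve()   # or step() in a loop, as in the other train.py templates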
Let me know how it works out!

I understand. Thanks for the guidance! I have finally completed the training. The final test net output is:
lossABSDepth = 121.88
lossMVNDepth = 720.129
Is this good? Does that mean I have completed all the training work from your thesis?

Thanks,
Yingjun

I can't say whether that's good or bad; I had a lower ABS loss, around 30, but my MVN loss was over 800. Anyway, congrats on completing the process :)

Thanks for all the guidance so far! You really helped me a lot. I'm closing this issue and will ask again if I have more questions. :-)