janivanecky/Depth-Estimation

Learning to train the first component of the net

Closed this issue · 12 comments

Hi @janivanecky
I have read your thesis, but I still don't fully understand how to train the net.
To train the first component, is the command like this: ./build/tools/caffe train --solver=solver.prototxt ?
If so, where should I add the lines below?
solver = caffe.get_solver('solver.prototxt')
solver.net.copy_from('bvlc_alexnet.caffemodel')

Thanks for the help!
Yingjun

I committed a source folder from my master's thesis. It contains all the prototxt files and template Python scripts to train and test each network component with each of the tested loss functions. There's a README.txt in the folder with some more detailed info on the training process. I hope it helps, and if you have any further questions, just ask :)
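For reference, a minimal pycaffe training script in the spirit of those templates might look roughly like the sketch below; the paths, the iteration count and the loss blob name are placeholders, not the exact values from the repo.

# rough sketch of training via pycaffe instead of the command-line tool
import caffe

caffe.set_mode_gpu()                             # or caffe.set_mode_cpu()

solver = caffe.get_solver('solver.prototxt')     # the solver points at net_train.prototxt
solver.net.copy_from('bvlc_alexnet.caffemodel')  # initialize matching layers from AlexNet

for it in range(10000):                          # placeholder iteration count
    solver.step(1)                               # one forward/backward pass plus update
    if it % 1000 == 0:
        # assumes the loss blob is named 'loss' in the prototxt
        print('iteration {}: train loss {}'.format(it, solver.net.blobs['loss'].data))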

Hi @janivanecky
With your help, I can now train the first component, the global_context_network. :-) I copied net_train.prototxt from the abs, log_abs, norm and sc-inv folders into the global_context_network folder separately. After that, I ran train.py and training started. However, I still have some questions.

  1. I just have 10000 images and their depth maps. At iteration 8000, the training loss stopped decreasing; it was about 9 (abs, log_abs and sc-inv) or 90 (norm), but the test loss was about 25~30 (abs, log_abs and sc-inv) or 220 (norm). Is this overfitting? What can I do to reduce it?
  2. Should I just pick net_train.prototxt from the norm folder and use it? Your thesis says that the norm loss performs better.
  3. README.txt hints that we should modify train.py to fit the desired training process. train.py is pretty simple; what modifications can we make?

Thanks!
Yingjun

Hi, I'm glad you're progressing with the training :) As for your questions:

  1. Yes, overfitting was inevitable in my thesis as well. Two things helped: a) a larger dataset - eventually I used the NYUDepth v2 raw dataset with additional augmentation, which resulted in ~450k images (there's a rough augmentation sketch below); b) dropout - I think dropout is defined in the prototxt files, so you can try increasing the dropout rate. Even though these two things helped with overfitting, they didn't eliminate it; the training loss was still much lower than the testing loss.
  2. Norm loss performs best when you don't care about the absolute depth values. I think that's a reasonable assumption to make when you're using the output as the input to another network.
  3. I think train.py is ready for the training process you want; it's what I used.

I hope I answered your questions satisfyingly :)
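As for the augmentation mentioned in point 1, a rough illustration of the idea is below: horizontal flips and random crops applied identically to the RGB image and its depth map. This is just a sketch, not the exact pipeline from the thesis.

# illustrative augmentation of (image, depth) pairs, not the exact thesis pipeline
import numpy as np

def augment_pair(image, depth, crop_h, crop_w):
    """Randomly flip and crop an RGB image (H x W x 3) and its depth map (H x W)."""
    # horizontal flip with 50% probability, applied to both image and depth
    if np.random.rand() < 0.5:
        image = image[:, ::-1]
        depth = depth[:, ::-1]
    # random crop of size crop_h x crop_w, same offset for both
    h, w = depth.shape
    y = np.random.randint(0, h - crop_h + 1)
    x = np.random.randint(0, w - crop_w + 1)
    return image[y:y + crop_h, x:x + crop_w], depth[y:y + crop_h, x:x + crop_w]

Each original pair can then yield many distinct training pairs, which is how the effective dataset size grows.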

Hi @janivanecky,
Yup, your answers are very informative. :-) Thanks!
Soon I will start training the second component, the gradient_network. Should I use the parameters from the trained model of the first component?
Should I look into the gradient_network folder or the joint folder?

Thanks!
Yingjun

You'll find everything necessary in the gradient_network folder. You don't have to use the parameters from the global context network; the network is trained from scratch, with the exception of the first layer, which is initialized from AlexNet. The convolutional layers that compute the gradients also need to be initialized, but that's already set up in train.py.
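In case it helps to see the idea, a rough sketch of hand-initializing fixed gradient-computing conv layers is below; the layer names 'grad_x' and 'grad_y' and the kernel shapes are made up for illustration, the real setup is already in the repo's train.py.

# sketch of hand-initializing gradient-computing conv layers in pycaffe;
# layer names and kernel shapes are illustrative only
import numpy as np
import caffe

solver = caffe.get_solver('solver.prototxt')
solver.net.copy_from('bvlc_alexnet.caffemodel')   # only layers matching by name get copied

# simple horizontal/vertical difference kernels for a single-channel depth map
kx = np.array([[[[-1.0, 1.0]]]])     # shape (1, 1, 1, 2)
ky = np.array([[[[-1.0], [1.0]]]])   # shape (1, 1, 2, 1)
solver.net.params['grad_x'][0].data[...] = kx     # weight blob of the hypothetical 'grad_x' layer
solver.net.params['grad_y'][0].data[...] = ky     # weight blob of the hypothetical 'grad_y' layer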
Cheers!

It seems both global_context_network and gradient_network use AlexNet? I found solver.net.copy_from('bvlc_alexnet.caffemodel') in both of their train.py scripts.

Yes, that's right.

Hi @janivanecky
I am starting to train the third component, the refining network. It looks like I have to re-create the depth datasets with a different size (74x54)?
I am training the three components separately. How do I train a final model that includes the parameters from all three components?
By the way, I should use the prototxt file from the norm_abs folder to train the refining network to get the best performance, right?

Thanks,
Yingjun

Yes, you have to create a dataset with depth maps sized 74x54. As you say, you should use the prototxt from the norm_abs folder to get the best performance, and you can include the parameters from the other components by using the train.py file in the refining_network folder. It contains undefined variables for the paths to the pretrained models of the other components; just fill in these paths (roughly as in the sketch below) and it should work.
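A minimal sketch of what that looks like; the variable names and paths are placeholders standing in for the undefined ones in the repo's train.py.

# sketch of loading the pretrained components into the refining network's solver;
# paths and variable names are placeholders
import caffe

GLOBAL_CONTEXT_MODEL = 'path/to/global_context.caffemodel'   # placeholder path
GRADIENT_MODEL = 'path/to/gradient_network.caffemodel'       # placeholder path

solver = caffe.get_solver('solver.prototxt')
# copy_from matches layers by name, so each call only fills in the layers
# that belong to that component
solver.net.copy_from(GLOBAL_CONTEXT_MODEL)
solver.net.copy_from(GRADIENT_MODEL)
solver.solve()   # or step() in a loop, as in the other train.py templates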
Let me know how it works out!

I understand. Thanks for the guidance! I have finally completed the training. The final test net output is:
lossABSDepth = 121.88
lossMVNDepth = 720.129
Is this good? Does that mean I have completed all the training work from your thesis?

Thanks,
Yingjun

I can't say whether that's good or bad; I had a lower ABS loss, around 30, but my MVN loss was over 800. Anyway, congrats on completing the process :)

Thanks for all the guidance so far! You really helped me a lot. I'm closing this issue and will ask again if I have more questions. :-)