Questions about training pipeline
First of all, thanks for sharing this great project! I have tried to implement your mobilePydnet network but cannot quite reach the same results as the pre-trained model. For that reason I have several questions about the model, loss, data and training itself.
- Did you initialize weights and biases using some particular initialization strategy, or did you just use the default initialization of the convolution layers?
- Did you use any data augmentation, such as flipping, rotating, random cropping or blurring?
- You mentioned here in the issues section that the range of your input and output images is [0,255]. Does this mean that during training, when you load the input image and ground truth as float32, you don't normalize them, e.g. by dividing by 255 to the range [0,1]?
- The loss is described in the paper as a weighted sum over scales, `L = Σ_s λ_s * L_s`, where λ_0 is fixed to 1 and λ_s goes from 0.5, 0.25, 0.125 for the other scales (if I understood correctly you just used 3 different scales). Here is the Python code for calculating the loss, but I'm not sure if I am missing something:
I found the answer to question 1 from the provided code. So, convolutional kernels are initialized with Xavier initialization and biases with truncated normal initialization (mean 0.0 and std 1.0):
```python
# Conv2D: Xavier init for the kernel, truncated normal for the bias
weights = tf.get_variable(
    "weights",
    kernel_shape,
    initializer=tf.contrib.layers.xavier_initializer(),
    dtype=tf.float32,
)
biases = tf.get_variable(
    "biases",
    bias_shape,
    initializer=tf.truncated_normal_initializer(),
    dtype=tf.float32,
)
```
However, I have a new question about the network architecture.
- In the original Pydnet, get_disp extracts the depth map by means of a sigmoid operator, but in your network the sigmoids are replaced by convolutions that output 1 channel. Does it really work like this?
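For clarity, this is the difference I mean (a rough sketch with my own conv helper, not the actual code from either repo; the exact kernel size and any output scaling in the original are my assumptions):

```python
import tensorflow as tf  # TF 1.x

def get_disp_original_style(x):
    # original Pydnet-style output: convolution followed by a sigmoid
    # that bounds the predicted map
    return tf.layers.conv2d(x, filters=1, kernel_size=3, padding="same",
                            activation=tf.nn.sigmoid)

def get_depth_mobile_style(x):
    # what I see in this repo: a plain 1-channel convolution, no sigmoid
    return tf.layers.conv2d(x, filters=1, kernel_size=3, padding="same",
                            activation=None)
```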
Hi, how did your training go?