Doubts about the cropping after deconv
dokutagero opened this issue · 5 comments
Hey @wkentaro, I still have a doubt regarding the cropping performed after the transposed convolution. It's more of a conceptual question than a code issue, but maybe other people can benefit from it as well.
In the gist that you provided in the other issue I opened about the cropping, I understand that after a x32 downsample, the upsampling can be offset from the input depending on its size (when the size is not divisible by 32), and that is the reason for the cropping. Initially I thought that to train with variable input sizes I would have to modify that cropping to match the offset corresponding to each size.
I tried inferring images from my dataset using the infer.py code, but when checking the size of the predictions I realized that the output matches the input size for different input sizes, while I was expecting different sizes. Shouldn't I modify the offset accordingly, and why does the output match the input size with a fixed crop of 19? I would highly appreciate your help resolving this doubt.
Thanks again!
Oh OK, never mind. Of course the size matches: you are slicing a window of the input's size starting at that offset. Case solved.
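I.e., roughly this (a minimal sketch with made-up names and sizes, not the exact infer.py code):

```python
import numpy as np

h, w = 375, 500                              # input size (arbitrary example)
upscore = np.zeros((1, 21, h + 64, w + 64))  # upsampled map, larger than the input
offset = 19
# slice an input-sized window starting at the fixed offset
score = upscore[:, :, offset:offset + h, offset:offset + w]
print(score.shape)                           # (1, 21, 375, 500) -> matches the input
```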
In any case, for training with variable input sizes I should adjust that crop to the corresponding one, right?
> In any case, for training with variable input sizes I should adjust that crop to the corresponding one, right?
You don't need to do that. The network manages the alignment automatically through training, as described in the original work; see also the FAQ in the fcn.berkeleyvision.org repo, which explains the initial 100px padding.
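Roughly, the size bookkeeping is the following (a minimal sketch, assuming Caffe-style ceil rounding in the pooling layers; the kernel-3/pad-1 convolutions keep the size and are omitted):

```python
import math

def fcn32s_upscore_size(h):
    h = h + 2 * 100 - 2         # conv1_1: kernel 3, pad 100
    for _ in range(5):          # pool1 .. pool5: kernel 2, stride 2
        h = math.ceil(h / 2)
    h = h - 6                   # fc6: kernel 7, pad 0
    return 32 * (h - 1) + 64    # upscore: deconv kernel 64, stride 32

for h in (224, 500, 513):
    # the upscore map is always larger than the input by well more
    # than the fixed offset of 19, so the same crop fits every size
    print(h, fcn32s_upscore_size(h) - h)
```

Thanks to the 100px padding in conv1_1, the upscore map is always large enough for the fixed crop, whatever the input size.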
Oh, OK, thanks for pointing out the FAQ; I was unsure about the initial 100px padding.
My doubt came from the original Caffe implementation, where there is a crop layer that explicitly takes the input data as one of its arguments, so the cropping is variable for different input sizes:
```python
n.upscore = L.Deconvolution(n.score_fr,
    convolution_param=dict(num_output=21, kernel_size=64, stride=32,
                           bias_term=False),
    param=[dict(lr_mult=0)])
n.score = crop(n.upscore, n.data)
```
On the other hand, it is still hard for me to see how a hardcoded, fixed crop stays aligned with the original label for different input sizes. I guess I have to check the Crop layer implementation in Caffe to fully understand the alignment of the crops.
Maybe you'd like to understand the behavior of `coord_map_from_to` in `caffe.coord_map`: https://github.com/BVLC/caffe/blob/master/python/caffe/coord_map.py#L178
But I think it does the exact same thing as the gist https://gist.github.com/wkentaro/1d32e54535f0486d09e962302a9bc068#file-get_fcn32s_offsets-py.
Also, you can check the chosen offset (19) in the prototxt:
https://github.com/shelhamer/fcn.berkeleyvision.org/blob/affc85ff245099dbde06609d66ed08d73411f4ad/voc-fcn32s/train.prototxt#L509-L519
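In short, both compose per-layer coordinate maps from the upscore blob back to the data blob. A simplified sketch of that arithmetic (for conv/pool the map is x_in = stride * x_out + (kernel - 1)/2 - pad, and its inverse for deconv):

```python
def conv_map(x, kernel, stride=1, pad=0):
    # map an output coordinate to the input coordinate it is centered on
    return stride * x + (kernel - 1) / 2.0 - pad

def deconv_map(x, kernel, stride=1, pad=0):
    # inverse map for a deconvolution (transposed convolution)
    return (x - (kernel - 1) / 2.0 + pad) / stride

# Follow upscore coordinate 0 back through FCN32s to the data blob.
# 1x1 layers (fc7, score_fr) and kernel-3/pad-1 convs are identities.
x = 0.0
x = deconv_map(x, kernel=64, stride=32)  # upscore
x = conv_map(x, kernel=7)                # fc6
for _ in range(5):                       # pool5 .. pool1
    x = conv_map(x, kernel=2, stride=2)
x = conv_map(x, kernel=3, pad=100)       # conv1_1
print(x)  # -19.0: upscore pixel 19 lands on data pixel 0 -> crop offset 19
```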
Thanks for the info again; I really appreciate all the feedback you provide. I will try training on my dataset without any modifications to the code.
I will also take a closer look at the code of `coord_map_from_to` to get a better understanding of the network. Intuitively, it wasn't clear to me how a fixed crop stays aligned for image-label pairs of different sizes.
If by any chance I find something I'm not sure about, or something without a clear correspondence to the official implementation, I will let you know.