Doubts about the cropping after deconv
dokutagero opened this issue · 5 comments
Hey @wkentaro, I still have a doubt regarding the cropping performed after the transposed convolution. It's more of a conceptual question than a code issue, but maybe other people can benefit from it as well.
In the gist that you provided in the other issue I opened about the cropping, I understand that after a x32 downsample, the upsampling can be offset from the input depending on its size (when the size is not divisible by 32), and that is the reason for the cropping. Initially I thought that to train with variable input sizes I would have to modify that cropping to match the offset corresponding to each size.
I tried inferring images from my dataset using the infer.py code, but when checking the size of the predictions I realized that the output matches the input size for different input sizes, while I was expecting different sizes. Shouldn't I modify the offset accordingly, and why does the output match the input size with a fixed crop of 19? I would highly appreciate your help resolving this doubt.
Thanks again!
Oh OK, never mind. Of course the size matches: you are slicing a window of the input's size starting at that offset. Case solved.
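I.e., roughly this (a minimal sketch with made-up names and sizes, not the exact infer.py code):

```python
import numpy as np

h, w = 375, 500                              # input size (arbitrary example)
upscore = np.zeros((1, 21, h + 64, w + 64))  # upsampled map, larger than the input
offset = 19
# slice an input-sized window starting at the fixed offset
score = upscore[:, :, offset:offset + h, offset:offset + w]
print(score.shape)                           # (1, 21, 375, 500) -> matches the input
```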
In any case, for training with variable input sizes I should adjust that crop to the corresponding one, right?
> In any case, for training with variable input sizes I should adjust that crop to the corresponding one, right?
You don't need to do that. The network manages the alignment automatically through training, as described in the original work; see also the FAQ in the fcn.berkeleyvision.org repo, which explains the initial 100px padding.
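Roughly, the size bookkeeping is the following (a minimal sketch, assuming Caffe-style ceil rounding in the pooling layers; the kernel-3/pad-1 convolutions keep the size and are omitted):

```python
import math

def fcn32s_upscore_size(h):
    h = h + 2 * 100 - 2         # conv1_1: kernel 3, pad 100
    for _ in range(5):          # pool1 .. pool5: kernel 2, stride 2
        h = math.ceil(h / 2)
    h = h - 6                   # fc6: kernel 7, pad 0
    return 32 * (h - 1) + 64    # upscore: deconv kernel 64, stride 32

for h in (224, 500, 513):
    # the upscore map is always larger than the input by well more
    # than the fixed offset of 19, so the same crop fits every size
    print(h, fcn32s_upscore_size(h) - h)
```

Thanks to the 100px padding in conv1_1, the upscore map is always large enough for the fixed crop, whatever the input size.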
Oh, OK, thanks for pointing out the FAQ; I was unsure about the initial 100px padding.
My doubt came from the original Caffe implementation, where there is a crop layer that explicitly takes the input data as one of its arguments, so the cropping is variable for different input sizes:
```python
n.upscore = L.Deconvolution(n.score_fr,
    convolution_param=dict(num_output=21, kernel_size=64, stride=32,
                           bias_term=False),
    param=[dict(lr_mult=0)])
n.score = crop(n.upscore, n.data)
```
On the other hand, it is still hard for me to see how a hardcoded, fixed crop stays aligned with the original label for different input sizes. I guess I have to check the Crop layer implementation in Caffe to fully understand the alignment of the crops.
Maybe you'd like to understand the behavior of `coord_map_from_to` in `caffe.coord_map`: https://github.com/BVLC/caffe/blob/master/python/caffe/coord_map.py#L178
But I think it does the exact same thing as the gist https://gist.github.com/wkentaro/1d32e54535f0486d09e962302a9bc068#file-get_fcn32s_offsets-py.
Also, you can check the chosen offset (19) in the prototxt:
https://github.com/shelhamer/fcn.berkeleyvision.org/blob/affc85ff245099dbde06609d66ed08d73411f4ad/voc-fcn32s/train.prototxt#L509-L519
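In short, both compose per-layer coordinate maps from the upscore blob back to the data blob. A simplified sketch of that arithmetic (for conv/pool the map is x_in = stride * x_out + (kernel - 1)/2 - pad, and its inverse for deconv):

```python
def conv_map(x, kernel, stride=1, pad=0):
    # map an output coordinate to the input coordinate it is centered on
    return stride * x + (kernel - 1) / 2.0 - pad

def deconv_map(x, kernel, stride=1, pad=0):
    # inverse map for a deconvolution (transposed convolution)
    return (x - (kernel - 1) / 2.0 + pad) / stride

# Follow upscore coordinate 0 back through FCN32s to the data blob.
# 1x1 layers (fc7, score_fr) and kernel-3/pad-1 convs are identities.
x = 0.0
x = deconv_map(x, kernel=64, stride=32)  # upscore
x = conv_map(x, kernel=7)                # fc6
for _ in range(5):                       # pool5 .. pool1
    x = conv_map(x, kernel=2, stride=2)
x = conv_map(x, kernel=3, pad=100)       # conv1_1
print(x)  # -19.0: upscore pixel 19 lands on data pixel 0 -> crop offset 19
```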
Thanks for the info again; I really appreciate all the feedback you provide. I will try training on my dataset without any modifications to the code.
I will also take a closer look at the code of `coord_map_from_to` to get a better understanding of the network. Intuitively, it wasn't clear to me how a fixed crop stays aligned for image-label pairs of different sizes.
If by any chance I find something I'm not sure about, or something without a clear correspondence to the official implementation, I will let you know.