facebookarchive/eyescream

SpatialConvolutionUpsample behaviour

Closed this issue · 10 comments

I was expecting the SpatialConvolutionUpsample class to perform the usual "upsampling", but it seems this class does something else. Here is an example:

-- 1 input plane, 1 output plane, 1x1 kernel, upsampling factor 3
conv = nn.SpatialConvolutionUpsample(1, 1, 1, 1, 3)
w, dw = conv:parameters()
w[1]:fill(1)  -- set the 1x1 weight to 1
w[2]:zero()   -- set the bias to 0

This creates an upsampling module that upsamples the input image by a factor of 3; the convolution is 1x1 with weight 1 and bias 0, so it just copies the input.

I tried this on a 1x1x2x2 input tensor:

x = torch.range(1,4):resize(1,1,2,2)
y = conv:forward(x)

and here is the result:

th> x
(1,1,.,.) = 
  1  2
  3  4
[torch.DoubleTensor of size 1x1x2x2]

th> y
(1,1,.,.) = 
  1  2  3  4  1  2
  3  4  1  2  3  4
  1  2  3  4  1  2
  3  4  1  2  3  4
  1  2  3  4  1  2
  3  4  1  2  3  4
[torch.DoubleTensor of size 1x1x6x6]

However, I was actually expecting y to look like this (which I think is the more standard nearest-neighbour "upsampling"):

1 1 1 2 2 2
1 1 1 2 2 2
1 1 1 2 2 2
3 3 3 4 4 4
3 3 3 4 4 4
3 3 3 4 4 4
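The expected nearest-neighbour behaviour can be sketched in NumPy (an illustration only, not Torch code): each pixel is repeated 3 times along both spatial axes.

```python
import numpy as np

# Nearest-neighbour upsampling by a factor of 3 on the 2x2 example above.
x = np.arange(1, 5).reshape(2, 2)            # [[1, 2], [3, 4]]
y = np.repeat(np.repeat(x, 3, axis=0), 3, axis=1)
print(y)  # 6x6: three constant 3x3 blocks per input pixel
```

This reproduces exactly the 6x6 matrix above: a constant 3x3 block for every input pixel.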

The problem is that in the current SpatialConvolutionUpsample class, the views created after computing the result do not preserve spatial element ordering. I wonder if this is the intended behaviour?
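The observed output is consistent with the following reconstruction (an assumption based on the result above, sketched in NumPy rather than Torch): the 1x1 convolution produces scale^2 = 9 identical 2x2 feature maps, which are then viewed as a single 6x6 map without any pixel reordering.

```python
import numpy as np

# Nine identical 2x2 maps, then a plain reshape that preserves
# element order instead of interleaving pixels spatially.
x = np.arange(1, 5).reshape(2, 2)   # [[1, 2], [3, 4]]
maps = np.tile(x, (9, 1, 1))        # shape (9, 2, 2)
y = maps.reshape(6, 6)              # flat view, no spatial reordering
print(y)
```

The result matches the tiled 6x6 output shown earlier, which is why the pixels appear scrambled relative to a true spatial upsampling.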

@yujiali yes, this is the intended behavior; sorry, the module is poorly named. What you are looking for is https://github.com/torch/nn/blob/master/doc/convolution.md#nn.SpatialUpSamplingNearest

Unfortunately the SpatialUpSamplingNearest module does not support upsampling with learned parameters, which is what I actually need. There is a discussion at torch/nn#405 on a similar topic, but the full convolution module does not do the same thing. I'm looking for the behaviour described in Section 3.3 of http://www.cs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf which seems to be supported in Caffe: http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1DeconvolutionLayer.html

Given this, I'm quite surprised that the eyescream model actually works, since the spatial pixel ordering is broken after one SpatialConvolutionUpsample forward pass. Any insights on why this is not a problem?

If you are looking for Shelhamer's DeconvolutionLayer, SpatialFullConvolution is exactly that.
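The operation behind a transposed ("full") convolution can be sketched in NumPy (a minimal illustration of the idea, not the Torch or Caffe implementation; `full_conv_upsample` is a hypothetical helper): each input pixel scatters a weighted copy of the kernel into the output at stride-spaced positions.

```python
import numpy as np

def full_conv_upsample(x, w, stride):
    # Transposed convolution: scatter x[i, j] * w into the output
    # at offsets (i * stride, j * stride), summing overlaps.
    k = w.shape[0]
    H, W = x.shape
    out = np.zeros(((H - 1) * stride + k, (W - 1) * stride + k))
    for i in range(H):
        for j in range(W):
            out[i*stride:i*stride+k, j*stride:j*stride+k] += x[i, j] * w
    return out

x = np.arange(1, 5).reshape(2, 2).astype(float)
# With a constant 3x3 kernel of ones and stride 3 (no overlap),
# this reduces to nearest-neighbour upsampling by a factor of 3:
y = full_conv_upsample(x, np.ones((3, 3)), stride=3)
print(y)
```

With a learned kernel instead of a constant one, this is the learned upsampling of the FCN paper; a transposed convolution with integer stride s plays the role of a convolution with fractional stride 1/s.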

If you look at how we use this module in eyescream, we only use it to calculate "SAME" padding, and we only use it at a scale of 1.0, so there is no upsampling.
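The "SAME"-padding use can be sketched as follows (a minimal illustration, not eyescream code; `same_pad` and `conv_out_size` are hypothetical helpers): with stride 1 and an odd kernel size k, padding of (k - 1) / 2 keeps the spatial size unchanged.

```python
def same_pad(k):
    # Padding that preserves spatial size for an odd kernel, stride 1.
    return (k - 1) // 2

def conv_out_size(H, k, p, stride=1):
    # Standard convolution output-size formula.
    return (H + 2 * p - k) // stride + 1

for k in (3, 5, 7):
    assert conv_out_size(28, k, same_pad(k)) == 28
print("ok")
```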

Thank you @soumith for the clarification. Now I realize that when preparing the data you scale the images down and then back up, so the resolution is reduced but the image size stays the same; therefore the images are actually the same size on all layers.

And yes, you are right that the deconvolution layer and SpatialFullConvolution are equivalent; I was wrong about that.

@yujiali, @soumith: Could you point to some code (in Torch) where SpatialFullConvolution is used as the upconvolution layer in Shelhamer's FCN? Also, SpatialFullConvolution does not support fractional strides; is there a way to achieve that with it somehow? Thanks.

@anuragranj google for Deconvolution caffe

@soumith I was looking for an implementation in Torch. The Caffe framework is quite clear.

Great. Thanks!