TiagoCortinhal/SalsaNext

Downsampling rate

benemer opened this issue · 3 comments

Hi!

From your ResBlock class, I can see that you use a constant downsampling rate of 2 by using the nn.AvgPool2d layer with kernel_size=3, stride=2 and padding=1.

However, in your arXiv paper, the first residual block downsamples the width from 2048 to 512, which indicates a downsampling rate of 4. Also, I don't understand how the last layer upsamples the feature map from 1024x64x32 to 2048x64x32, since your code uses a Conv2d layer with kernel_size=(1,1) there.

Is this a mistake in the visualization of the architecture?

Thank you!

Hello!!

Yes... Strangely, I hadn't picked up on that visualization error. We tried having an extra layer on both sides, and I must have forgotten to update the figure accordingly.

It should go like this (width, height):
2048, 64
1024, 32
512, 16
256, 8
128, 4

The final conv-1x1 shouldn't change the dimensionality either as you pointed out.
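
As a quick sanity check of the 1x1 point, here is a minimal sketch of the standard convolution output-size formula (the one given in the PyTorch Conv2d documentation). The helper name `conv_out` is hypothetical, and stride=1, padding=0, dilation=1 are assumptions (the nn.Conv2d defaults); the thread itself only states kernel_size=(1,1).

```python
def conv_out(size, kernel_size=1, stride=1, padding=0, dilation=1):
    """Output length of one spatial dimension after a convolution.

    Hypothetical helper; stride, padding and dilation defaults are assumed,
    not taken from the SalsaNext code.
    """
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# A 1x1 convolution with the assumed defaults leaves width and height unchanged;
# only the channel count can differ between input and output.
width, height = 2048, 64
print(conv_out(width), conv_out(height))

# For contrast, an unpadded 3x3 convolution would shrink each dimension by 2.
print(conv_out(width, kernel_size=3), conv_out(height, kernel_size=3))
```

Under these assumptions the layer can change the number of channels but cannot turn a width of 1024 into 2048.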

Thanks for pointing this out!

Thanks a lot for the fast reply and clarification!

Since you use 4 pooling layers, I assume you mean:

2048, 64
1024, 32
512, 16
256, 8
128, 4
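
The sequence above can be reproduced with a small sketch of the pooling output-size formula, using the kernel_size=3, stride=2, padding=1 settings mentioned earlier in the thread. The function names are hypothetical, and floor rounding (PyTorch's default, ceil_mode=False) is an assumption.

```python
def pool_out(size, kernel_size=3, stride=2, padding=1):
    """Output length of one pooled dimension, assuming floor rounding."""
    return (size + 2 * padding - kernel_size) // stride + 1


def encoder_shapes(width=2048, height=64, num_pools=4):
    """List (width, height) at the input and after each of num_pools pooling stages."""
    shapes = [(width, height)]
    for _ in range(num_pools):
        width, height = pool_out(width), pool_out(height)
        shapes.append((width, height))
    return shapes


for w, h in encoder_shapes():
    print(f"{w}, {h}")
```

Each stage halves both dimensions, so four pooling layers take 2048, 64 down to 128, 4.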

Thanks again!

And I did it again ahah!!

Exactly that!