ygjwd12345/TransDepth

Inconsistency between the text and code

Mathilda88 opened this issue · 7 comments

Hi,

Thanks for the great work. In Fig. 2 of the paper it is written that "*" stands for convolution. For example, I_{r-->r}^{i} * f_{r} in Eq. (8) means these two maps are convolved together. However, in the code you just use an element-wise multiplication between these two feature maps.

My second question is about unfolding. It seems that after unfolding the input variable (

inputs_se_1 = unfold(inputs_se, kernel_size=3, dilation=1, padding=1).view(f_se[0], f_se[1], self.ks ** 2,

), we get an output with the same spatial size but 9 additional channels on top of the ones we already had. I was wondering whether the spatial content is preserved by this kind of unfolding, i.e., if we sample the top-right corner of the spatial maps, whether all the channels come from the same spatial location in the original map.

Thanks,

Thanks for your attention.
For Q1, the convolution is actually performed step by step: unfold --> element-wise multiplication --> sum.
For Q2, I think the main point is answered by A1. The "9 additional channels" are the kernel size squared (3^2 = 9); they come from the kernel window, not from the spatial content.
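To make the unfold --> element-wise multiplication --> sum step concrete, here is a minimal NumPy sketch (a hypothetical stand-in for the repo's PyTorch code, not the actual implementation; `unfold3x3` and `conv_via_unfold` are names I made up). It shows that this three-step pipeline reproduces a standard 3x3 cross-correlation, i.e. a "convolution" in the deep-learning sense:

```python
import numpy as np

def unfold3x3(x):
    """Extract 3x3 sliding blocks from a 2-D map with zero padding 1.

    Returns an array of shape (9, H, W): for each spatial position,
    the 9 values of its 3x3 neighbourhood.
    """
    H, W = x.shape
    padded = np.pad(x, 1)
    blocks = np.empty((9, H, W), dtype=x.dtype)
    k = 0
    for di in range(3):
        for dj in range(3):
            blocks[k] = padded[di:di + H, dj:dj + W]
            k += 1
    return blocks

def conv_via_unfold(x, kernel):
    """Unfold, multiply element-wise by the 3x3 kernel, sum the 9 taps."""
    blocks = unfold3x3(x)            # (9, H, W)
    w = kernel.reshape(9, 1, 1)      # broadcast kernel taps over positions
    return (blocks * w).sum(axis=0)  # sum over the 9 kernel taps

# Sanity check against a direct nested-loop cross-correlation.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 6))
k = rng.standard_normal((3, 3))

ref = np.zeros_like(x)
xp = np.pad(x, 1)
for i in range(5):
    for j in range(6):
        ref[i, j] = (xp[i:i + 3, j:j + 3] * k).sum()

assert np.allclose(conv_via_unfold(x, k), ref)
```

In the repo the second operand of the element-wise multiplication is itself a feature map rather than a fixed kernel, which is presumably why the convolution is written out as unfold/multiply/sum instead of calling a standard conv layer.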

Thank you so much for your response. Actually, I still don't understand what the unfolding does for us here. Do you mean it is the same as making 9 copies of a feature map and then stacking them along a new dimension?

I know that it extracts sliding blocks from the spatial dimensions, but I can't picture what it looks like in practice here. Could you please elaborate on it a bit?

def unfold(input, kernel_size, dilation=1, padding=0, stride=1):
# type: (Tensor, BroadcastingList2[int], BroadcastingList2[int], BroadcastingList2[int], BroadcastingList2[int]) -> Tensor # noqa
r"""Extracts sliding local blocks from a batched input tensor.

.. warning::
    Currently, only 4-D input tensors (batched image-like tensors) are
    supported.

.. warning::

    More than one element of the unfolded tensor may refer to a single
    memory location. As a result, in-place operations (especially ones that
    are vectorized) may result in incorrect behavior. If you need to write
    to the tensor, please clone it first.


See :class:`torch.nn.Unfold` for details
"""

Hope it is useful for you
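As a hedged NumPy illustration of the docstring above (a stand-in for `torch.nn.functional.unfold`, not the library code itself): with kernel_size=3 and padding=1, each spatial position keeps its location, and the 9 extra "channels" at that position are the values of its 3x3 neighbourhood, not 9 copies of the same pixel and not values scattered from elsewhere in the map.

```python
import numpy as np

H, W = 4, 4
x = np.arange(H * W, dtype=float).reshape(H, W)

padded = np.pad(x, 1)  # zero padding of 1, matching padding=1 in the snippet
blocks = np.empty((9, H, W))
k = 0
for di in range(3):
    for dj in range(3):
        # Tap k holds, at every position, the neighbour at offset (di-1, dj-1).
        blocks[k] = padded[di:di + H, dj:dj + W]
        k += 1

# The centre tap (index 4 of the flattened 3x3 window) is the original map:
assert np.array_equal(blocks[4], x)

# At position (1, 1), the 9 values are exactly the 3x3 window around (1, 1):
assert np.array_equal(blocks[:, 1, 1].reshape(3, 3), x[0:3, 0:3])
```

So unfolding preserves the spatial layout; the new dimension indexes the kernel taps, which is what allows the subsequent element-wise multiplication and sum to act like a convolution.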

Thanks Stanly, but I was asking what you were using this function for, not for its docstring!

In other words, is this unfolding needed whenever we want to implement a convolution between two feature maps?

It is just for the AGD module.