facebookresearch/fairseq-lua

How does the 1D convolution work?

robrechtme opened this issue · 2 comments

I understand the concept of 1D convolution, where a 1D kernel takes the dot product over a 1D window. In the paper, however, we work with a window of k inputs embedded in d dimensions, so we start with a k x d matrix.
How does a one-dimensional convolution work here? Is there a clear visualisation of the process?
I also don't understand how a kernel of size 2d x kd can result in an output of size 1 x 2d.

I already took a look at #111 but I still don't understand it.

Ok, so maybe it's easier to start from the "standard" convolution operation on images, which usually involves a Y x X x C input, i.e. a Y x X image with C channels. At the input you'll have e.g. 3 channels for color images, but for successive convolutions C equals the number of kernels of the previous convolution. In any case, each kernel here is connected to k_y x k_x x C input elements.
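
For concreteness, here is a minimal sketch of that image case in PyTorch (not the original Lua/Torch code of this repo; the channel and kernel counts are illustrative):

```python
import torch
import torch.nn as nn

C = 3                               # input channels, e.g. RGB
conv = nn.Conv2d(in_channels=C, out_channels=16, kernel_size=3)
img = torch.randn(1, C, 32, 32)     # (batch, channels, Y, X)
out = conv(img)                     # -> (1, 16, 30, 30): C = 16 for the next layer
print(conv.weight.shape)            # torch.Size([16, 3, 3, 3]) = (kernels, C, k_y, k_x)
```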

In our case, each convolution is done with 2*d kernels that are each connected to k x d input elements, i.e. d is the number of channels. Hence, each k x d input window produces one response from each kernel and you end up with a 1 x 2*d output per window, or n x 2*d for a whole sequence of length n. The GLU activation then reduces this again to n x d and we can apply the next convolution.
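
As a sanity check on those shapes, a minimal PyTorch sketch (d, k, and the sequence length n are illustrative values, not the paper's; the repo itself is Lua/Torch, so this is just the same bookkeeping in a different framework):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, k, n = 8, 3, 20             # embedding dim, kernel width, sequence length
x = torch.randn(1, d, n)       # Conv1d expects (batch, channels, length)

# 2*d kernels, each connected to a k x d input window (d input channels)
conv = nn.Conv1d(in_channels=d, out_channels=2 * d, kernel_size=k, padding=k // 2)

y = conv(x)                    # -> (1, 2*d, n): one 2*d response per window
z = F.glu(y, dim=1)            # GLU halves the channel dim -> (1, d, n)
print(y.shape, z.shape)        # torch.Size([1, 16, 20]) torch.Size([1, 8, 20])
```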

I'm confused. You said we use C kernels for C channels, but in this case we use 2*d kernels for d channels? Do we apply 2 different kernels to each channel then?

Edit: I just learned that there is a difference between the number of features (the 2d kernel outputs) and the number of channels (d, which get summed over in the dot product), and now it's all clear. Thanks!
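
For anyone else landing here: the 2d x kd matrix view from the question and the channel view from the answer are the same computation. A hedged numeric check (the flattening order of the kd dimension here is chosen to match PyTorch's weight layout, which may differ from the paper's notation):

```python
import torch
import torch.nn as nn

d, k = 4, 3
conv = nn.Conv1d(d, 2 * d, k, bias=False)
window = torch.randn(1, d, k)                # one k-wide window with d channels

via_conv = conv(window).squeeze(-1)          # (1, 2*d): the conv's response

W = conv.weight.reshape(2 * d, d * k)        # the kernel viewed as a 2d x kd matrix
flat = window.reshape(1, d * k)              # the window flattened to length kd
via_matmul = flat @ W.t()                    # (1, 2*d): same numbers

print(torch.allclose(via_conv, via_matmul))  # True
```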