How does the 1D convolution work?
robrechtme opened this issue
I understand the concept of 1D convolution, where a 1D kernel takes the dot product over a 1D window. In the paper, however, we work with a window of `k` inputs embedded in `d` dimensions, so we start with a `k x d` matrix.
How does a one-dimensional convolution work here? Is there a clear visualisation of the process?
I also don't understand how a kernel of size `2d x kd` can result in an output of size `1 x 2d`.
I already took a look at #111, but I still don't understand it.
Ok, so maybe it's easier to start from the "standard" convolution operation on images, which usually involves a `Y x X x C` input, i.e. a `Y x X`-dimensional image with `C` channels. At the input you'll have e.g. 3 channels for color images, but for successive convolutions you'll deal with `C` = the number of kernels of the previous convolution. In any case, each kernel here will be connected to `k_y x k_x x C` input elements.
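As a rough illustration of those shapes (a minimal PyTorch sketch; the concrete sizes here are made up for the example):

```python
import torch
import torch.nn as nn

Y, X, C = 32, 32, 3          # image height, width, channels
num_kernels = 16             # number of kernels = number of output channels
k_y, k_x = 5, 5              # spatial kernel size

conv = nn.Conv2d(in_channels=C, out_channels=num_kernels,
                 kernel_size=(k_y, k_x))
x = torch.randn(1, C, Y, X)  # PyTorch expects (batch, C, Y, X)

# Each kernel is connected to k_y * k_x * C input elements:
print(conv.weight.shape)     # torch.Size([16, 3, 5, 5])
print(conv(x).shape)         # torch.Size([1, 16, 28, 28])
```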
In our case, each convolution is done with `2*d` kernels that are each connected to `k x d` input elements, i.e. `d` is the number of channels. Hence, each `k x d` input window will produce one response from each kernel and you end up with a `1 x 2*d` output per window, or `n x 2*d` for a whole (padded) sequence of length `n`. The GLU activation then reduces this again to `n x d` and we can apply the next convolution.
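In code this could look roughly like the following (again PyTorch with hypothetical sizes; I'm using `F.glu` for the gated linear unit and picking the padding so the output length matches the input length):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, k, n = 256, 3, 10                # embedding dim, kernel width, sequence length

conv = nn.Conv1d(in_channels=d, out_channels=2 * d,
                 kernel_size=k, padding=(k - 1) // 2)
x = torch.randn(1, d, n)            # (batch, channels=d, length=n)

h = conv(x)                         # 2*d responses per window
print(h.shape)                      # torch.Size([1, 512, 10]), i.e. n x 2*d

out = F.glu(h, dim=1)               # GLU halves the feature dimension
print(out.shape)                    # torch.Size([1, 256, 10]), back to n x d
```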
I'm confused. You said we use `C` kernels for `C` channels, but in this case we use `2*d` kernels for `d` channels? Do we apply 2 different kernels to each channel then?
Edit: I just learned that there is a difference between the number of features (`2d`) and the number of channels (`d`, which get summed up after the dot product) and it's all clear now. Thanks!