openai/finetune-transformer-lm

Using conv1d with kernel size 1

ollmer opened this issue · 2 comments

Hi!
I've noticed that the training code uses a 1D convolution with kernel size 1 in all invocations. Do we need a convolution at all here? Why not replace it with a fully_connected layer?

If my understanding is correct, applying a 1D convolution with kernel size 1 is the same as taking a matrix product between the input (say of dimensions n_timesteps x d) and a weight matrix of dimensions d x n_filters.
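For illustration (not code from the repo), here is a quick NumPy check of that equivalence; the names x, w, and the explicit loop are just for the demo:

```python
import numpy as np

n_timesteps, d, n_filters = 5, 16, 32
x = np.random.randn(n_timesteps, d)
w = np.random.randn(d, n_filters)

# 1D convolution with kernel size 1: each output position t depends on
# x[t] alone, i.e. conv_out[t, f] = sum_i x[t, i] * w[i, f].
conv_out = np.empty((n_timesteps, n_filters))
for t in range(n_timesteps):
    conv_out[t] = x[t] @ w

# The same thing as a single matrix product.
matmul_out = x @ w

assert np.allclose(conv_out, matmul_out)
```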

Newmu commented

The codebase uses matmul when the receptive field size is 1. I originally thought conv1d would do this automatically "under the hood" but that does not appear to be the case.
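A minimal sketch of that dispatch, assuming a helper that takes the weights and receptive field size as arguments (this is an illustration, not the repo's exact code):

```python
import tensorflow as tf

def conv1d(x, w, b, rf):
    """Sketch: fall back to matmul when the receptive field rf is 1.

    x: [batch, n_timesteps, nx], w: [rf, nx, nf], b: [nf]
    """
    nx, nf = w.shape[1], w.shape[2]
    if rf == 1:
        # Kernel size 1: collapse the batch/time axes and apply one matmul,
        # avoiding the overhead of the convolution op.
        c = tf.matmul(tf.reshape(x, [-1, nx]), tf.reshape(w, [nx, nf])) + b
        return tf.reshape(c, tf.concat([tf.shape(x)[:-1], [nf]], axis=0))
    # General case: an actual 1D convolution.
    return tf.nn.conv1d(x, w, stride=1, padding="VALID") + b
```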