Positional embedding
linnlii opened this issue · 1 comment
linnlii commented
I'm just wondering what the reasoning is behind using a learned (linear) embedding method to encode the positions, rather than the sine and cosine functions used in transformer models for language.
rmrao commented
Sorry for taking so long to answer. Learned positional embeddings are very common in modern transformer implementations, and were used in the original BERT paper (Devlin et al., 2018).
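For concreteness, here is a minimal sketch of the two alternatives being discussed. It uses plain Python (no deep-learning framework); the learned table is shown only at random initialization, with the gradient updates that would train it omitted, and all names (`sinusoidal_embedding`, `LearnedPositionalEmbedding`) are illustrative, not taken from this repo:

```python
import math
import random


def sinusoidal_embedding(position, d_model):
    """Fixed sinusoidal encoding from "Attention Is All You Need":
    sin/cos pairs at geometrically spaced frequencies. Not trained."""
    emb = []
    for i in range(d_model // 2):
        angle = position / (10000 ** (2 * i / d_model))
        emb.append(math.sin(angle))
        emb.append(math.cos(angle))
    return emb


class LearnedPositionalEmbedding:
    """BERT-style learned positional embedding: a trainable lookup table
    with one d_model-dimensional vector per position. Here it is only
    randomly initialized; in practice the entries are updated by
    backpropagation like any other model parameter."""

    def __init__(self, max_len, d_model, seed=0):
        rng = random.Random(seed)
        self.table = [
            [rng.gauss(0.0, 0.02) for _ in range(d_model)]
            for _ in range(max_len)
        ]

    def __call__(self, position):
        return self.table[position]


# Both map an integer position to a d_model-dimensional vector;
# the difference is fixed formula vs. trainable parameters.
fixed = sinusoidal_embedding(3, 8)
learned = LearnedPositionalEmbedding(max_len=512, d_model=8)(3)
```

The practical upshot: learned embeddings let the model fit whatever positional structure helps the task, at the cost of being undefined beyond `max_len`, while sinusoidal encodings need no parameters and extend to any position.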