Positional embedding
linnlii opened this issue · 1 comment
linnlii commented
I'm just wondering what the reasoning is behind using a learned (linear) embedding method to encode the positions, rather than the sine and cosine functions used in transformer models for language.
rmrao commented
Sorry for taking so long to answer. Learned positional embeddings are very common in modern transformer implementations, and were used in the original BERT paper (Devlin et al., 2018).
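For concreteness, here is a minimal sketch of the two alternatives being discussed. It uses plain Python (no deep-learning framework); the learned table is shown only at random initialization, with the gradient updates that would train it omitted, and all names (`sinusoidal_embedding`, `LearnedPositionalEmbedding`) are illustrative, not taken from this repo:

```python
import math
import random


def sinusoidal_embedding(position, d_model):
    """Fixed sinusoidal encoding from "Attention Is All You Need":
    sin/cos pairs at geometrically spaced frequencies. Not trained."""
    emb = []
    for i in range(d_model // 2):
        angle = position / (10000 ** (2 * i / d_model))
        emb.append(math.sin(angle))
        emb.append(math.cos(angle))
    return emb


class LearnedPositionalEmbedding:
    """BERT-style learned positional embedding: a trainable lookup table
    with one d_model-dimensional vector per position. Here it is only
    randomly initialized; in practice the entries are updated by
    backpropagation like any other model parameter."""

    def __init__(self, max_len, d_model, seed=0):
        rng = random.Random(seed)
        self.table = [
            [rng.gauss(0.0, 0.02) for _ in range(d_model)]
            for _ in range(max_len)
        ]

    def __call__(self, position):
        return self.table[position]


# Both map an integer position to a d_model-dimensional vector;
# the difference is fixed formula vs. trainable parameters.
fixed = sinusoidal_embedding(3, 8)
learned = LearnedPositionalEmbedding(max_len=512, d_model=8)(3)
```

The practical upshot: learned embeddings let the model fit whatever positional structure helps the task, at the cost of being undefined beyond `max_len`, while sinusoidal encodings need no parameters and extend to any position.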