songlab-cal/tape

Positional embedding

linnlii opened this issue · 1 comment

I'm just wondering what the reasoning is behind using a linear embedding method to encode the positions, rather than the sine and cosine functions that are used in transformer models for language.

rmrao commented

Sorry for taking so long to answer. Learned positional embeddings are very common in modern transformer implementations, and were used in the original BERT paper (Devlin et al., 2018).
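
For reference, a learned positional embedding is just a trainable lookup table indexed by position, whereas the sinusoidal version is a fixed function of position. Here is a minimal PyTorch sketch of both (illustrative class and function names, not TAPE's actual code):

```python
import math
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """Learned positional embedding: one trainable vector per position (BERT-style)."""
    def __init__(self, max_len: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add the embedding for each position index
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.embed(positions)

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Fixed sine/cosine encoding from 'Attention Is All You Need' (Vaswani et al., 2017)."""
    position = torch.arange(max_len).unsqueeze(1)                      # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)                       # even dims
    pe[:, 1::2] = torch.cos(position * div_term)                       # odd dims
    return pe
```

The practical trade-off: the learned table lets the model fit whatever positional pattern helps the task but is capped at `max_len` positions, while the sinusoidal encoding is parameter-free and defined for any length.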