jiasenlu/HieCoAttenVQA

Connections with transformers?

askerlee opened this issue · 0 comments

Just came across your paper, and found that the formulation of co-attention is quite similar to transformers:
*(screenshot of the co-attention formulation from the paper)*
In particular, several (but not all) of the major ingredients, i.e., the learned Q and V projections and attention weights computed with a softmax after a dot product, also appear in transformers.
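
To make the comparison concrete, here is a minimal numpy sketch of the transformer's scaled dot-product attention (Vaswani et al., 2017), which shows the shared ingredients: learned Q/K/V projections followed by a softmax over dot-product scores. The names (`W_q`, `W_k`, `W_v`, `d_k`) follow the transformer paper, not this repo's code; as I understand it, the paper's co-attention instead attends in both directions via a joint image-question affinity matrix.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(X_q, X_kv, W_q, W_k, W_v):
    """Project inputs to Q/K/V, then attend: softmax(Q K^T / sqrt(d_k)) V."""
    Q = X_q @ W_q    # (n_q, d_k) query projection
    K = X_kv @ W_k   # (n_kv, d_k) key projection
    V = X_kv @ W_v   # (n_kv, d_v) value projection
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # dot-product affinities
    return softmax(scores, axis=-1) @ V      # weighted sum of values

# toy example: 4 query tokens attending over 6 key/value tokens
rng = np.random.default_rng(0)
d_model, d_k, d_v = 8, 8, 8
out = scaled_dot_product_attention(
    rng.normal(size=(4, d_model)), rng.normal(size=(6, d_model)),
    rng.normal(size=(d_model, d_k)), rng.normal(size=(d_model, d_k)),
    rng.normal(size=(d_model, d_v)),
)
print(out.shape)  # (4, 8)
```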

Considering that your work predates the transformer paper, do you think it may have inspired transformers? Thanks.