Connections with transformers?
askerlee opened this issue · 0 comments
askerlee commented
Just came across your paper and found that the formulation of co-attention is quite similar to that of transformers.
In particular, several (though not all) of the major ingredients, i.e., the Q and V projections and attention weights computed with a softmax over dot products, also appear in transformers.
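For reference, that shared ingredient is easy to write down. Here is a minimal NumPy sketch of softmax dot-product attention; the names, shapes, and random inputs are illustrative only and not taken from either paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(queries, keys, values):
    # Attention weights are a softmax over query-key dot products,
    # followed by a weighted sum of the values -- the ingredient
    # common to the co-attention and transformer formulations.
    scores = queries @ keys.T           # (n_q, n_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ values             # (n_q, d_v)

# Toy example: 2 queries attending over 3 key/value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))  # hypothetical projected queries
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(dot_product_attention(Q, K, V).shape)  # -> (2, 4)
```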
Considering that your work predates the transformer paper, do you think it may have inspired transformers? Thanks.