Confused about the attention calculation process
Hi, nice work!
But I'm confused about the attention calculation process.
In the original Transformer, the attention matrix is calculated with the dot product `q * k`, but in your paper and code you use the subtraction `q - k` instead.
What is the reason for that? Are there any other papers that also use this method to calculate the attention matrix?
Hi,
Sorry for the late reply.
First of all, we are not the authors of Point Transformer.
As the official code has not been released yet, we reimplemented the model based on the paper.
Secondly, using subtraction instead of the dot product was introduced in this paper, which is another work by the first author of Point Transformer.
This kind of attention (a.k.a. vector attention) is known to be more effective than scalar attention, but it requires more resources.
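To make the difference concrete, here is a minimal PyTorch sketch of the two variants (the shapes and names are simplified assumptions for illustration, not our actual implementation; the real model also restricts attention to local neighborhoods and adds positional encodings):

```python
import torch
import torch.nn.functional as F

n, c = 16, 32                      # number of points, channel dimension
q = torch.randn(n, c)              # queries
k = torch.randn(n, c)              # keys
v = torch.randn(n, c)              # values

# Scalar attention: one weight per (query, key) pair -> (n, n) attention map
w_scalar = F.softmax(q @ k.t() / c ** 0.5, dim=-1)            # (n, n)
out_scalar = w_scalar @ v                                      # (n, c)

# Vector attention: one weight per (query, key, channel) -> (n, n, c) attention map
w_vector = F.softmax(q.unsqueeze(1) - k.unsqueeze(0), dim=1)   # (n, n, c)
out_vector = (w_vector * v.unsqueeze(0)).sum(dim=1)            # (n, c)
```

The vector-attention map has shape (n, n, c) instead of (n, n), which is roughly where the extra memory and compute come from.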
I hope this answered your questions.
Regards,
Chunghyun.
Thanks
Is vector attention (subtraction) computationally heavier than scalar attention (dot product)?