Confused about the attention calculation process
Hi, nice work!
But I'm confused about the attention calculation process.
In the original Transformer, the attention matrix is calculated with the dot product `q * k`, but in your paper and code you use the subtraction `q - k` instead.
What is the reason for that? Are there any other papers that also use this method to calculate the attention matrix?
Hi,
Sorry for the late reply.
First of all, we are not the authors of Point Transformer.
As the official code has not been released yet, we reimplemented the model based on the paper.
Secondly, using subtraction instead of the dot product was introduced in this paper, which is another work by the first author of Point Transformer.
This kind of attention (a.k.a. vector attention) is known to be more effective than scalar attention, but it requires more resources.
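To make the difference concrete, here is a minimal PyTorch sketch of the two variants (the shapes and names are simplified assumptions for illustration, not our actual implementation; the real model also restricts attention to local neighborhoods and adds positional encodings):

```python
import torch
import torch.nn.functional as F

n, c = 16, 32                      # number of points, channel dimension
q = torch.randn(n, c)              # queries
k = torch.randn(n, c)              # keys
v = torch.randn(n, c)              # values

# Scalar attention: one weight per (query, key) pair -> (n, n) attention map
w_scalar = F.softmax(q @ k.t() / c ** 0.5, dim=-1)            # (n, n)
out_scalar = w_scalar @ v                                      # (n, c)

# Vector attention: one weight per (query, key, channel) -> (n, n, c) attention map
w_vector = F.softmax(q.unsqueeze(1) - k.unsqueeze(0), dim=1)   # (n, n, c)
out_vector = (w_vector * v.unsqueeze(0)).sum(dim=1)            # (n, c)
```

The vector-attention map has shape (n, n, c) instead of (n, n), which is roughly where the extra memory and compute come from.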
I hope this answered your questions.
Regards,
Chunghyun.
Thanks
Is vector attention (subtraction) computationally heavier than scalar attention (dot product)?