Queries about permutation invariance of Transformer
ruili3 opened this issue · 5 comments
Hi Menghao,
Thank you for sharing the interesting paper!
I have some questions about the definition of permutation invariance. In my opinion, although self-attention computes global contextual information and aggregates features via weighted summation, the resulting point-wise features still follow the input point order (even though the feature value of any given point appears to be order-independent). I also notice that the implementation uses the max-pooling strategy proposed in PointNet to guarantee invariance.
So I wonder how you define permutation invariance, because it seems the attention operation by itself can hardly guarantee global invariance. Thank you very much!
Rui
One paper also claims permutation invariance [1], while another raises the same concern as mine [2]. So it would be great if you could offer a further explanation :D
[1] Zhao H, Jiang L, Jia J, et al. Point Transformer. arXiv preprint arXiv:2012.09164, 2020.
[2] Engel N, Belagiannis V, Dietmayer K. Point Transformer. arXiv preprint arXiv:2011.00931, 2020.
Hi Rui,
You have a profound understanding of the paper.
The following response only represents my personal opinion.
A point cloud is a kind of point set. From the perspective of set theory, a permutation-invariant operation can be understood as one that produces the same features for the same set of input points, independent of the input order. Obviously, the attention operation is one of these permutation-invariant operations.
Best,
Meng-Hao
Thank you for your reply! I agree that after the MLP and self-attention, each point's feature value is unchanged regardless of the input order, but the ordering of the output point-wise features still follows the input order (i.e., the operation is permutation equivariant). Only after a symmetric aggregation such as summation or max-pooling is the resulting global feature fully permutation invariant. Is this understanding consistent with your explanation?
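To make this concrete, here is a minimal numerical sketch with plain PyTorch tensors (the projection matrices `Wq`, `Wk`, `Wv` below are random stand-ins, not the actual PCT layers): permuting the input permutes the point-wise attention outputs in the same way, while the max-pooled global feature stays identical.

```python
import torch

torch.manual_seed(0)

N, d = 16, 8                          # number of points, feature dimension
x = torch.randn(N, d)                 # toy point-wise input features

# Random stand-in projections for a single self-attention head (not the PCT weights).
Wq, Wk, Wv = torch.randn(d, d), torch.randn(d, d), torch.randn(d, d)

def self_attention(pts):
    q, k, v = pts @ Wq, pts @ Wk, pts @ Wv
    attn = torch.softmax(q @ k.t() / d ** 0.5, dim=-1)   # row-wise softmax over keys
    return attn @ v                                      # point-wise output features

perm = torch.randperm(N)              # a random reordering of the points

y = self_attention(x)
y_perm = self_attention(x[perm])

# Permutation equivariance: the outputs are the same features, just reordered.
print(torch.allclose(y[perm], y_perm, atol=1e-5))                     # True

# Permutation invariance after max-pooling: the global feature is identical.
print(torch.allclose(y.max(dim=0).values, y_perm.max(dim=0).values))  # True
```

So the attention block by itself yields a permutation-equivariant set of point features, and the symmetric pooling at the end is what produces a truly order-independent global descriptor.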
Yes, you are right.
Thanks a lot for your explanation :D