POSTECH-CVLab/point-transformer

Question about positional embedding

zwbai opened this issue · 1 comment

zwbai commented

Hi,

Thanks for sharing the repo. I am quite confused about the positional embedding:
```python
p_r = layer(p_r.transpose(1, 2).contiguous()).transpose(1, 2).contiguous() if i == 1 else layer(p_r)
```

I don't know why the input of the first layer is the transpose(1, 2) of `p_r`. Since the size of `p_r` should be (n, nsample, 3), why not feed it directly into the embedding block?
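For context, this line iterates over the positional-encoding MLP `linear_p`, which I believe is defined roughly like the sketch below (the channel sizes are just for illustration and may differ from the repo):

```python
import torch.nn as nn

out_planes = 64  # hypothetical output width, for illustration only

# Sketch of the positional-encoding MLP (linear_p) the loop iterates over:
linear_p = nn.Sequential(
    nn.Linear(3, 3),
    nn.BatchNorm1d(3),        # this is the layer hit when i == 1
    nn.ReLU(inplace=True),
    nn.Linear(3, out_planes),
)
```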

Thanks,

Hi @zwbai,

Sorry for the late reply.

> I don't know why the input of the first layer is the transpose(1, 2) of p_r

First of all, `i == 1` refers not to the first layer but to the second, since `i` starts from 0.
So your question is really: "why is the input of the second layer (in this case, `nn.BatchNorm1d`) the transpose(1, 2) of `p_r`?"
That is simply how `nn.BatchNorm1d` works. According to the documentation, its input shape should be (N, C, L), not (N, L, C).
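Here is a minimal sketch of the shape issue (tensor sizes are made up for illustration):

```python
import torch
import torch.nn as nn

n, nsample = 4, 16
p_r = torch.randn(n, nsample, 3)   # (N, L, C): relative positions

bn = nn.BatchNorm1d(3)             # normalizes over the channel dimension C

# bn(p_r) would raise an error: BatchNorm1d reads dim 1 as channels,
# so a (4, 16, 3) input looks like 16 channels instead of 3.
out = bn(p_r.transpose(1, 2).contiguous())   # (N, C, L) = (4, 3, 16)
out = out.transpose(1, 2).contiguous()       # back to (N, L, C) = (4, 16, 3)
print(out.shape)                             # torch.Size([4, 16, 3])
```

The two `.contiguous()` calls just keep the memory layout dense after each transpose; they do not change the values.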

Hope this helps your understanding :).

Regards,