fanglaosi/Point-In-Context

About the implementation

Hi author, sorry for bothering again. I have some questions about the implementation.

  1. I found this line in the code:

    x, mask = self.MAE_encoder(pc_center, pc_neighborhood, target_center, target_neighborhood, self.pos_sincos) # x[B 2G C] mask[B 2G]
     Is the sine-cosine positional encoding (self.pos_sincos) used here? I ask because you argue that "we find that the sine-cosine encoding sequence will significantly reduce the model performance compared to learned embedding, and even lead to the collapse of the training."

  2. I don't fully understand the indexing in the JS module. You say that "The key of our JS module is the consistency between center point indices of corresponding patches in both target and input point clouds." and illustrate it in Fig. 2(b). But the input and target points are unordered, so I would expect no one-to-one correspondence between them. For example, how can we ensure that input_center[1] corresponds to target_center[1] when the points in each point cloud could be in arbitrary order?

Many thanks for your time. I look forward to your reply.

  1. During our exploration, we tried supplementing the model with positional information about the input, but the model still failed to converge. After we added the JS module, the model converged normally, so we did not pay much attention to the position embedding. My guess is that the sin-cos positional embedding doesn't matter much in our model, and it is not our focus.
  2. When we generate the input-output pairs in the training set, we ensure that their points are in one-to-one correspondence and that every point cloud has the same shape (N, C). For example, for the point cloud reconstruction task, we discard points by setting them to zero rather than by reducing the number of points. Note that this does not affect testing: during testing, we randomly select customized input-output pairs from the training set that perform the same task as the query input, and use them to guide the query input to perform the specified task.
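
The index consistency described in point 2 can be illustrated with a small standalone sketch. This is not the repository's code. It assumes that center indices are chosen by farthest point sampling on the input and then reused on the target, and all function names (make_reconstruction_pair, farthest_point_indices, joint_sample_centers) are hypothetical. It only shows why zeroing points, instead of deleting them, keeps input[i] aligned with target[i].

```python
import random

ZERO = (0.0, 0.0, 0.0)

def make_reconstruction_pair(target, drop_ratio, rng):
    """Build an (input, target) pair for a reconstruction-style task.

    Dropped points are set to zero instead of being deleted, so both
    clouds keep the same length and input[i] still pairs with target[i].
    """
    n = len(target)
    dropped = set(rng.sample(range(n), int(n * drop_ratio)))
    inp = [ZERO if i in dropped else p for i, p in enumerate(target)]
    return inp, target

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_indices(points, num_centers, start=0):
    """Greedy farthest point sampling that returns indices, not coordinates."""
    chosen = [start]
    # min_d[i] = squared distance from point i to its nearest chosen center
    min_d = [sq_dist(p, points[start]) for p in points]
    while len(chosen) < num_centers:
        nxt = max(range(len(points)), key=lambda i: min_d[i])
        chosen.append(nxt)
        min_d = [min(d, sq_dist(p, points[nxt])) for d, p in zip(min_d, points)]
    return chosen

def joint_sample_centers(inp, target, num_centers):
    """Pick center indices once (assumed: on the input), reuse them on the target."""
    idx = farthest_point_indices(inp, num_centers)
    return idx, [inp[i] for i in idx], [target[i] for i in idx]

rng = random.Random(0)
target = [(rng.random(), rng.random(), rng.random()) for _ in range(64)]
inp, target = make_reconstruction_pair(target, drop_ratio=0.25, rng=rng)
idx, input_centers, target_centers = joint_sample_centers(inp, target, num_centers=8)
```

Because both center lists are gathered with the same idx, input_centers[k] and target_centers[k] always refer to the same point index, regardless of how the points within a cloud are ordered.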

Thank you so much.