Colin97/OpenShape_code

Issue with the number of channels of xyz in example.py

Holmes-GU opened this issue · 14 comments

Hi, thanks for your code. I tried to run 'python3 src/example.py'. However, it returns 'RuntimeError: Given groups=1, weight of size [64, 9, 1, 1], expected input[1, 10, 64, 384] to have 9 channels, but got 10 channels instead', raised from self.mlp in 'PointNetSetAbstraction' in models/pointnet_util.py. The issue may lie in the number of channels of xyz (4 in example.py). Is there any solution?

Thank you very much.
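For reference, a minimal sketch of the mismatch (hypothetical shapes, not the repo's exact code): the first MLP layer of PointNetSetAbstraction expects 9 input channels (3 for xyz plus 6 for the point features), but the grouped input carries 10 when xyz arrives with 4 channels.

```python
import torch
import torch.nn as nn

# First MLP layer of a PointNet++ set abstraction expecting 3 (xyz) + 6 (features) = 9 channels.
conv = nn.Conv2d(in_channels=9, out_channels=64, kernel_size=1)  # weight shape [64, 9, 1, 1]

# If xyz has 4 channels (batch index + x, y, z), grouping yields 4 + 6 = 10 channels.
grouped = torch.randn(1, 10, 64, 384)

try:
    conv(grouped)
except RuntimeError as e:
    print(e)  # "... expected input[1, 10, 64, 384] to have 9 channels, but got 10 channels instead"
```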

Hi. I am a little confused why you are seeing 4 channels for xyz.
[screenshot]
Mine has 3.

Hi, I followed 'python3 src/example.py' from the quick start. In example.py, xyz is processed by ME.utils.batched_coordinates(), so its number of channels becomes 4, as shown below. Besides, 'demo/pc.ply' does not exist; it should be 'demo/owl.ply' instead. Also, this function does not seem to be used in the training file.
[screenshots]
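For context, ME.utils.batched_coordinates collates a list of per-cloud coordinate arrays and prepends the batch index as an extra first column, which is where the fourth channel comes from (toy values below, not the demo point cloud):

```python
import numpy as np
import MinkowskiEngine as ME

# A single point cloud with plain 3-channel xyz, already quantized to integer voxel indices.
xyz = np.random.randint(0, 40, size=(2048, 3))

# batched_coordinates prepends the batch index, so (N, 3) becomes (N, 4): (batch_idx, x, y, z).
coords = ME.utils.batched_coordinates([xyz])
print(coords.shape)  # torch.Size([2048, 4])
```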

Hi, which checkpoint are you using? The example.py is for spconv. If you are using a pointbert checkpoint, some modifications are needed. Sorry for the confusion.

I have changed the backbone to PointBert with model.scaling=4, model.name=PointBert, and model.use_dense=True. Could you provide me with the exact modifications? Thanks.
[screenshot]

Hi, please refer to the code:

https://github.com/Colin97/OpenShape_code/blob/26cf8d16551368f8f1e8e3801cbfb629b6157a03/src/train.py#L101C18-L101C18

https://github.com/Colin97/OpenShape_code/blob/26cf8d16551368f8f1e8e3801cbfb629b6157a03/src/data.py#L238C10-L238C19

Basically, to use PointBert, you don't need to process the PC with MinkowskiEngine.
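Roughly speaking (a minimal sketch with assumed shapes and a hypothetical forward call, not the repo's exact interface), the PointBert path takes dense tensors directly instead of a sparse tensor built from ME.utils.batched_coordinates:

```python
import torch

# Dense point-cloud input for the PointBert backbone (assumed shapes).
xyz = torch.rand(1, 10000, 3)               # (batch, num_points, 3) coordinates
rgb = torch.full((1, 10000, 3), 0.4)        # colors in [0, 1], or the 0.4 constant if none
features = torch.cat([xyz, rgb], dim=-1)    # (batch, num_points, 6) per-point features

# Hypothetical call: with model.name=PointBERT and model.use_dense=True, pass the dense
# tensors straight to the model instead of an ME.SparseTensor.
# shape_feat = model(xyz, features)
```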

Ok. Thanks for your instructions. I will try it tomorrow~

Hi. Following your instructions, I successfully ran the code.

Sorry for bothering you again. I noticed some differences between the data processing in training and testing, as follows:

  1. In training, rgb is multiplied by a factor of 0.4 under 'if use_color' in 'class Four'. Why is this not applied in testing?
  2. In 'class ModelNet40Test', 'rgb = rgb / 255.0' is applied before 'if use_color', but it is not applied in 'class ObjaverseLVIS' or 'class ScanObjectNNTest'.

  1. Do you mean this line? This is an augmentation, which randomly changes the colors of some shapes to a constant (0.4).

  2. This is due to some inconsistency when preparing the data files. RGB in ObjaverseLVIS and ScanObjectNN is in [0, 1]. ModelNet40 doesn't have colors, and we put 100 in all data files; 'rgb = rgb / 255.0' just normalizes them to 0.4.
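Putting the two answers together, the color handling described above looks roughly like this (a simplified sketch, not the repo's exact code):

```python
import numpy as np

# Training augmentation: with some probability, replace the shape's colors with the
# constant 0.4 (the "no color" value).
def augment_color(rgb: np.ndarray, p: float = 0.5) -> np.ndarray:
    if np.random.rand() < p:
        return np.full_like(rgb, 0.4)
    return rgb

# ModelNet40 has no real colors; its data files store 100 for every channel,
# so dividing by 255 maps them to roughly 0.4 as well.
modelnet_rgb = np.full((2048, 3), 100.0)
modelnet_rgb = modelnet_rgb / 255.0   # ~0.392, i.e. the same "no color" constant

# ObjaverseLVIS / ScanObjectNN colors are already stored in [0, 1], so no division is needed.
```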

OK. What about this part? Does ScanObjectNNTest not have colors, so they are directly set to the constant (0.4)?
[screenshot]

Yes.

OK, thank you very much.

Hello, one more question: why does this part only take x[:, 0] instead of the full x?
[screenshot]

Is this x[:, 0] the class token, and is the remaining 384 the number of points after aggregation?
[screenshot]

Because we pick the first token, which is the CLS token, as the pooler output, as in most transformer encoder architectures (including but not limited to BERT, CLIP ViT, and PointBERT).
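As a generic illustration (toy sizes, not the repo's actual dimensions), CLS-token pooling just indexes the first token of the encoder output:

```python
import torch

# Transformer encoder output: 1 CLS token followed by the aggregated point-group tokens.
batch, num_tokens, dim = 2, 1 + 384, 256   # assumed sizes for illustration
x = torch.randn(batch, num_tokens, dim)

pooled = x[:, 0]            # the CLS token serves as the global shape embedding
print(pooled.shape)         # torch.Size([2, 256])
```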