Question about experiments in paper

Hello, I'm interested in your paper. It seems like a novel approach to the problem.

In Figure 6, some 3D GS methods such as LangSplat are also unprojected into 3D space. How did you run this experiment? I thought it was impossible to mask specific Gaussian primitives in 3D space.

Also, when using SAM, did you mask each object with its segment, crop it, and encode it with the CLIP image encoder, as LangSplat does?

Thank you.

Answer 1 · 2024-08-22T06:32:05.000Z

Thank you for pointing out this possible confusion.

To calculate the similarity with the text query, we decoded the 3-dim language feature embedded in each Gaussian to 512-dim.
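A minimal sketch of that query step, assuming a small MLP decoder (`decode`, `W1`, `W2` are hypothetical stand-ins with random weights, not the trained autoencoder) and a random vector in place of a real CLIP text embedding. It decodes every Gaussian's 3-dim feature to 512-dim and scores it against the text query with cosine similarity, so Gaussians can be selected directly in 3D. The actual relevancy score and threshold used in the paper may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained decoder (3 -> 256 -> 512).
# The weights are random here; a real run would load the trained autoencoder.
W1 = rng.standard_normal((3, 256)).astype(np.float32)
b1 = np.zeros(256, dtype=np.float32)
W2 = rng.standard_normal((256, 512)).astype(np.float32)
b2 = np.zeros(512, dtype=np.float32)


def decode(latent: np.ndarray) -> np.ndarray:
    """Map [M, 3] compressed features to [M, 512] unit-length features."""
    h = np.maximum(latent @ W1 + b1, 0.0)  # ReLU hidden layer
    out = h @ W2 + b2
    norm = np.linalg.norm(out, axis=-1, keepdims=True)
    return out / np.maximum(norm, 1e-12)  # L2-normalize each row


def gaussian_text_similarity(latent: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Cosine similarity between every decoded Gaussian feature and one text embedding."""
    feats = decode(latent)                      # [M, 512]
    text = text_emb / np.linalg.norm(text_emb)  # [512]
    return feats @ text                         # [M]


num_gaussians = 1000
latent = rng.standard_normal((num_gaussians, 3)).astype(np.float32)  # per-Gaussian 3-dim features
text_emb = rng.standard_normal(512).astype(np.float32)  # stand-in for a CLIP text embedding
scores = gaussian_text_similarity(latent, text_emb)
selected = scores > 0.5  # hypothetical threshold: boolean mask over Gaussians
```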
Yes, we masked the object and then cropped it.
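One way the mask-then-crop step could look, as a sketch: black out pixels outside a SAM segment, crop to the segment's bounding box, and zero-pad to a square before passing it to a CLIP image encoder. The helpers `mask_and_crop` and `pad_to_square` are hypothetical, and the resize and the CLIP call itself are left out.

```python
import numpy as np


def mask_and_crop(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Black out pixels outside `mask`, then crop to the mask's bounding box.

    image: [H, W, 3] uint8 frame; mask: [H, W] bool, e.g. one SAM segment.
    """
    masked = image.copy()
    masked[~mask] = 0  # background outside the segment becomes black
    ys, xs = np.nonzero(mask)
    return masked[ys.min():ys.max() + 1, xs.min():xs.max() + 1]


def pad_to_square(crop: np.ndarray) -> np.ndarray:
    """Zero-pad an [h, w, 3] crop to [s, s, 3] with s = max(h, w), keeping it centered."""
    h, w, c = crop.shape
    s = max(h, w)
    out = np.zeros((s, s, c), dtype=crop.dtype)
    top, left = (s - h) // 2, (s - w) // 2
    out[top:top + h, left:left + w] = crop
    return out


image = np.full((8, 10, 3), 255, dtype=np.uint8)  # toy all-white frame
mask = np.zeros((8, 10), dtype=bool)
mask[2:5, 3:9] = True  # toy "segment": 3 rows x 6 columns
crop = mask_and_crop(image, mask)
square = pad_to_square(crop)  # would then be resized and fed to a CLIP image encoder
```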

We hope this answers your questions.

Answer 2 · 2024-08-22T11:45:00.000Z

@bbangsik13 Thanks for your fast reply. I have a follow-up to question 1. In LangSplat's decoder (https://github.com/minghanqin/LangSplat/blob/main/autoencoder/model.py), the input is not a 512-dim vector or 3D points but an (h, w, 3) tensor, the same as in the 2D setting. So how did you decode the 3-dim compressed features of all the 3D GS points?

Answer 3 · 2024-08-22T12:47:00.000Z

Hi,

LangSplat reshapes the [Level,H,W,3] feature to [Level*H*W,3] and then decodes it. Likewise, we reshape the [Level,Num_gaussian,3] features embedded in the Gaussians to [Level*Num_gaussian,3] and then decode them for 3D segmentation. Also, please note that LEGaussian's decoder is implemented with a 1x1 convolution.
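A sketch of the reshape trick, assuming a single linear layer as a stand-in decoder (`decode_rows` and its weights are hypothetical, not LangSplat's or LEGaussians' code). Because the decoder acts on each 3-dim feature independently, flattening [Level, Num_gaussian, 3] to [Level*Num_gaussian, 3] works exactly like flattening [Level, H, W, 3]. The last part checks that a 1x1 convolution (written here as an `einsum`) is the same per-point linear map, which is why a 1x1-conv decoder can also be applied to Gaussians.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in decoder (3 -> 512) acting on the last axis.
W = rng.standard_normal((3, 512)).astype(np.float32)
b = np.zeros(512, dtype=np.float32)


def decode_rows(x: np.ndarray) -> np.ndarray:
    """Decode a [M, 3] batch of compressed features to [M, 512]."""
    return x @ W + b


levels, num_gaussians = 3, 5000
feat_3d = rng.standard_normal((levels, num_gaussians, 3)).astype(np.float32)

# 3D case: flatten levels and Gaussians into one batch axis, decode, restore.
flat = feat_3d.reshape(levels * num_gaussians, 3)                # [L*N, 3]
decoded = decode_rows(flat).reshape(levels, num_gaussians, 512)  # [L, N, 512]

# 2D case uses the same trick: [L, H, W, 3] -> [L*H*W, 3].
H, Wd = 4, 6
feat_2d = rng.standard_normal((levels, H, Wd, 3)).astype(np.float32)
decoded_2d = decode_rows(feat_2d.reshape(-1, 3)).reshape(levels, H, Wd, 512)


def conv1x1(x_chw: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1x1 convolution. x_chw: [C_in, H, W]; weight: [C_out, C_in, 1, 1] -> [C_out, H, W]."""
    return np.einsum("oi,ihw->ohw", weight[:, :, 0, 0], x_chw) + bias[:, None, None]


# Treat the N Gaussians of level 0 as an [3, N, 1] "image" and apply the 1x1 conv.
conv_w = W.T[:, :, None, None]                   # [512, 3, 1, 1]
x_chw = feat_3d[0].T[:, :, None]                 # [3, N, 1]
conv_out = conv1x1(x_chw, conv_w, b)[:, :, 0].T  # [N, 512]
```

The same flatten, decode, and reshape pattern carries over to PyTorch tensors via `Tensor.reshape`.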

Answer 4 · 2024-08-22T12:54:11.000Z

@bbangsik13 Now I understand! Thanks :)