minghanqin/LangSplat

Querying the most relevant 3D Gaussians

I couldn't find any code for inputting text and querying the most relevant 3D Gaussians in the code repository. Will it be provided later?

Thanks for your attention. The eval code has been released.

Thank you for the quick code update. However, I found that the ground truth of lerf_ovs is in 2D, so how can we achieve 3D object localization? I still don't know how to query the original 3D Gaussian points. Thank you for your help.

Thank you for your attention to our work.

To achieve 3D text querying, there are two possible approaches. The first, as you mentioned, directly computes the similarity between the 3D Gaussian points and the text query. The second first renders the 3D language Gaussians onto a 2D image plane using Gaussian Splatting, then computes the similarity between the text query and the language features at the 2D image pixels.
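For readers who want something concrete, here is a rough sketch of the two approaches described above. It is not the repository's code. It assumes the per-Gaussian language features and the rendered per-pixel feature map are already available as NumPy arrays in the same embedding space as the text query, and it uses plain cosine similarity as the relevance score. The names `query_gaussians_3d`, `query_rendered_2d`, `top_k`, and `threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine_similarity(features, text_embedding):
    """Cosine similarity between each row of `features` (N, D) and a (D,) query."""
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    t = text_embedding / (np.linalg.norm(text_embedding) + 1e-8)
    return f @ t


def query_gaussians_3d(gaussian_features, text_embedding, top_k=5):
    """Approach 1 (assumed form): score every 3D Gaussian directly.

    Returns the indices of the top_k most similar Gaussians and their scores.
    """
    scores = cosine_similarity(gaussian_features, text_embedding)
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]


def query_rendered_2d(feature_map, text_embedding, threshold=0.5):
    """Approach 2 (assumed form): score each pixel of a rendered feature map.

    `feature_map` has shape (H, W, D). Returns the per-pixel score map and a
    boolean mask of pixels whose score exceeds `threshold`.
    """
    h, w, d = feature_map.shape
    flat = feature_map.reshape(-1, d)
    scores = cosine_similarity(flat, text_embedding).reshape(h, w)
    return scores, scores > threshold


# Toy data standing in for real features (purely illustrative).
text = np.array([1.0, 0.0, 0.0])
gaussians = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
])

idx, top_scores = query_gaussians_3d(gaussians, text, top_k=2)
score_map, mask = query_rendered_2d(gaussians.reshape(2, 2, 3), text)
print(idx, top_scores)
print(mask)
```

In the first function the result is a set of Gaussian indices, which can be mapped back to 3D positions. In the second the result is a 2D mask for one rendered view, which is why it can be compared against 2D ground truth.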

Previous SOTA works like LERF adopted the second method because NeRF's implicit modeling prevented the use of the first method. To ensure a fair comparison, we also employed the second method. However, our approach can indeed be tested using the first method, and we will explore it in the future to see if it yields better performance.

I hope this explanation addresses your questions.

Thanks for your kind help!