OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding

🏠 Abstract

In recent years, there has been a surge of interest in open-vocabulary 3D scene reconstruction facilitated by visual language models (VLMs), which showcase remarkable capabilities in open-set retrieval. However, existing methods face some limitations: they either focus on learning point-wise features, resulting in blurry semantic understanding, or solely tackle object-level reconstruction, thereby overlooking the intricate details of the object's interior. To address these challenges, we introduce OpenObj, an innovative approach to build open-vocabulary object-level Neural Radiance Fields (NeRF) with fine-grained understanding. In essence, OpenObj establishes a robust framework for efficient and watertight scene modeling and comprehension at the object level. Moreover, we incorporate part-level features into the neural fields, enabling a nuanced representation of object interiors. This approach captures object-level instances while maintaining a fine-grained understanding. The results on multiple datasets demonstrate that OpenObj achieves superior performance in zero-shot semantic segmentation and retrieval tasks. Additionally, OpenObj supports real-world robotics tasks at multiple scales, including global movement and local manipulation.

🛠 Install

Install the required libraries

Use conda to create the required environment. To avoid dependency problems, it is recommended to follow the instructions below to set up the environment.

conda env create -f environment.yml

Install CropFormer Model

Follow the instructions to install the CropFormer model and download the pretrained weights CropFormer_hornet_3x.

Install TAP Model

Follow the instructions to install the TAP model and download the pretrained weights here.

Install SBERT Model

pip install -U sentence-transformers

Download pretrained weights

git clone https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
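As a quick check that the model loads correctly, you can embed a few captions with sentence-transformers (a minimal sketch; the captions and the local path are only examples):

```python
# Minimal sketch: load the locally cloned all-MiniLM-L6-v2 weights with
# sentence-transformers and embed a couple of example captions.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("./all-MiniLM-L6-v2")  # path to the cloned repo
captions = ["a wooden chair next to a desk", "a blue sofa in the corner"]
embeddings = model.encode(captions, normalize_embeddings=True)
print(embeddings.shape)  # (2, 384) for all-MiniLM-L6-v2
```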

Clone this repo

git clone https://github.com/BIT-DYN/OpenObj
cd OpenObj

📊 Prepare dataset

OpenObj has been validated on Replica (using the same sequences as vMap) and ScanNet. Please download the following datasets.

  • Replica Demo - Replica Room 0 only for faster experimentation.
  • Replica - All Pre-generated Replica sequences.
  • ScanNet - Official ScanNet sequences.
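The data paths used in the commands below follow a vMap-style sequence layout (RGB, depth, and semantic class images plus a camera trajectory file). Below is a small sanity-check sketch assuming that layout; the root path is just the example path used in the commands and should be adapted to your setup:

```python
# Sanity check for the expected vMap-style sequence layout; the root path below
# is the example path used in the commands and should be adapted to your setup.
from pathlib import Path

root = Path("/data/dyn/object/vmap/room_0/imap/00")
for sub in ["rgb", "depth", "semantic_class"]:
    frames = sorted((root / sub).glob("*.png"))
    print(f"{sub}: {len(frames)} frames")
assert (root / "traj_w_c.txt").exists(), "camera trajectory file not found"
```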

Object Segmentation and Understanding

Run the following command to identify and comprehend object instances from the color images.

cd maskclustering
python3 mask_gen.py  --input /data/dyn/object/vmap/room_0/imap/00/rgb/*.png --input_depth /data/dyn/object/vmap/room_0/imap/00/depth/*.png --output results/room_0/mask/ --opts MODEL.WEIGHTS CropFormer_hornet_3x_03823a.pth 

You can see a visualization of the results in the results/vis folder.

Mask Clustering

Run the following command to ensure consistent object association across frames.

python3 mask_graph.py --config_file ./configs/room_0.yaml --input_mask results/room_0/mask/mask_init_all.pkl --input_depth /data/dyn/object/vmap/room_0/imap/00/depth/*.png --input_pose  /data/dyn/object/vmap/room_0/imap/00/traj_w_c.txt --output_graph results/room_0/mask/graph/ --input_rgb /data/dyn/object/vmap/room_0/imap/00/rgb/*.png --output_dir /data/dyn/object/vmap/room_0/imap/00/ --input_semantic /data/dyn/object/vmap/room_0/imap/00/semantic_class/*.png 

You can see a visualization of the results in the results/graph folder. This step also generates folders (class_our/, instance_our/) and files (object_clipfeat.pkl, object_capfeat.pkl, object_caption.pkl) in the data directory, which are required for the subsequent steps.
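If you want to inspect these intermediate files, a rough sketch like the following can be used (the internal structure of each pickle is an assumption; check the printed types for your run):

```python
# Rough inspection of the per-object outputs produced by mask_graph.py.
# The exact structure of each pickle is an assumption; print and explore.
import pickle

data_dir = "/data/dyn/object/vmap/room_0/imap/00/"
for name in ["object_caption.pkl", "object_capfeat.pkl", "object_clipfeat.pkl"]:
    with open(data_dir + name, "rb") as f:
        obj = pickle.load(f)
    size = len(obj) if hasattr(obj, "__len__") else ""
    print(name, type(obj).__name__, size)
```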

Part-level Fine-Grained Feature Extraction

Run the following command to distinguish object parts and extract their visual features.

cd ../partlevel
python sam_clip_dir.py --input_image /data/dyn/object/vmap/room_0/imap/00/rgb/*.png --output_dir /data/dyn/object/vmap/room_0/imap/00/partlevel --down_sample 5

This will generate a folder (partlevel/) in the data directory, which is required for the subsequent steps.
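Conceptually, this step pairs class-agnostic part masks with visual-language features. The sketch below illustrates the general SAM-plus-CLIP recipe on a single image; it is not the project's sam_clip_dir.py, and the checkpoint names and model variants are assumptions:

```python
# Illustrative SAM + CLIP recipe for part-level features (not the project script).
# Checkpoint path and model variants below are assumptions.
import numpy as np
import torch
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry
import clip

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)
clip_model, preprocess = clip.load("ViT-L/14")

image = np.array(Image.open("rgb/000000.png").convert("RGB"))
masks = mask_generator.generate(image)  # class-agnostic part/region masks

part_features = []
for m in masks:
    x, y, w, h = (int(v) for v in m["bbox"])  # XYWH bounding box of the mask
    crop = Image.fromarray(image[y:y + h, x:x + w])
    with torch.no_grad():
        feat = clip_model.encode_image(preprocess(crop).unsqueeze(0))
    part_features.append(feat / feat.norm(dim=-1, keepdim=True))
```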

NeRF Rendering and Training

Run the following command to train the NeRFs for all objects in a vectorized (batched) fashion.

cd ../nerf
python train.py --config ./configs/Replica/room_0.json --logdir results/room_0

This will generate a folder (ckpt/) in the result directory containing the network parameters for all objects.
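To confirm that checkpoints were written, you can list and load them (a minimal sketch; the file naming and contents depend on train.py and are assumptions here):

```python
# Quick look at the saved per-object checkpoints; file naming and internal
# structure are assumptions, adapt to what train.py actually writes.
import glob
import torch

for path in sorted(glob.glob("results/room_0/ckpt/*")):
    state = torch.load(path, map_location="cpu")
    if isinstance(state, dict):
        print(path, list(state.keys())[:5])
    else:
        print(path, type(state).__name__)
```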

Visualization

Run the following command to generate the visualization files.

cd ../nerf
python gen_map_vis.py --scene_name room_0 --dataset_name Replica

You can then interact with the generated visualization files.

cd ../nerf
python vis_interaction.py --scene_name room_0 --dataset_name Replica --is_partcolor

Then, in the Open3D visualizer window, you can use the following key callbacks to change the visualization.

  • Press C to toggle the ceiling.
  • Press S to color the meshes by object class.
  • Press R to color the meshes by RGB.
  • Press I to color the meshes by object instance ID.
  • Press O to color the meshes by part-level feature.
  • Press F, then type an object text query and a number in the terminal; the meshes will be colored by the similarity.
  • Press P, then type an object text query, a number, and a part text query in the terminal; the meshes will be colored by the similarity.
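For reference, key-driven recoloring of this kind can be wired up with Open3D's VisualizerWithKeyCallback. The sketch below is illustrative only and is not the project's vis_interaction.py; the mesh path and coloring logic are placeholders:

```python
# Illustrative Open3D key-callback setup (not the project's vis_interaction.py).
import numpy as np
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("results/room_0/mesh.ply")  # placeholder path

def recolor(v):
    # Stand-in for instance/part coloring: paint the mesh a random color.
    mesh.paint_uniform_color(np.random.rand(3))
    v.update_geometry(mesh)
    return False

vis = o3d.visualization.VisualizerWithKeyCallback()
vis.create_window()
vis.add_geometry(mesh)
vis.register_key_callback(ord("I"), recolor)  # press I to recolor
vis.run()
vis.destroy_window()
```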

🔗 Citation

If you find our work helpful, please cite:

@article{openobj,
  title={OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding},
  author={Deng, Yinan and Wang, Jiahui and Zhao, Jingyu and Dou, Jianyu and Yang, Yi and Yue, Yufeng},
  journal={arXiv preprint arXiv:2406.08009},
  year={2024}
}

👏 Acknowledgements

We would like to express our gratitude to the open-source project vMap and its contributors. Their valuable work has greatly contributed to the development of our codebase.