OV-3DET: Open-Vocabulary Point-Cloud Object Detection without 3D Annotation

OV-3DET: An Open Vocabulary 3D DETector.

OV-3DET: Open-Vocabulary Point-Cloud Object Detection without 3D Annotation,
Yuheng Lu, Chenfeng Xu, Xiaobao Wei, Xiaodong Xie, Masayoshi Tomizuka, Kurt Keutzer and Shanghang Zhang,
Accepted to CVPR2023

Features

Detects 3D objects according to text prompting.
The training of OV-3DET does not require 3D annotation.

Installation

See installation instructions.

Dataset preparation

See dataset instructions, or directly download the processed dataset.

Training OV-3DET

Phase 1

Learn to Localize 3D Objects from 2D Pretrained Detector:

bash scripts/scannet_train_loc.sh

Phase 2

Learn to Classify 3D Objects from 2D Pretrained vision-language Model:

bash scripts/scannet_train_dtcc.sh

Evaluate OV-3DET

To evaluate OV-3DET, simply by running:

bash scripts/evaluate.sh

Pretrained Models

We provide the pretrained model weights for both "Phase 1" and "Phase 2".

Dataset	Phase	Epochs	Model weights
ScanNet	1	400	weights
ScanNet	2	50	weights
SUN RGB-D	1	400	weights
SUN RGB-D	2	50	weights

Acknowledgement

This codebase is modified base on 3DETR [1], CLIP [2] and Detic [3], we sincerely appreciate their contributions!

[1] An end-to-end transformer model for 3d object detection. ICCV. 2021.
[2] Learning transferable visual models from natural language supervision. ICML. 2021.
[3] Detecting twenty-thousand classes using image-level supervision. ECCV. 2022.

Citation

If you find this repository helpful, please consider citing our work:

@article{lu2023open,
  title={Open-Vocabulary Point-Cloud Object Detection without 3D Annotation},
  author={Lu, Yuheng and Xu, Chenfeng and Wei, Xiaobao and Xie, Xiaodong and Tomizuka, Masayoshi and Keutzer, Kurt and Zhang, Shanghang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

lyhdet/OV-3DET