3D-Box via Segment Anything

We extend Segment Anything to 3D perception by combining it with VoxelNeXt. Note that this project is still in progress: we are improving it and developing more examples. Issues and pull requests are welcome!

Why this project?

Segment Anything and its follow-up projects focus on 2D images. In this project, we extend the scope to the 3D world by combining Segment Anything with VoxelNeXt. When we provide a prompt (e.g., a point or a box), the result is not only a 2D segmentation mask but also 3D boxes.

The core idea is that VoxelNeXt is a fully sparse 3D detector: it predicts 3D objects directly from each sparse voxel. We project the 3D sparse voxels onto the 2D image, and 3D boxes can then be generated from the voxels that fall inside the SAM mask.
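
The projection step can be sketched with NumPy as below. This is a minimal illustration, not the repository's code: it assumes voxel centers are given in the LiDAR frame and that a single 3x4 LiDAR-to-image matrix is available, and the function name voxels_in_mask and its inputs are hypothetical.

```python
import numpy as np

def voxels_in_mask(voxel_centers, lidar2img, mask):
    """Return a boolean array marking which voxels project inside a 2D mask.

    voxel_centers: (N, 3) voxel centers in the LiDAR frame (hypothetical input).
    lidar2img:     (3, 4) LiDAR-to-pixel projection matrix (assumed to already
                   combine camera extrinsics and intrinsics).
    mask:          (H, W) boolean segmentation mask, e.g. produced by SAM.
    """
    n = voxel_centers.shape[0]
    # Homogeneous coordinates: (N, 4)
    pts_h = np.hstack([voxel_centers, np.ones((n, 1))])
    # Project to the image plane: each row is (u * z, v * z, z)
    proj = pts_h @ lidar2img.T
    depth = proj[:, 2]
    in_front = depth > 1e-6  # points behind the camera cannot be in the mask
    # Avoid dividing by zero or negative depth; those points are excluded anyway
    safe_depth = np.where(in_front, depth, 1.0)
    u = np.floor(proj[:, 0] / safe_depth).astype(int)
    v = np.floor(proj[:, 1] / safe_depth).astype(int)
    h, w = mask.shape
    in_image = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    selected = np.zeros(n, dtype=bool)
    # Only look up mask pixels for projections that land inside the image
    selected[in_image] = mask[v[in_image], u[in_image]]
    return selected
```

Voxels whose projection lands outside the image or behind the camera are simply marked as not selected, so the result can be used directly as a per-voxel filter on the detector's outputs.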

This project makes 3D object detection promptable.
VoxelNeXt is based on sparse voxels, which are easy to relate to the mask generated by Segment Anything.
This project could facilitate 3D box labeling: a 3D box can be obtained with a simple click on the image. This might greatly reduce human effort, especially in autonomous driving scenes.
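
Since VoxelNeXt predicts one box per voxel, a click usually selects several candidate predictions. One simple way to reduce them to a single box, assumed here for illustration (the demo may aggregate differently), is to keep the highest-scoring prediction among the voxels inside the mask. The box layout [x, y, z, dx, dy, dz, yaw] and the helper name box_from_mask are hypothetical.

```python
import numpy as np

def box_from_mask(boxes, scores, in_mask):
    """Pick one 3D box for a prompted object from per-voxel predictions.

    boxes:   (N, 7) per-voxel boxes, assumed [x, y, z, dx, dy, dz, yaw].
    scores:  (N,) confidence of each per-voxel prediction.
    in_mask: (N,) boolean, True where the voxel projects inside the 2D mask.
    Returns the highest-scoring box among the selected voxels, or None.
    """
    if not np.any(in_mask):
        return None  # the click did not cover any voxel with a prediction
    candidate_idx = np.flatnonzero(in_mask)
    # Index of the best-scoring candidate in the original (N,) arrays
    best = candidate_idx[np.argmax(scores[candidate_idx])]
    return boxes[best]
```

Taking the arg-max keeps the sketch short; averaging or non-maximum suppression over the selected voxels would be equally valid alternatives.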

Installation

Basic requirements: pip install -r requirements.txt
Segment Anything: pip install git+https://github.com/facebookresearch/segment-anything.git
spconv: pip install spconv, or the CUDA build matching your CUDA version, e.g. pip install spconv-cu111. Please use spconv 2.2 or 2.3, for example spconv==2.3.5.
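
Because only spconv 2.2 and 2.3 are supported, it can help to check the installed version before running the demo. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes any distribution named spconv or spconv-cuXXX is an spconv build.

```python
from importlib import metadata

SUPPORTED_SPCONV_MINOR = {(2, 2), (2, 3)}

def is_supported_spconv(version):
    """Return True if a version string like '2.3.5' is spconv 2.2.x or 2.3.x."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        return False  # unparseable version strings are treated as unsupported
    return (major, minor) in SUPPORTED_SPCONV_MINOR

def installed_spconv_version():
    """Return (name, version) of an installed spconv build (CPU or CUDA), or None."""
    for dist in metadata.distributions():
        name = (dist.metadata.get("Name") or "").lower()
        if name == "spconv" or name.startswith("spconv-cu"):
            return name, dist.version
    return None

if __name__ == "__main__":
    found = installed_spconv_version()
    if found is None:
        print("spconv is not installed")
    else:
        name, version = found
        status = "supported" if is_supported_spconv(version) else "NOT supported"
        print(f"{name} {version}: {status}")
```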

Getting Started

Please try it via seg_anything_and_3D.ipynb. We provide this example on the nuScenes dataset; you can also use other image-point pairs.
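
Using other image-point pairs requires a mapping from LiDAR points to pixels. A common way to build one, assumed here for illustration rather than taken from the notebook, is to compose the 3x3 camera intrinsics K with a 4x4 LiDAR-to-camera extrinsic transform, P = K [R | t]. Both function names are hypothetical.

```python
import numpy as np

def lidar_to_image_matrix(intrinsics, lidar_to_cam):
    """Compose a 3x4 LiDAR-to-pixel projection matrix P = K [R | t].

    intrinsics:   (3, 3) camera matrix K.
    lidar_to_cam: (4, 4) rigid transform taking LiDAR points into the camera frame.
    """
    k = np.asarray(intrinsics, dtype=float)
    t = np.asarray(lidar_to_cam, dtype=float)
    if k.shape != (3, 3) or t.shape != (4, 4):
        raise ValueError("expected a (3, 3) intrinsic and a (4, 4) extrinsic matrix")
    # Keep only the top 3 rows [R | t] of the rigid transform
    return k @ t[:3, :]

def project(points, lidar2img):
    """Project (N, 3) LiDAR points to (N, 2) pixel coordinates and (N,) depths."""
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    proj = pts_h @ lidar2img.T
    depth = proj[:, 2]
    # Perspective division; callers should discard points with depth <= 0
    return proj[:, :2] / depth[:, None], depth
```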

The point-to-image translation infos for nuScenes val can be downloaded here.
The weights used in the demo are voxelnext_nuscenes_kernel1.pth.
The nuScenes info file is nuscenes_infos_10sweeps_val.pkl, which is generated with the OpenPCDet codebase.
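
A short sketch for inspecting the info file before running the demo. It assumes the pickle holds a list of per-sample dicts and that each dict has a "token" field; both are assumptions about the OpenPCDet layout, so check your own file. The helper names are hypothetical.

```python
import pickle
from pathlib import Path

def load_infos(path):
    """Load an info file such as nuscenes_infos_10sweeps_val.pkl.

    Assumes the file holds a pickled list of per-sample dicts.
    Only unpickle files from sources you trust.
    """
    with Path(path).open("rb") as f:
        infos = pickle.load(f)
    if not isinstance(infos, list):
        raise TypeError(f"expected a list of infos, got {type(infos).__name__}")
    return infos

def find_info(infos, token, key="token"):
    """Return the first info dict whose `key` field equals `token`, else None.

    The default field name 'token' is an assumption about the info layout.
    """
    for info in infos:
        if info.get(key) == token:
            return info
    return None
```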

TODO List

- Zero-shot version of VoxelNeXt.
- Examples on more datasets.
- Indoor scenes.

Citation

If you find this project useful in your research, please consider citing:

@article{kirillov2023segany,
  title={Segment Anything}, 
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@inproceedings{chen2023voxenext,
  title={VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking},
  author={Yukang Chen and Jianhui Liu and Xiangyu Zhang and Xiaojuan Qi and Jiaya Jia},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023}
}

Acknowledgement

Segment Anything
VoxelNeXt
UVTR for 3D to 2D translation.

Our Works in 3D Perception

VoxelNeXt (CVPR 2023) [Paper] [Code] Fully Sparse VoxelNet for 3D Object Detection and Tracking.
Focal Sparse Conv (CVPR 2022 Oral) [Paper] [Code] Dynamic sparse convolution for high performance.
Spatial Pruned Conv (NeurIPS 2022) [Paper] [Code] 50% FLOPs saving for efficient 3D object detection.
LargeKernel3D (CVPR 2023) [Paper] [Code] Large-kernel 3D sparse CNN backbone.
SphereFormer (CVPR 2023) [Paper] [Code] Spherical window 3D transformer backbone.
spconv-plus A library where we combine our works into spconv.
SparseTransformer A library that includes high-efficiency transformer implementations for sparse point cloud or voxel data.
