
[ICCV 2023] GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding


GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding (ICCV 2023)

[Figure: GeoMIM pipeline]

Introduction

Welcome to the official repository of GeoMIM, a pretraining approach for multi-view camera-based 3D perception. This repository provides the pretraining and finetuning code, along with pretrained models, to reproduce the results presented in our paper.

The implementation of pretraining is based on bevfusion. See the pretrain folder for further details.

After pretraining, we finetune the pretrained Swin Transformer for multi-view camera-based 3D perception, using BEVDet as the finetuning framework. We provide models covering the different techniques available in BEVDet, including CBGS, 4D, Depth, and Stereo. We also provide models for occupancy prediction, using the implementation in the BEVDet repo. See the bevdet folder for further details.
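As a rough illustration of the hand-off from pretraining to finetuning, the sketch below shows one way to pull backbone weights out of a checkpoint before loading them into a detector. It is not taken from this repository: the `backbone.` prefix, the `state_dict` wrapper, the toy key names, and the helper `extract_backbone_weights` are all assumptions made for illustration, so inspect the actual checkpoint keys before relying on any of them.

```python
from typing import Any, Dict, Mapping


def extract_backbone_weights(checkpoint: Mapping[str, Any],
                             prefix: str = "backbone.") -> Dict[str, Any]:
    """Return only the entries under `prefix`, with the prefix removed.

    `prefix` is a hypothetical key prefix, not one confirmed by the GeoMIM
    checkpoints.
    """
    # Some training frameworks wrap the weights in a "state_dict" entry;
    # fall back to the top-level mapping when that wrapper is absent.
    state = checkpoint.get("state_dict", checkpoint)
    return {
        key[len(prefix):]: value
        for key, value in state.items()
        if key.startswith(prefix)
    }


# Toy stand-in for a loaded checkpoint (real values would be tensors).
toy_checkpoint = {
    "state_dict": {
        "backbone.patch_embed.proj.weight": [0.1, 0.2],
        "backbone.layers.0.blocks.0.norm1.weight": [1.0],
        "decoder.head.weight": [0.3],
    }
}

backbone_weights = extract_backbone_weights(toy_checkpoint)
print(sorted(backbone_weights))
```

With PyTorch, the input mapping would typically come from `torch.load(path, map_location="cpu")`, and the filtered result would be passed to the backbone's `load_state_dict`.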

Key Results

We provide the GeoMIM-pretrained Swin-Base and Swin-Large checkpoints.

| Model | Download |
| --- | --- |
| Swin-Base | Model |
| Swin-Large | Model |

We have achieved strong performance on the nuScenes benchmark with GeoMIM. Here are some quantitative results on 3D detection:

| Config | mAP | NDS | Download |
| --- | --- | --- | --- |
| bevdet-swinb-4d-256x704-cbgs | 33.98 | 47.19 | Model |
| bevdet-swinb-4d-256x704-cbgs-geomim | 42.25 | 53.1 | Model |
| bevdet-swinb-4d-stereo-256x704-cbgs-geomim | 45.33 | 55.1 | Model |
| bevdet-swinb-4d-stereo-512x1408-cbgs | 47.2 | 57.6 | Model (#) |
| bevdet-swinb-4d-stereo-512x1408-cbgs-geomim | 52.04 | 60.92 | Model |

Here are some quantitative results on occupancy prediction:

| Config | mIoU | Download |
| --- | --- | --- |
| bevdet-occ-swinb-4d-stereo-2x (*) | 42.0 | Model (#) |
| bevdet-occ-swinb-4d-stereo-2x-geomim | 45.0 | Model |
| bevdet-occ-swinb-4d-stereo-2x-geomim (*) | 45.73 | Model |
| bevdet-occ-swinl-4d-stereo-2x-geomim | 46.27 | Model |

(*) Load 3D detection checkpoint. (#) Original BEVDet checkpoint.
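To make the effect of GeoMIM pretraining easier to read off, the following sketch computes the gain for the like-for-like pairs in the tables above, where a config appears both with and without the `-geomim` suffix. The numbers are copied from the tables. The pairing of each baseline with its GeoMIM counterpart and the helper name `gains` are illustrative choices, not part of this repository.

```python
# (baseline, with GeoMIM) pairs copied from the results tables above.
detection = {
    "bevdet-swinb-4d-256x704-cbgs": {
        "mAP": (33.98, 42.25),
        "NDS": (47.19, 53.1),
    },
    "bevdet-swinb-4d-stereo-512x1408-cbgs": {
        "mAP": (47.2, 52.04),
        "NDS": (57.6, 60.92),
    },
}
occupancy = {
    # Both entries load the 3D detection checkpoint, marked (*) in the table.
    "bevdet-occ-swinb-4d-stereo-2x (*)": {"mIoU": (42.0, 45.73)},
}


def gains(results):
    """Return the absolute improvement per config and metric, in points."""
    return {
        config: {
            metric: round(after - before, 2)
            for metric, (before, after) in metrics.items()
        }
        for config, metrics in results.items()
    }


detection_gains = gains(detection)
occupancy_gains = gains(occupancy)

for config, metrics in {**detection_gains, **occupancy_gains}.items():
    summary = ", ".join(f"{name} +{delta:.2f}" for name, delta in metrics.items())
    print(f"{config}: {summary}")
```

These are absolute differences in metric points, not relative percentages.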

Getting Started

Citation

If you find GeoMIM beneficial for your research, kindly consider citing our paper:

@inproceedings{liu2023geomim,
  title={GeoMIM: Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding},
  author={Jihao Liu and Tai Wang and Boxiao Liu and Qihang Zhang and Yu Liu and Hongsheng Li},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2023}
}

Contact

For any questions or inquiries, please feel free to reach out to the authors: Jihao Liu (email) and Tai Wang (email)