/Mask2Map

Primary LanguagePythonApache License 2.0Apache-2.0

[ECCV24] Mask2Map

Vectorized HD Map Construction Using Bird’s Eye View Segmentation Masks

Sewhan Choi1 *, Jungho Kim1 *, Hongjae Shin1, Junwon Choi2 **

1 Hanyang University, Korea 2 Seoul National University, Korea

(*) equal contribution, (**) corresponding author.

ECCV papers (ECCV2024)

ArXiv Preprint (arXiv 2208.05736)

overall

Updates

  • [2024-09] Release the model codes and trained weights of 24 and 110 epochs on the nuScenes dataset, respectively

Upcomings

  • [2024-10] Trained weights on the Argoverse2 dataset.
  • [2024-10] Trained weights for camera and LiDAR fusion on the nuScenes dataset.

Introduction

In this paper, we introduce Mask2Map, a novel end-to-end online HD map construction method designed for autonomous driving applications. Our approach focuses on predicting the class and ordered point set of map instances within a scene, represented in the bird's eye view (BEV). Mask2Map consists of two primary components: the Instance-Level Mask Prediction Network (IMPNet) and the Mask-Driven Map Prediction Network (MMPNet). IMPNet generates Mask-Aware Queries and BEV Segmentation Masks to capture comprehensive semantic information globally. Subsequently, MMPNet enhances these query features using local contextual information through two submodules: the Positional Query Generator (PQG) and the Geometric Feature Extractor (GFE). PQG extracts instance-level positional queries by embedding BEV positional information into Mask-Aware Queries, while GFE utilizes BEV Segmentation Masks to generate point-level geometric features. However, we observed limited performance in Mask2Map due to inter-network inconsistency stemming from different predictions to Ground Truth (GT) matching between IMPNet and MMPNet. To tackle this challenge, we propose the Inter-network Denoising Training method, which guides the model to denoise the output affected by both noisy GT queries and perturbed BEV Segmentation Masks.

Models

Results from the Mask2Map paper

Qualitative results on nuScenes val split

nuScenes dataset

Method Backbone BEVEncoder Lr Schd mAP config Download_phase1 Download_phase2
Mask2Map R50 bevpool 24ep 71.6 config model_phase1 model_phase2
Mask2Map R50 bevpool 110ep 75.4 config model_phase1 model_phase2

Notes:

  • All the experiments are performed on 4 NVIDIA GeForce RTX 3090 GPUs.

Getting Started

Demo

demo

Acknowledgements

Mask2Map is based on mmdetection3d. It is also greatly inspired by the following outstanding contributions to the open-source community: MapTR, BEVFusion, BEVFormer, HDMapNet, GKT, VectorMapNet.

Citation

If you find Mask2Map is useful in your research or applications, please consider giving us a star 🌟 and citing it by the following BibTeX entry.

@inproceedings{Mask2Map,
  title={Mask2Map: Vectorized HD Map Construction Using Bird’s Eye View Segmentation Masks},
  author={Choi, Sewhan and Kim, Jungho and Shin, Hongjae and Choi, Jun Won},
  booktitle={European Conference on Computer Vision},
  year={2024}
}