Query6DoF: Learning Sparse Queries as Implicit Shape Prior for Category-Level 6DoF Pose Estimation

This is the PyTorch implemention of ICCV'23 paper Query6DoF: Learning Sparse Queries as Implicit Shape Prior for Category-Level 6DoF Pose Estimation

Abstract

Category-level 6DoF object pose estimation intends to estimate the rotation, translation, and size of unseen objects. Many previous works use point clouds as a pre-learned shape prior to overcome intra-category variability. The shape prior is deformed to reconstruct instances' point clouds in canonical space and to build dense 3D-3D correspondences between the observed and reconstructed point clouds. However, in these methods, the pre-learned shape prior is not jointly optimized with estimation networks, and they are trained with a surrogate objective. In this paper, we propose a novel 6D pose estimation network based on a series of category-specific sparse queries that serve as the representation of the shape prior. Each query represents a shape component, and these queries are learnable embeddings that can be optimized together with the estimation network according to the point cloud reconstruction loss, the normalized object coordinate loss, and the 6d pose estimation loss. Our proposed network adopts a deformation-and-matching paradigm with attention, where the queries dynamically extract features from regions of interest using the attention mechanism and then directly regress results. Furthermore, our method reduces computation overhead through the sparseness of the queries and the incorporation of a lightweight global information injection block. With the aforementioned design, our method achieves state-of-the-art (SOTA) pose estimation performance on the NOCS dataset.

Requirements

Linux (tested on Ubuntu 16.04)
Python 3.8
CUDA 11.1
PyTorch 1.10.2

Installation

conda create -n query6dof python=3.8

conda activate query6dof

pip install torch==1.10.2+cu111 -f  https://download.pytorch.org/whl/cu111/torch_stable.html

pip install opencv-python mmengine numpy tqdm

cd Pointnet2/pointnet2

python setup.py install

Dataset

Download camera_train, camera_eval, real_test, real_train, ground-truth annotations and mesh models provided by NOCS. Then process these files following SPD. And download segmentation results from Mask R-CNN, and predictions of NOCS from SPD. The dataset is organized as follows:


── data
    ├── CAMERA
    ├── gts
    ├── obj_models
    ├── Real
    └── results
            └── mrcnn_results   
── results
      └── nocs_results

Evaluation

Please download our pretrain model here or pretrain model without linear shape augmentation and non-linear shape augmentation here and put it in 'runs/CAMERA+Real/run/model' dictionory.

Then, you can make an evaluation for REAL275 using following command.

python tools/valid.py --cfg config/run_eval_real.py --gpus 0

Then, you can make an evaluation for CAMERA25 using following command.

python tools/valid.py --cfg config/run_eval_camera.py --gpus 0

You can get running speed at the same time.

Train

'tools/train.py' is the main file for training. You can train using the following command.

python tools/train.py --cfg config/run.py --gpus 0,1,2,3

This config is for training on 4 gpus with the batch size = 15 on a single gpu, and the total batch size = 60.

Acknowledgment

The dataset is provided by NOCS. Our code is developed based on Pointnet2.PyTorch and SPD

hustvl/Query6DoF