Awesome Object Pose Estimation [Paper List]
A repo to summarize resources used in object pose estimation as well as viewpoint estimation.
In the following tables, a 3D CAD model is referred to as a model and an object pictured in a 2D image is referred to as an object.
Contributions are welcome. Please see the Table of Contents, which lists what this repo covers. If you wish to contribute within these boundaries, feel free to send a PR. If you have suggestions for new sections, please raise an issue to discuss them before sending a PR.
- Resources 😎
- Objects in the controlled environments 🎥
- Objects in the wild 📷
- 3D model datasets 🚲
- Rendering methods 🚴
- Shape Encoding
This table lists the datasets commonly known as BOP: Benchmark for 6D Object Pose Estimation, which provide accurate 3D object models and accurate 2D-3D alignment.
You can download all the BOP datasets here and use the toolkit provided by the organizers.
After downloading the data, you can use our code `data/BOP/ply2obj.py` to convert the original .ply files to .obj files, and run `data/BOP/create_annotation.py` to create a single annotation file for all the scenes in a dataset.
The dataset format is described here. We use an instance id in our annotations to distinguish different instances pictured in the same image.
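As a rough illustration of what a single annotation file with instance ids could look like, the sketch below flattens per-scene ground-truth dictionaries into one list. It assumes input shaped like BOP's `scene_gt.json` (an image id mapped to a list of entries with `cam_R_m2c`, `cam_t_m2c` and `obj_id`). The function name `flatten_scene_gt` and the output field names are hypothetical and are not taken from `data/BOP/create_annotation.py`.

```python
def flatten_scene_gt(scenes):
    """Flatten {scene_id: scene_gt} into one list of per-instance records.

    `scenes` maps a scene id to a dict shaped like BOP's scene_gt.json:
    {image_id: [{"cam_R_m2c": [9 floats], "cam_t_m2c": [3 floats], "obj_id": int}, ...]}
    The output field names are illustrative, not the repo's actual schema.
    """
    records = []
    for scene_id in sorted(scenes, key=int):
        scene_gt = scenes[scene_id]
        for image_id in sorted(scene_gt, key=int):
            # The position inside the per-image list serves as the instance id,
            # so two copies of the same model in one image stay distinguishable.
            for inst_id, gt in enumerate(scene_gt[image_id]):
                records.append({
                    "scene_id": int(scene_id),
                    "image_id": int(image_id),
                    "inst_id": inst_id,
                    "obj_id": gt["obj_id"],
                    "cam_R_m2c": gt["cam_R_m2c"],
                    "cam_t_m2c": gt["cam_t_m2c"],
                })
    return records
```

The resulting list can be written with `json.dump` to obtain one file per dataset. Keying each record by scene, image and instance keeps repeated objects in the same image separate.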
Dataset | Annotation | Statistics | Reference |
---|---|---|---|
HomebrewedDB | 6D Pose + Depth + BoundingBox | 33 models in 13 videos with 17,420 frames | Preprint 2019 |
YCB-Video | 6D Pose + Depth + Mask | 21 models in 92 videos with 133,827 frames | RSS 2018 |
T-LESS | 6D Pose + Depth | 30 models in 20 videos with ~49K frames | WACV 2017 |
Doumanoglou | 6D Pose + Depth | 2 models in 3 videos with 183 frames | CVPR 2016 |
Tejani | 6D Pose + Depth | 6 models in 6 videos with 2,067 frames | ECCV 2014 |
Occluded-LINEMOD | 6D Pose + Depth | 8 models in 1,214 frames with 8,992 objects | ECCV 2014 |
LINEMOD | 6D Pose + Depth for one object | 15 models in 15 videos with 18,273 frames | ACCV 2012 |
In this table, Pix3D provides accurate 2D-3D alignment while others provide a coarse alignment.
PASCAL3D+ is the de facto benchmark used for viewpoint estimation.
Dataset | Annotation | Statistics | Reference |
---|---|---|---|
ApolloCar3D | 6D Pose + Mask | 34 car models with 60K+ objects in 5,277 images | CVPR 2019 |
Pix3D | 6D Pose + Mask | 9 categories containing 395 models in 10,069 images | CVPR 2018 |
ObjectNet3D | Euler Angles + BoundingBox | 100 categories with 201,888 objects in 90,127 images | ECCV 2016 |
PASCAL3D+ | Euler Angles + BoundingBox | 12 categories with 36,292 objects in 30,889 images | WACV 2014 |
KITTI | 3D BoundingBox | 80,256 objects in 14,999 images | CVPR 2012 |
To test a network's generalization ability (that is, its performance on images containing 3D models not seen during training), the following datasets can be used to generate synthetic training data.
Note that ABC contains generic and arbitrary industrial CAD models, while ShapeNetCore and ModelNet contain objects from common categories such as cars and chairs.
Dataset | Categories | Models in total | Reference |
---|---|---|---|
ABC | - | 1 million | CVPR 2019 |
ShapeNetCore | 55 | ~51,300 | ArXiv 2015 |
ModelNet-40 | 40 | 12,311 | CVPR 2015 |
- Neural 3D Mesh Renderer: Kato et al. CVPR 2018
- RenderNet: Nguyen-Phuoc et al. NIPS 2018
Rendering code in Python can be found in blender-cli-rendering and pvnet-rendering.
In this repo, we also provide a script to render images from 3D models using python-blender, which is easy to install and generates photo-realistic images.
To generate table-top synthetic data, we need to simulate a set of poses in which the camera is uniformly distributed on the upper hemisphere above the table plane.
`blender_render/table_poses.npz` contains the poses obtained from the LINEMOD-Occlusion dataset, with the distribution listed below:
* Range of object distances: 346 - 1500 mm (only 3 instances below 400 mm)
* Azimuth range: 0 - 360 deg
* Elevation range: -14 - 89 deg (only a few instances below 0 deg)
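The idea of placing cameras uniformly on the upper hemisphere can be sketched as follows. This is a minimal illustration and not how `blender_render/table_poses.npz` was produced. The function name `sample_camera_position` is hypothetical, the default distance range is taken from the list above, the distance is assumed to be uniform within that range, and the few negative elevations are ignored.

```python
import math
import random


def sample_camera_position(rng, min_dist=346.0, max_dist=1500.0):
    """Sample a camera position (in mm) on the upper hemisphere around the origin.

    Drawing sin(elevation) uniformly in [0, 1) makes the viewing directions
    uniform in area over the hemisphere; drawing the elevation angle itself
    uniformly would over-sample the region near the pole.
    """
    azimuth = rng.uniform(0.0, 2.0 * math.pi)
    sin_elev = rng.uniform(0.0, 1.0)
    cos_elev = math.sqrt(1.0 - sin_elev * sin_elev)
    dist = rng.uniform(min_dist, max_dist)  # assumption: uniform in distance
    return (
        dist * cos_elev * math.cos(azimuth),
        dist * cos_elev * math.sin(azimuth),
        dist * sin_elev,  # z >= 0 keeps the camera above the table plane
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible positions
    for _ in range(3):
        print(sample_camera_position(rng))
```

The sketch returns positions only. A full pose would also need an orientation, for example one that points the camera at the table center.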
- Download CAD models of the ABC dataset and retrieve .obj files into the target directory using `dowanload_ABC.sh` and `retrieve_files.py` in `data/ABC`.
- Then generate synthetic images of different models with varying lighting and textures under random poses using `data/ABC/random_pose.py`.
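For reference, one common way to draw random object poses is to sample rotations uniformly from SO(3). The sketch below uses Shoemake's uniform unit-quaternion method and is only an illustration: the function name `random_rotation_matrix` is hypothetical, and it is not necessarily what `data/ABC/random_pose.py` does.

```python
import math
import random


def random_rotation_matrix(rng):
    """Return a 3x3 rotation matrix drawn uniformly from SO(3).

    Shoemake's method: build a uniformly distributed unit quaternion
    (x, y, z, w) from three uniform numbers, then convert it to a matrix.
    """
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    x = a * math.sin(2.0 * math.pi * u2)
    y = a * math.cos(2.0 * math.pi * u2)
    z = b * math.sin(2.0 * math.pi * u3)
    w = b * math.cos(2.0 * math.pi * u3)
    # Standard unit-quaternion to rotation-matrix conversion.
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]
```

Sampling three Euler angles uniformly instead would concentrate poses around certain orientations, which is why a quaternion-based sampler is often preferred when the training set should cover all viewpoints evenly.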
- PyBullet: very popular in the robotics community.
- Glumpy: does not support headless rendering (it fails over SSH).
- UnrealCV: an extension of Unreal Engine 4 that helps interact with the virtual world and communicate with external programs.
- SyntheticComputerVision: summarizes many techniques used to generate synthetic images.
Note: 3D models should be aligned in the same way using MeshLab so that their orientation stays consistent across the different datasets.
We provide Python scripts to generate rendered images, downsampled point clouds and downsampled meshes from .obj files in `data`.
To generate point clouds, you need to compile O-CNN first and install Open3D.
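To give an idea of what point cloud downsampling does, here is a minimal pure-Python voxel-grid sketch. It is an illustrative stand-in, not the O-CNN/Open3D pipeline used by the scripts in this repo, and the function name `voxel_downsample` is hypothetical.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same cubic voxel by their centroid.

    `points` is an iterable of (x, y, z) tuples; `voxel_size` is the voxel
    edge length in the same unit as the coordinates.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    # One centroid per occupied voxel.
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for group in buckets.values()
    ]
```

A larger `voxel_size` gives a sparser cloud. For real meshes, a dedicated library is likely to be much faster than this loop.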