Mono-STAR: Mono-camera Scene-level Tracking and Reconstruction

Overview

We present Mono-STAR, the first real-time 3D reconstruction system that simultaneously supports semantic fusion, fast motion tracking, non-rigid object deformation, and topological change under a unified framework. The proposed system solves a new optimization problem that incorporates optical-flow-based 2D constraints to deal with fast motion, and a novel semantic-aware deformation graph (SAD-graph) for handling topology change. We test the proposed system under various challenging scenes and demonstrate that it significantly outperforms existing state-of-the-art systems.

Proposed

Overview of the proposed system. The system runs in two parallel threads, one for measurement and one for geometry. In each time-step $t$, the measurement thread loads a measurement $M_t$ from images or a camera buffer. Then, a segmentation network generates a set of semantic labels $L^m_{t}$. Once the measurement is loaded on the GPU memory, $M_t$ and previous alignment rendering $R^a_{t-1}$ are fed into an optical-flow network to generate the optical-flow $OF_t$ from previous geometry $S_{t-1}$ to measurement $M_t$. Optical-flow $OF_t$, geometry rendering $R_t$ and measurement $M_t$ are used to compute warp-field $W_t$ with non-rigid alignment. After the alignment, previous geometry $S_{t-1}$ will be warped to $S_{t-1}^{warp}$. The fusion render map $R^g_{t-1}$ is then rendered from $S_{t-1}^{warp}$. $R^g_{t-1}$, $S_{t-1}^{warp}$ are then combined with semantic labels $L^m_t$ from the measurement thread to generate the updated geometry $S_t$. In the end, a new alignment rendering $R^a_t$ is generated from updated geometry $S_t$ for the processing of the next frame.

Experiment Results

Fast Motion - BasketBall

Color	Graph	Semantic
geometry_color.mp4	geometry_graph.mp4	geometry_seg.mp4

Fast Motion - Falling Cup

Color	Graph	Semantic
geometry_color.mp4	geometry_graph.mp4	geometry_seg.mp4

Topology Change - Pick Cup

Color	Geometry	Semantic
geometry_color.mp4	geometry_normal.mp4	geometry_seg.mp4

Topology Change - Move Toy

Color	Geometry	Semantic
geometry_color.mp4	geometry_normal.mp4	geometry_seg.mp4

Topology Change - Push Coffee

Color	Geometry	Semantic
geometry_color.mp4	geometry_normal.mp4	geometry_seg.mp4

Deformation - Deform Pillow

Color	Geometry	Graph
geometry_color.mp4	geometry_normal.mp4	geometry_graph.mp4

Deformation - Deform Umbrella

Color	Geometry	Graph
geometry_color.mp4	geometry_normal.mp4	geometry_graph.mp4

Conclusion

We presented Mono-STAR, a single-view solution for the semantic-aware STAR problem. Mono-STAR uses a novel semantic-aware and adaptive deformation graph for simultaneous tracking and reconstruction, and can handle topology changes as well as semantic fusion. Experiments show that Mono-STAR achieves promising results in non-rigid object reconstruction, while resisting to semantic segmentation errors, and capturing fast motions on various challenging scenes. We believe that this system can inspire and boost more future research on imitation learning, dexterous manipulation, and many other relevant robotics problems.

Source Code

For source code, please check https://github.com/changhaonan/StarHub. Code is only checked under Ubuntu20.04.

Cite the work

@article{chang2023mono,
  title={Mono-STAR: Mono-camera Scene-level Tracking and Reconstruction},
  author={Chang, Haonan and Ramesh, Dhruv Metha and Geng, Shijie and Gan, Yuqiu and Boularias, Abdeslam},
  journal={arXiv preprint arXiv:2301.13244},
  year={2023}
}

changhaonan/Mono-STAR-demo