CMU 16833 Project: Semantic Scene Recognition

Place recognition is essential for loop closure detection in SLAM. Common approaches, such as bag-of-words (BoW) matching against previously seen keyframes, can fail when the appearance of the environment changes, for example under different lighting or viewpoints. We propose Semantic Place Recognition: an approach that builds on classical place recognition by incorporating semantic features to improve robustness. Specifically, our method extracts compressed features from 3D bounding box detections and uses them to discriminate between feasible and infeasible place recognition candidates proposed by a standard BoW pipeline. These features can be obtained reliably from modern robotic vision systems, such as semantic SLAM or task-relevant object detection, which output a denser representation of the environment that is both geometrically and semantically accurate. We evaluate our approach on place recognition classification tasks across 100 indoor scenes from the ARKitScenes dataset. Filtering with Semantic Place Recognition improved the output of the standard BoW approach on Top-N metrics and precision-recall curves, achieving 78.4% Top-1 recognition and 54.5% mean average precision, compared to baseline results of 77.5% and 49.7%, respectively.
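To make the filtering idea concrete, here is a minimal sketch of how semantic information could be used to reject BoW candidates. It is not the project's actual implementation: the compressed feature is assumed here to be a simple histogram of detected object classes, and the names `class_histogram`, `cosine_similarity`, `filter_candidates`, and the `min_similarity` threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def class_histogram(detections):
    """Count object classes in a keyframe.

    Assumed detection format (hypothetical): (label, center_xyz, size_xyz).
    Only the label is used in this sketch; box geometry is ignored.
    """
    return Counter(label for label, _center, _size in detections)


def cosine_similarity(hist_a, hist_b):
    """Cosine similarity between two class-count histograms."""
    labels = set(hist_a) | set(hist_b)
    dot = sum(hist_a[label] * hist_b[label] for label in labels)
    norm_a = math.sqrt(sum(v * v for v in hist_a.values()))
    norm_b = math.sqrt(sum(v * v for v in hist_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no detections on one side: treat as dissimilar
    return dot / (norm_a * norm_b)


def filter_candidates(query_detections, candidates, min_similarity=0.5):
    """Keep only BoW candidates whose semantic content resembles the query.

    candidates: list of (keyframe_id, bow_score, detections), in BoW order.
    Returns a list of (keyframe_id, bow_score, semantic_similarity).
    The threshold value is an illustrative assumption, not a tuned parameter.
    """
    query_hist = class_histogram(query_detections)
    kept = []
    for keyframe_id, bow_score, detections in candidates:
        similarity = cosine_similarity(query_hist, class_histogram(detections))
        if similarity >= min_similarity:
            kept.append((keyframe_id, bow_score, similarity))
    return kept


# Usage: a query with a chair and a table, and two BoW candidates.
query = [("chair", (0.0, 0.0, 0.0), (0.5, 0.5, 1.0)),
         ("table", (1.0, 0.0, 0.0), (1.2, 0.8, 0.7))]
candidates = [
    ("kf_a", 0.91, [("chair", (0.1, 0.0, 0.0), (0.5, 0.5, 1.0)),
                    ("table", (1.1, 0.0, 0.0), (1.2, 0.8, 0.7))]),
    ("kf_b", 0.88, [("bed", (2.0, 1.0, 0.0), (2.0, 1.5, 0.6))]),
]
kept = filter_candidates(query, candidates)
```

In this example the second candidate has a high BoW score but shares no object classes with the query, so it is rejected. A fuller version would presumably also use the 3D box geometry, which this sketch leaves out.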