High-quality Instance-aware Semantic 3D Map Using RGB-D Camera
Abstract
We present a mapping system capable of constructing detailed instance-level semantic models of room-sized indoor environments by means of an RGB-D camera. In this work, we integrate deep-learning-based instance segmentation and classification into a state of the art RGB-D SLAM system. We leverage the pipeline of ElasticFusion [1] as a backbone and propose modifications of the registration cost function. The proposed objective function features a tunable weight for the appearance channel, which can be learned from data. The resulting system is capable of producing accurate semantic maps of room-sized environments, as well as reconstructing highly detailed object-level models. The developed method has been verified through experimental validation on the TUMRGB-D SLAM benchmark and the YCB video dataset. Our results confirmed that the proposed system performs favorably in terms of trajectory estimation, surface reconstruction, and segmentation quality in comparison to other state-of-the-art systems.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.