Dual Quadrics from Object Detection BoundingBoxes as Landmark Representations in SLAM
Abstract
Research in Simultaneous Localization And Mapping (SLAM) is increasingly moving towards richer world representations involving objects and high-level features that enable a semantic model of the world for robots, potentially leading to a more meaningful set of robot-world interactions. Many of these advances are grounded in state-of-the-art computer vision techniques primarily developed in the context of image-based benchmark datasets, leaving several challenges to be addressed in adapting them for use in robotics. In this paper, we derive a formulation for SLAM that uses dual quadrics as 3D landmark representations, and show how 2D bounding boxes (such as those typically obtained from visual object detection systems) can directly constrain the quadric parameters. Our paper demonstrates how to jointly estimate the robot pose and dual quadric parameters in factor-graph-based SLAM with a general perspective camera, and covers the use cases of a robot moving with a monocular camera with and without the availability of additional depth information.
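To illustrate how a 2D bounding box can constrain a dual quadric, the following is a minimal sketch based on standard projective geometry: a dual quadric Q* projects to a dual conic C* = P Q* Pᵀ under a 3x4 camera matrix P, and the axis-aligned box around that conic follows from its tangent lines. The function names, the ellipsoid parametrization, and the camera values in the usage note are assumptions made for illustration, not the paper's implementation, and the sketch assumes the ellipsoid lies entirely in front of the camera.

```python
import numpy as np

def ellipsoid_dual_quadric(T_world_obj, radii):
    """Dual quadric of an ellipsoid with the given semi-axes and 4x4 pose (hypothetical helper)."""
    a, b, c = radii
    D = np.diag([a * a, b * b, c * c, -1.0])  # axis-aligned ellipsoid at the origin
    return T_world_obj @ D @ T_world_obj.T

def project_to_dual_conic(P, Q_dual):
    """Project a 4x4 dual quadric to a 3x3 dual conic with a 3x4 camera matrix P."""
    return P @ Q_dual @ P.T

def conic_bounding_box(C):
    """Axis-aligned box (u_min, v_min, u_max, v_max) tangent to the dual conic C.

    A vertical line x = u is l = (1, 0, -u); tangency means l^T C l = 0,
    which is a quadratic in u. The same holds for horizontal lines y = v.
    Assumes both discriminants are positive (ellipsoid fully in front of the camera).
    """
    du = np.sqrt(C[0, 2] ** 2 - C[0, 0] * C[2, 2])
    dv = np.sqrt(C[1, 2] ** 2 - C[1, 1] * C[2, 2])
    us = sorted([(C[0, 2] + du) / C[2, 2], (C[0, 2] - du) / C[2, 2]])
    vs = sorted([(C[1, 2] + dv) / C[2, 2], (C[1, 2] - dv) / C[2, 2]])
    return us[0], vs[0], us[1], vs[1]
```

As a usage example, a unit sphere placed 5 units in front of a camera with an assumed focal length of 500 pixels and principal point (320, 240) yields a box centred on the principal point with half-width 500/√24 pixels. In a SLAM back end, the difference between such a predicted box and a detector's box could serve as a residual on the quadric and pose parameters.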