DisCo-FLoc: Semantic-Free Floorplan Localization via SE(2)-Aware Contrastive Disambiguation
Abstract
Visual Floorplan Localization (FLoc) struggles with severe structural aliasing caused by repetitive, minimalist layouts: physically distant poses share highly similar visual-geometric features, which degrades spatial separability and angular discriminability. Existing methods attempt to mitigate these ambiguities by relying on costly semantic annotations, yet the resulting performance gains remain inherently limited. To address these issues, we propose DisCo-FLoc, a semantic-free method for visual-geometric Contrastive Disambiguation. First, we introduce a depth-aware Ray Regression Predictor (RRP) that serves as a dense-to-ray geometric projector. By explicitly suppressing visual clutter along the vertical dimension, RRP projects monocular RGB images into 2D ray primitives, which are matched against floorplans to produce geometry-aware FLoc candidates. Second, to resolve the ambiguity that remains among these candidates, we propose a spatially perturbed contrastive objective that aligns RGB images with local floorplan structures, and we formulate a visual-geometric compatibility function. Specifically, we construct positive and negative samples at both the positional and directional levels through SE(2) pose perturbations for contrastive learning, achieving pose smoothness, spatial separability, and angular discriminability. The compatibility function enables DisCo-FLoc to disambiguate FLoc candidates using richer visual context beyond pure geometric layout, without requiring any semantic annotations. Extensive experiments on two challenging visual FLoc benchmarks demonstrate that DisCo-FLoc significantly outperforms state-of-the-art semantic-based methods and notably narrows the gap between positional and directional FLoc accuracy.
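The abstract does not give the perturbation ranges, the encoders, or the exact loss, so the following is only a minimal sketch under stated assumptions of how SE(2)-perturbed positive and negative samples could be built around an anchor pose (x, y, theta). Positives are small perturbations (pose smoothness), positional negatives are large translations with small rotations (spatial separability), and directional negatives are small translations with large rotations (angular discriminability). The loss is a standard InfoNCE objective swapped in for illustration; it is not confirmed as the paper's objective. All function names (`perturb_se2`, `build_contrastive_poses`, `info_nce`) and every numeric range below are hypothetical.

```python
import numpy as np


def wrap_angle(a):
    """Wrap angles (radians) into [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi


def perturb_se2(pose, rng, trans_range, rot_range):
    """Randomly perturb pose = (x, y, theta) in SE(2).

    Translation magnitude is drawn uniformly from trans_range (metres) in a
    random direction; rotation magnitude is drawn from rot_range (radians)
    with a random sign.
    """
    x, y, th = pose
    r = rng.uniform(*trans_range)
    phi = rng.uniform(-np.pi, np.pi)
    dth = rng.uniform(*rot_range) * rng.choice([-1.0, 1.0])
    return np.array([x + r * np.cos(phi), y + r * np.sin(phi), wrap_angle(th + dth)])


def build_contrastive_poses(anchor, rng, n_pos=1, n_neg_pos=4, n_neg_dir=4):
    """Sample positive and negative poses around an anchor pose.

    HYPOTHETICAL thresholds (not from the paper):
      positives:            <= 0.1 m, <= 5 deg   (pose smoothness)
      positional negatives: 1-3 m,    <= 5 deg   (spatial separability)
      directional negatives:<= 0.1 m, 45-180 deg (angular discriminability)
    Returns (positives, negatives); negatives list positional ones first.
    """
    small_rot = (0.0, np.deg2rad(5.0))
    pos = [perturb_se2(anchor, rng, (0.0, 0.1), small_rot) for _ in range(n_pos)]
    neg_p = [perturb_se2(anchor, rng, (1.0, 3.0), small_rot) for _ in range(n_neg_pos)]
    neg_d = [perturb_se2(anchor, rng, (0.0, 0.1), (np.deg2rad(45.0), np.pi))
             for _ in range(n_neg_dir)]
    return np.array(pos), np.array(neg_p + neg_d)


def info_nce(query, positive, negatives, tau=0.07):
    """InfoNCE loss for one query embedding vs. one positive and K negatives.

    Embeddings are L2-normalised, so dot products are cosine similarities.
    In a real pipeline the query would come from an image encoder and the
    keys from a floorplan-crop encoder at the sampled poses (both assumed).
    """
    def normalise(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    q = normalise(query)
    keys = normalise(np.vstack([positive[None, :], negatives]))
    logits = keys @ q / tau
    logits = logits - logits.max()  # numerical stability; softmax unchanged
    return -(logits[0] - np.log(np.exp(logits).sum()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anchor = np.array([2.0, 3.0, 0.5])
    pos, neg = build_contrastive_poses(anchor, rng)
    print(pos.shape, neg.shape)  # (1, 3) (8, 3)
```

Separating the two negative families is the design point the abstract stresses: positional negatives alone would let an encoder ignore heading, while directional negatives force it to distinguish views taken from nearly the same location.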