MetricHMSR:Metric Human Mesh and Scene Recovery from Monocular Images

Abstract

We introduce MetricHMSR, a novel framework for recovering metric human meshes and 3D scenes from a single monocular image. Existing methods struggle to recover metric scale due to monocular scale ambiguity and weak-perspective camera assumptions. Moreover, their fully coupled feature representations make it difficult to disentangle local pose from global translation, often requiring multi-stage pipelines that introduce accumulated errors. To address these challenges, we propose MetricHMR (Metric Human Mesh Recovery), which incorporates a bounding camera ray map representation to provide explicit metric cues for human reconstruction,together with a Human Mixture-of-Experts (HumanMoE) that dynamically routes image features to specialized experts, enabling the disentangled perception of local human pose and global metric position. Leveraging the recovered metric human as a geometric anchor, we further refine monocular metric depth estimation to achieve more accurate 3D alignment between humans and scenes.Comprehensive experiments demonstrate that our method achieves state-of-the-art performance on both human mesh recovery and metric human-scene reconstruction. Project Page: https://Metaverse-AI-Lab-THU.github.io/MetricHMSR.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…