PAOLI: Pose-free Articulated Object Learning from Sparse-view Images
Abstract
We present a method for modeling articulated objects from a sparse set of images with unknown camera poses. Existing methods require dense multi-view observations and ground-truth camera poses, whereas our approach operates with as few as four views per articulation and no camera supervision. Our central insight is that a robust correspondence and alignment problem between unaligned reconstructions must be solved before part motions can be analyzed. We first reconstruct each articulation independently using recent advances in sparse-view 3D reconstruction, then learn a deformation field that establishes dense correspondences across poses. A progressive disentanglement strategy then separates static from moving parts, which in turn allows camera motion to be separated robustly from object motion. Finally, we jointly optimize geometry, appearance, and kinematics with a self-supervised loss that enforces cross-view and cross-pose consistency. Experiments on the standard benchmark and on real-world examples demonstrate that our method produces accurate and detailed articulated object representations under significantly weaker input assumptions than existing approaches.
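As a rough illustration of why alignment has to come before motion analysis, the sketch below takes two point sets that are already in correspondence but live in different, unaligned coordinate frames, and separates the static part from the moving part. It is not the method of the paper: the learned deformation field and the progressive disentanglement strategy are replaced here by a closed-form rigid fit (the Kabsch algorithm) combined with simple median-based trimming. The names `kabsch`, `split_static_moving`, and the parameter `tol` are hypothetical and chosen only for this example, which also assumes that more than half of the points belong to the static part.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) with R @ src[i] + t ~= dst[i]."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t


def split_static_moving(src, dst, tol=1e-3, iters=20):
    """Alternate between fitting the frame change and trimming moving points.

    Assumption (not from the paper): the static part is the majority, so the
    median residual is a safe trimming threshold while the fit is still biased.
    """
    static = np.ones(len(src), dtype=bool)
    for _ in range(iters):
        R, t = kabsch(src[static], dst[static])
        residual = np.linalg.norm(src @ R.T + t - dst, axis=1)
        new_static = residual <= max(tol, np.median(residual))
        if np.array_equal(new_static, static):
            break                               # labels stopped changing
        static = new_static
    return R, t, static
```

Under these assumptions, the returned `(R, t)` plays the role of the unknown change of frame between the two reconstructions, and the points flagged as not static are the ones whose remaining motion would have to be explained by an articulation such as a hinge or a slider.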