HumanGenesis: Agent-Based Geometric and Generative Modeling for Synthetic Human Dynamics
Abstract
Synthetic human dynamics aims to generate photorealistic videos of human subjects performing expressive, intention-driven motions. However, current approaches face two core challenges: (1) geometric inconsistency and coarse reconstruction, caused by limited 3D modeling and detail preservation; and (2) limited motion generalization and poor scene harmonization, stemming from weak generative capabilities. To address these, we present HumanGenesis, a framework that integrates geometric and generative modeling through four collaborative agents: (1) the Reconstructor builds 3D-consistent human-scene representations from monocular video using 3D Gaussian Splatting and deformation decomposition; (2) the Critique Agent enhances reconstruction fidelity by identifying and refining poorly reconstructed regions via multi-round reflection with a multimodal large language model (MLLM); (3) the Pose Guider enables motion generalization by generating expressive pose sequences using time-aware parametric encoders; and (4) the Video Harmonizer synthesizes photorealistic, coherent video via a hybrid rendering pipeline with diffusion, while refining the Reconstructor through a Back-to-4D feedback loop. HumanGenesis achieves state-of-the-art performance on tasks including text-guided synthesis, video reenactment, and novel-pose generalization, significantly improving expressiveness, geometric fidelity, and scene integration.
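The control flow among the four agents can be pictured with a toy sketch. This is not the authors' implementation: every function name, the `Scene4D` class, the integer `quality` score, and the round limit are hypothetical placeholders chosen only to show the order of the stages (reconstruct, critique over several rounds, generate poses, harmonize, then feed the result back into the reconstruction).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Scene4D:
    """Hypothetical stand-in for the reconstructed human-scene representation."""
    quality: int = 50                      # toy 0-100 score, not a metric from the paper
    log: List[str] = field(default_factory=list)


def reconstruct(video_frames: List[str]) -> Scene4D:
    # Placeholder for the Reconstructor (3D Gaussian Splatting in the paper).
    scene = Scene4D()
    scene.log.append(f"reconstruct({len(video_frames)} frames)")
    return scene


def critique(scene: Scene4D, rounds: int = 3, target: int = 80) -> Scene4D:
    # Placeholder for the Critique Agent: repeat "find poor regions, refine"
    # for a bounded number of rounds, stopping early once quality is adequate.
    for r in range(rounds):
        if scene.quality >= target:
            break
        scene.quality = min(100, scene.quality + 20)
        scene.log.append(f"critique round {r + 1}")
    return scene


def guide_pose(prompt: str, num_frames: int) -> List[str]:
    # Placeholder for the Pose Guider: one pose label per output frame.
    return [f"{prompt}:pose{t}" for t in range(num_frames)]


def harmonize(scene: Scene4D, poses: List[str]) -> List[str]:
    # Placeholder for the Video Harmonizer: one output frame per pose.
    return [f"frame({p})" for p in poses]


def back_to_4d(scene: Scene4D, frames: List[str]) -> Scene4D:
    # Placeholder for the Back-to-4D feedback: harmonized frames update the scene.
    scene.quality = min(100, scene.quality + 5)
    scene.log.append("back-to-4d")
    return scene


def run_pipeline(video_frames: List[str], prompt: str,
                 num_frames: int = 4) -> Tuple[Scene4D, List[str]]:
    scene = critique(reconstruct(video_frames))
    poses = guide_pose(prompt, num_frames)
    frames = harmonize(scene, poses)
    scene = back_to_4d(scene, frames)
    return scene, frames


if __name__ == "__main__":
    scene, frames = run_pipeline(["f0", "f1"], "wave", num_frames=3)
    print(scene.log)
    print(frames)
```

The sketch keeps the critique loop bounded and lets the final stage write back into the scene, which mirrors the multi-round reflection and feedback loop named in the abstract; the real agents operate on learned 3D and video representations rather than strings and integers.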