Exploring Iterative Manifold Constraint for Zero-shot Image Editing

Abstract

Editability and fidelity are two essential demands of text-driven image editing: the edited region should align with the target prompt, while the rest of the image should remain unchanged. Current state-of-the-art editing methods usually follow an "inversion-then-editing" pipeline, in which the input image is inverted to an approximately Gaussian noise z_T, from which a sampling process is run under the target prompt. We argue, however, that near-Gaussian noise is a poor pivot for editing because it introduces substantial fidelity errors. We verify this with a pilot analysis, which shows that intermediate-inverted latents achieve a better trade-off between editability and fidelity than the fully inverted z_T. Based on this, we propose a novel zero-shot editing paradigm dubbed ZZEdit. It first locates a qualified intermediate-inverted latent, denoted z_p, as a better editing pivot: one that is sufficient for editing while preserving structure. A ZigZag process then alternates denoising and inversion, progressively injecting target guidance into z_p while preserving the structure information of step p. Afterwards, so that the numbers of inversion and denoising steps match, we execute a pure sampling process under the target prompt. Essentially, ZZEdit performs an iterative manifold constraint between the manifolds M_p and M_{p-1}, leading to fewer fidelity errors. Extensive experiments highlight the effectiveness of ZZEdit in diverse image editing scenarios compared with the "inversion-then-editing" pipeline.
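
The three stages described in the abstract (partial inversion to a pivot, ZigZag alternation, pure sampling) can be sketched as a control-flow skeleton. This is only an illustrative sketch under stated assumptions, not the authors' implementation: `invert_step`, `denoise_step`, and `num_zigzag` are hypothetical names, the latent is a toy float, and the abstract does not say how the pivot step p or the number of ZigZag rounds is chosen, so both are plain arguments here.

```python
from typing import Callable, List, Tuple

Latent = float  # toy stand-in for a diffusion latent (assumption)


def zz_edit_sketch(
    z0: Latent,
    p: int,
    num_zigzag: int,
    invert_step: Callable[[Latent, int], Latent],
    denoise_step: Callable[[Latent, int], Latent],
) -> Tuple[Latent, List[Tuple[str, int]]]:
    """Return the final latent and a trace of (operation, resulting step)."""
    trace: List[Tuple[str, int]] = []
    z = z0

    # Stage 1: invert only up to the intermediate pivot z_p, not to z_T.
    for t in range(p):
        z = invert_step(z, t)  # hypothetical: moves step t -> t + 1
        trace.append(("invert", t + 1))

    # Stage 2: ZigZag. Alternate one denoising step (p -> p - 1, where
    # target-prompt guidance would be injected) with one inversion step
    # (p - 1 -> p), so the latent keeps returning to step p.
    for _ in range(num_zigzag):
        z = denoise_step(z, p)
        trace.append(("denoise", p - 1))
        z = invert_step(z, p - 1)
        trace.append(("invert", p))

    # Stage 3: pure sampling from step p down to 0 under the target prompt.
    for t in range(p, 0, -1):
        z = denoise_step(z, t)
        trace.append(("denoise", t - 1))

    return z, trace
```

With this layout, the trace contains p + num_zigzag inversion steps and num_zigzag + p denoising steps, which is one way to read the abstract's requirement that inversion and denoising use the same number of steps. In a real pipeline the two callables would wrap a diffusion model's inversion and sampling updates.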
