FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing
Abstract
Text-guided image editing with diffusion models has achieved remarkable quality but often suffers from prohibitive latency. We introduce FlashEdit, a real-time localized image editing framework for the standard inversion-based editing setting. Its efficiency and precision stem from three key innovations: (1) a Cycle-Consistent One-Step Inversion (COSI) pipeline that encourages manifold-aligned one-step inversion through cycle consistency; (2) a Background Shield (BG-Shield) technique that improves preservation of non-edited regions via structural self-attention intervention; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that promotes precise edits by suppressing semantic leakage. Experiments on PIE-Bench demonstrate a strong preservation-efficiency trade-off, with edits completed in under 0.2 seconds, a speedup of more than 150× over DDIM-based multi-step editing. Our code will be made publicly available at https://github.com/JunyiWuCode/FlashEdit.
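The abstract does not spell out the COSI objective, so the following is only a minimal sketch of the generic cycle-consistency idea it names: an image is mapped to a latent in one step, mapped back in one step, and the reconstruction error is penalized. The function names (`cycle_consistency_loss`, `toy_invert`, `toy_generate_exact`, `toy_generate_biased`), the mean-squared-error form, and the toy affine maps are all assumptions made for illustration; they are not taken from the paper or its code.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def cycle_consistency_loss(
    image: Sequence[float],
    invert: Callable[[Sequence[float]], Vector],
    generate: Callable[[Sequence[float]], Vector],
) -> float:
    """Mean squared error between `image` and generate(invert(image)).

    Assumption: MSE is used here only as a simple stand-in for whatever
    reconstruction penalty the actual method uses.
    """
    latent = invert(image)              # one-step inversion: image -> latent
    reconstruction = generate(latent)   # one-step generation: latent -> image
    if len(reconstruction) != len(image):
        raise ValueError("reconstruction and image must have the same length")
    squared_errors = [(a - b) ** 2 for a, b in zip(image, reconstruction)]
    return sum(squared_errors) / len(image)


# Hypothetical toy stand-ins for the inversion and generation networks.
def toy_invert(x: Sequence[float]) -> Vector:
    return [2.0 * v + 1.0 for v in x]


def toy_generate_exact(z: Sequence[float]) -> Vector:
    # Exact inverse of toy_invert, so the cycle closes perfectly.
    return [(v - 1.0) / 2.0 for v in z]


def toy_generate_biased(z: Sequence[float]) -> Vector:
    # Inverse of toy_invert plus a constant offset, so the cycle does not close.
    return [(v - 1.0) / 2.0 + 0.5 for v in z]


image = [0.0, 0.25, 0.5, 1.0]  # a flattened toy "image"
loss_exact = cycle_consistency_loss(image, toy_invert, toy_generate_exact)
loss_biased = cycle_consistency_loss(image, toy_invert, toy_generate_biased)
```

In this toy setting the loss is zero when generation exactly undoes inversion and positive otherwise, which is the signal a cycle-consistency term would supply during training.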