DiTTo: Scalable Order-aware All-in-One Image Restoration Agent

Abstract

Real-world images rarely suffer from a single degradation, and the order in which degradations are removed substantially affects the final restoration quality. This motivates agent-based image restoration (IR), in which a vision-language model schedules a pool of pre-built restoration-experts. However, existing training-based agents require O((N_D)^2) restoration-expert calls per image to construct the Optimal Restoration-action Trajectory Dataset (ORTD), where N_D denotes the number of degradation types in the universe D. They also couple agent training to a fixed restoration-expert pool, which prevents extension to newly introduced restoration-experts without full retraining. To overcome these efficiency and extensibility bottlenecks, we propose DiTTo, a novel order-aware image restoration agent framework consisting of the DiTTo Simulator and the DiTTo Agent. The DiTTo Simulator combines -IR for single-step restoration-action simulation and AiO-IQA for per-action quality prediction, reducing ORTD construction to O(N_D) simulator calls per image. The DiTTo Agent is trained by SFT on the simulator-generated ORTD, followed by Order-aware Restoration Alignment (ORA), which aligns degradation identification, restoration-action ordering, and output format along independent axes. This enables plug-and-play extensibility: adding a new restoration-expert requires updating only the lightweight ORA stage. On the MiO-100 evaluation set, with up to five concurrent degradations, our DiTTo Agent achieves state-of-the-art multi-degradation restoration quality compared with previous agent-based IR methods.
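
To make the quadratic-versus-linear claim concrete, the following is a minimal counting sketch, not the paper's procedure. It assumes a greedy search in which every remaining restoration action is tried with a real expert at each step, and it assumes the simulator needs a constant number of calls per step (here two, one for simulation and one for quality prediction). The function names and the `calls_per_step` parameter are hypothetical, and the abstract does not specify either procedure in this detail.

```python
def greedy_expert_calls(num_degradations: int) -> int:
    """Count expert calls if every remaining action is tried at each step.

    Assumption: a greedy search that runs one real restoration-expert per
    candidate action, commits the best one, and repeats on the rest.
    """
    calls = 0
    remaining = num_degradations
    while remaining > 0:
        calls += remaining  # one expert call per candidate action
        remaining -= 1      # the chosen action is removed from the pool
    return calls


def simulator_calls(num_degradations: int, calls_per_step: int = 2) -> int:
    """Count simulator calls if each step costs a constant number of calls.

    Assumption: calls_per_step is a hypothetical constant, not a value
    taken from the paper.
    """
    return calls_per_step * num_degradations


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, greedy_expert_calls(n), simulator_calls(n))
```

Under these assumptions the expert-call count grows as N_D(N_D + 1)/2, while the simulator-call count grows linearly in N_D, so the gap widens as the degradation universe grows.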
