Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory

Saman Forouzandeh

Abstract

Retrieving relevant observations from long multi-modal web interaction histories is challenging because relevance depends on the evolving task state, the modality (screenshots, HTML text, structured signals), and temporal distance. Prior approaches typically rely on static similarity thresholds or fixed-capacity buffers, which cannot adapt relevance to the current task context. We propose ACGM, a learned graph-memory retriever that constructs task-adaptive relevance graphs over agent histories using policy-gradient optimization from downstream task success. ACGM captures heterogeneous temporal dynamics with modality-specific decay (visual decays 4.3× faster than text: λv=0.47 vs. λx=0.11) and learns sparse connectivity (3.2 edges/node), enabling efficient O( T) retrieval. Across WebShop, VisualWebArena, and Mind2Web, ACGM improves retrieval quality to 82.7 nDCG@10 (+9.3 over GPT-4o, p<0.001) and 89.2% Precision@10 (+7.7), outperforming 19 strong dense, re-ranking, multi-modal, and graph-based baselines. Code to reproduce our results is available at https://github.com/S-Forouzandeh/ACGM-Agentic-Web.
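To give a feel for what the two decay rates imply, the following is a minimal sketch that assumes an exponential decay of the form exp(-λ·Δt), with Δt measured in interaction steps. The abstract reports only the rates, so the functional form, the time unit, and the names `decay_weight` and `half_life` are illustrative assumptions, not the paper's implementation.

```python
import math

# Decay rates quoted in the abstract.
LAMBDA_VISUAL = 0.47  # screenshots
LAMBDA_TEXT = 0.11    # HTML text


def decay_weight(steps_ago: float, rate: float) -> float:
    """Temporal weight under an ASSUMED exponential form exp(-rate * steps_ago)."""
    return math.exp(-rate * steps_ago)


def half_life(rate: float) -> float:
    """Steps until the assumed exponential weight drops to one half."""
    return math.log(2) / rate


# Ratio of the two rates; this is where the quoted "4.3x faster" comes from.
rate_ratio = LAMBDA_VISUAL / LAMBDA_TEXT

if __name__ == "__main__":
    print(f"rate ratio: {rate_ratio:.2f}")
    print(f"half-life (visual): {half_life(LAMBDA_VISUAL):.2f} steps")
    print(f"half-life (text):   {half_life(LAMBDA_TEXT):.2f} steps")
    for steps_ago in (1, 5, 10):
        w_v = decay_weight(steps_ago, LAMBDA_VISUAL)
        w_x = decay_weight(steps_ago, LAMBDA_TEXT)
        print(f"{steps_ago:>2} steps ago: visual={w_v:.3f} text={w_x:.3f}")
```

Under this assumption a screenshot loses half its weight in roughly one and a half steps, while text keeps half its weight for about six, which is one way to read the claim that visual observations go stale faster than textual ones.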
