RECON: Reasoning with Condensation for Efficient Retrieval-Augmented Generation
Abstract
Search agents trained with reinforcement learning (RL) interleave reasoning with tool calls in a multi-turn, tool-integrated reasoning (TIR) loop, where each tool invocation returns an environment observation that is appended to the agent's context. As the rollout proceeds, these raw observations accumulate, inflating token cost and diluting the signal available for downstream reasoning. Unlike single-pass retrieve-then-read pipelines, where context compression is a one-time postprocessing step, the multi-turn RL setting requires compression that runs at every observation step while remaining decoupled from policy optimization. We introduce RECON (REasoning with CONdensation), a framework that addresses this challenge by inserting a dedicated observation compressor into the reasoning loop. The compressor is trained via a two-stage curriculum: relevance pretraining on QA datasets followed by multi-aspect distillation from proprietary LLMs, and remains frozen during RL training to preserve policy stability. Integrated into the Search-R1 search-agent pipeline, RECON reduces total context length by 35%, improves training speed by 5.4% and inference latency by 30.9%, while boosting average exact-match by 14.5% on the 3B agent and 3.0% on the 7B agent, with particular strength in multi-hop QA. These results establish learned observation compression as a key component for building practical, scalable RL-trained search agents.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.